Site Reliability Engineer - FastBridge Learning

Illuminate Education - Minneapolis, Minnesota - Full time

Position Overview

As a Site Reliability Engineer at FastBridge Learning (FBL), you will be responsible for the effective provisioning, configuration, operation, monitoring, and maintenance of systems within the FBL Amazon Web Services (AWS) cloud infrastructure. You will ensure that virtual machines, operating systems, and software systems are scalable, highly available, and meet published Service Level Agreements. You will also ensure that all systems and related procedures adhere to organizational standards.

Working within the Technical Operations team, you will assist project teams with technical issues in the initiation, planning, and implementation phases of our standard Software Development Lifecycle. You will actively participate in the definition of needs, benefits, and technical approaches. You will also play a key role within the FBL DevOps value stream, using CI/CD tools to effectively deliver software products into the FBL cloud.

Key Responsibilities

  • Manage, diagnose, and resolve system incidents and internal technical escalations within the FBL AWS infrastructure.
  • Plan, lead, and manage cloud-based Linux and Windows system administration tasks within the FBL AWS infrastructure utilizing VPC, IAM, EC2, EBS, EFS, S3, etc.
  • Identify and implement appropriate use of AWS operational best practices.
  • Develop tools and scripts to assist with routine maintenance tasks.
  • Compose and leverage AWS CloudFormation Templates to ensure repeatable, sustainable, effective management of AWS infrastructure.
  • Deploy, test, and document development, pre-production, and production environments.
  • Engage with Software Engineers, Product Managers, and Quality Assurance Analysts to diagnose and resolve technical issues.
  • Collaborate with Product Managers and Engineers to identify, prioritize, and develop enhancements to the platform.
  • Manage FBL service infrastructure stack, monitoring, and performance metrics using Bitbucket, Jenkins, New Relic, and AWS tooling.
  • Collaborate with Agile development teams to ensure smooth promotion of code to production.
  • Contribute to and maintain system standards.
  • Update and author documentation and procedures.
  • Update, maintain, and test Disaster Recovery procedures.
  • Perform daily system monitoring, verifying the integrity and availability of all hardware, server resources, systems, and key processes.
  • Review system and application logs and verify completion of scheduled jobs.
  • Perform periodic performance reporting to support capacity planning.
  • Perform ongoing performance tuning, hardware upgrades, and resource optimization.
  • Provide emergency on-call support on a rotating schedule.

Desired Experience & Qualifications

  • 3-5 years of related experience.
  • 1 to 3 years of systems administration experience, including implementing, administering, troubleshooting, and monitoring Windows and Linux systems.
  • Previous experience as a NOC Engineer monitoring and troubleshooting servers.
  • Proficiency with AWS technologies including VPC/Networking, RDS, S3, CloudFormation, and CloudFront.
  • Solid knowledge of Linux/Unix administration and understanding of open-source software.
  • Experience automating operational processes using scripting languages such as Shell, Ruby, or Python, and configuration management tools such as Ansible, Puppet, or Chef.
  • Proficiency with Nginx and Apache servers or similar.
  • Knowledge of Windows Server 2012 R2 and 2016.
  • Experience implementing applications on a cloud platform.
  • Knowledge of Microsoft SQL Server and/or PostgreSQL.
  • BS degree in an IT discipline.
  • Experience developing, implementing, and continually improving system and network monitoring and alerting capabilities and procedures.
  • Experience supporting virtualized environments.
  • Solid knowledge of networking, storage, load balancing, and virtualization.
  • AWS or Microsoft Cloud Certification.
  • Working knowledge of New Relic.
  • Working knowledge of CI/CD tools like Jenkins CI, Artifactory, Git, and Maven.
  • Java or Python development experience.

Role

Engineering

Experience Level

Mid Level

Illuminate Education

FastBridge Learning, based in downtown Minneapolis, is on a rapid growth trajectory because of our people: true change-makers who are transforming the way educators assess and address the learning needs of their students.
