Sr. Site Reliability Engineer

Illuminate Education - Remote - Full time

Position Overview

As a Sr. Site Reliability Engineer, you will work as part of a centralized team that is in the early phases of building “paved roads” for development teams and driving the DevOps transformation within the organization. Using commercial, open source, and homegrown tooling, you will build an automation platform and toolset that enables development teams to rapidly develop and deploy software and provide value to customers. You will also be expected to consult with development teams on improving their architecture, help them consume the services we are building, and empower them to support their own services in production.

To accomplish this goal, you will be responsible for designing and developing automation, including the provisioning, configuration, operation, monitoring, and maintenance of systems, primarily within the Amazon Web Services (AWS) cloud infrastructure. The toolset that you and your team are responsible for providing includes CI, CD, automated pipelines, security tools, network topology, policy management, monitoring and telemetry, container orchestration/scheduling, and machine image creation and configuration.

For this role, we are currently looking for engineers with several years of operations experience, including network provisioning, systems administration, machine/container configuration and security, and identity management. You will be working on a team with other engineers whose experience ranges from operations to development to database administration. The opportunity includes being a part of building a generative culture of continuous learning, continuous delivery, and continuous improvement.

Key Responsibilities

  • Design and build security measures into automation, including securing network topology, securing machine images, container scanning, automated pen testing tools, etc.
  • Develop, deploy, and maintain tooling to be consumed by development teams, including container orchestration, secrets management, service discovery, infrastructure provisioning, image creation, server configuration, identity management, monitoring and telemetry, and others.
  • Train, advise, and assist development teams on DevOps best practices.
  • Manage, diagnose, and resolve system incidents and internal technical escalations within the AWS infrastructure.
  • Identify and implement appropriate use of AWS operational best practices.
  • Engage with Software Engineers, Product Managers, and Quality Assurance Analysts to diagnose problems and help architect long term solutions.
  • Collaborate with Product Managers and Engineers to identify, prioritize, and develop enhancements to the platform.
  • Collaborate with development teams to automate their value streams, and empower them to own them.
  • Contribute to developing security policies and automated tooling for policy enforcement.
  • Perform daily system monitoring to verify the integrity and availability of the services we provide.
  • Perform ongoing performance tuning, hardware upgrades, and resource optimization.
  • Provide emergency on-call support on a rotating schedule.

Desired Experience & Qualifications

  • 5+ years of systems administration experience including securing, troubleshooting, administering, and monitoring Linux systems.
  • Proficiency with AWS technologies including VPC/Networking, EC2, RDS, S3, CloudFormation, and CloudFront.
  • Security expertise and certifications including CCSP, CISSP, or similar.
  • Strong Terraform experience.
  • Experience with container orchestration with either Kubernetes or Nomad.
  • Understanding of microservice architecture and communication patterns.
  • Solid knowledge of networking, storage, load balancing, and virtualization.
  • Proficiency with Nginx and Apache servers or similar.
  • Knowledge of Windows Server 2012 R2 and 2016.
  • BS degree in an IT discipline.
  • Experience developing, implementing, and continually improving system and network monitoring and alerting capabilities and procedures.
  • AWS or Microsoft Cloud Certification.
  • Previous experience as a NOC Engineer monitoring and troubleshooting servers.

Experience Level

Senior Level

Illuminate Education

Illuminate Education partners with K-12 educators to equip them with data to serve the whole child and reach new levels of student performance.
