Site Reliability Engineer - DnA - remote

Illuminate Education - Remote - Full time

Position Overview

As a Site Reliability Engineer on an engineering team, you will design and develop automation for provisioning, configuring, operating, monitoring, and maintaining systems in the Amazon Web Services (AWS) and Google Cloud Platform (GCP) cloud infrastructures. You and your team will work with CI/CD and automated pipelines, security tools, network topology, policy management, monitoring and telemetry, container orchestration/scheduling, and machine image creation and configuration.

We are currently looking for engineers with operations experience, including network provisioning, systems administration, machine/container configuration and security, and identity management. You will work on a team with other engineers building new applications and modernizing legacy systems. You will also help build a generative culture of continuous learning, continuous delivery, and continuous improvement.

Key Responsibilities

  • Train, advise, and assist fellow team members on DevOps best practices.
  • Manage, diagnose, and resolve system incidents and internal technical escalations within the cloud infrastructures.
  • Identify and apply AWS and GCP operational best practices.
  • Engage with Software Engineers, Product Managers, and Quality Assurance Analysts to diagnose problems and help architect long-term solutions.
  • Collaborate with Product Managers and Engineers to identify, prioritize, and develop enhancements to the platform.
  • Perform daily system monitoring to verify the integrity and availability of the services we provide.
  • Perform ongoing performance tuning, hardware upgrades, and resource optimization.
  • Build out production-like dev, staging, and test environments.
  • Audit and update dependencies, help modernize development environments, and assist with breaking out a monolith into integrated services.
  • Provide emergency on-call support on a rotating schedule.

Desired Experience & Qualifications

  • Systems administration experience including securing, troubleshooting, administering, and monitoring Linux systems.
  • Proficiency with AWS technologies including VPC/Networking, EC2, RDS, S3, CloudFormation, and CloudFront.
  • Familiarity with GCP technologies a plus.
  • Strong Terraform experience.
  • Experience with container orchestration with Kubernetes, Nomad, Fargate or similar.
  • Understanding of microservice architecture and communication patterns.
  • Solid knowledge of networking, storage, load balancing, and virtualization.
  • Proficiency with Nginx and Apache servers or similar.
  • BS degree in an IT discipline.
  • Experience developing, implementing, and continually improving system and network monitoring and alerting capabilities and procedures.
  • AWS or Microsoft Cloud Certification.
  • Previous experience as a NOC Engineer monitoring and troubleshooting servers.


Apply for this job

Experience Level

Mid Level

Illuminate Education

Illuminate Education partners with K-12 educators to equip them with data to serve the whole child and reach new levels of student performance. Our solution brings together holistic data and collaborative tools and puts them in the hands of educators.