Director, Site Reliability Engineering - Irvine, CA or Minneapolis, MN

Illuminate Education - Multiple Locations - Full time

Position Overview

Illuminate Education is seeking a Director, Site Reliability Engineering to oversee our Site Reliability Engineering (SRE) team. As a member of the Engineering leadership team, you will help design and operate the next generation of Illuminate’s cloud architecture. You will lead the SRE team to keep Illuminate’s sites highly available and to triage and resolve critical infrastructure and application issues. You will work closely with the software engineering teams that build our applications and infrastructure, focusing on the scalability, stability, reliability, operability, and security of our services. The successful candidate will understand the demands of running 24x7 applications at large scale. This is a high-visibility role that will significantly shape the quality of our service and our ability to build highly scalable, efficient cloud applications that serve millions of students.

Key Responsibilities

  • Design, operate, and improve our most critical services.
  • Work across teams to understand system requirements, evaluate trade-offs, and deliver the solutions needed to build reliable services. This individual does not need to have the “correct” answer to everything but should be able to drive the conversation toward a productive solution by including the right stakeholders and weighing pros and cons against business needs, timelines, and other constraints.
  • Participate in operations and on-call rotations alongside the engineering teams, helping to debug, improve, and optimize critical backend services.
  • Manage Illuminate’s always-available (24x7) applications and infrastructure to meet high-traffic demands, and strive to eliminate downtime and improve the reliability and manageability of its services.
  • Build and manage a world-class team of managers and engineers capable of scaling with Illuminate through a period of continued high growth.
  • Attract top-tier talent to match this level of growth.
  • Measure and improve the efficiency and effectiveness of existing processes, and build the next level of improvements; set standards for deployments at scale and for application and infrastructure reliability and scalability.
  • Continue to improve the engineering culture across all technical functions; build and lead an organization with customer focus, world-class quality, effective communication, decisive and fast-moving solutions, quick and constructive conflict resolution, and a “no barriers” mentality.
  • Serve as an evangelist for the team and overall culture, both internally and externally.
  • Identify scaling bottlenecks and help Illuminate services scale to meet user demand.
  • Perform design, code, and process reviews to improve individual and enterprise systems.
  • Help make our team better by contributing to design and launch reviews for new services.
  • Advocate for and apply best practices when it comes to availability, scalability, operational excellence, and efficiency.

Desired Experience & Qualifications

  • Bachelor’s degree in a technical field such as computer science, or equivalent experience.
  • 10+ years of experience in production/site reliability engineering while leading a team.
  • Experience building teams and developing/mentoring team members.
  • Innovative, intellectually curious, and strong problem solver who stays current on modern solutions and best practices.
  • Proven track record of successfully building and managing rapidly growing, world-class SRE organizations in large-scale environments.
  • Demonstrated ability to deliver in a hyper-growth environment and familiarity with the challenges inherent to this type of growth.
  • Direct experience building (preferably large scale) highly available services with a focus on scalability and reliability.
  • History of being a hands-on leader in a startup or growth-stage company.
  • Ability to question the status quo and identify opportunities for innovation and change.
  • Experience or proficiency in Java, PHP, or Go.
  • Experience with backend services, distributed systems, or Linux internals.
  • Interest in operational excellence, availability, and automating away manual tasks.
  • Passionate about problem-solving with strong technical communication skills and desire to collaborate with others.
  • Experience operating large-scale distributed systems, microservice architectures, or multi-tenant systems.
  • Hands-on experience using AWS or Google Cloud services.
  • Experience with NoSQL storage solutions and Memcached/Redis.
  • Experience with Kubernetes, Envoy, and related software is a plus.

Locations

Minneapolis, Minnesota
Irvine, California

Experience Level

Senior Level

Illuminate Education

Illuminate Education partners with K-12 educators to equip them with data to serve the whole child and reach new levels of student performance. Our solution brings together holistic data and collaborative tools and puts them in the hands of educators.