Senior Site Reliability Engineer

DreamBox Learning - Remote - Full time

About Us:

Right now, even though STEM skills are increasingly important, over 60% of students in kindergarten through 8th grade are not proficient in math at their grade level. The pandemic has widened the opportunity gap for our most vulnerable students. As a society, we need to bring together our best, most creative minds to tackle this critical problem and ensure all kids succeed in math and in school and have the tools they need to reach their potential. This includes developing the most innovative learning technology, using advanced data science in a way that inspires students and empowers teachers. Come help us make a difference at DreamBox Learning.

We’re passionate about our mission to radically transform the way the world learns. Today, our Intelligent Adaptive Learning platform, with its rigorous math curriculum and game-based environment, is helping 5 million kids and over 150,000 educators improve math achievement and build a love of math at the same time. In the wake of COVID-19, with learning now spanning in-person, all-virtual, and hybrid models, we are uniquely positioned to bring our best-in-class adaptive learning platform to more students and provide educators and parents with insights into their students’ learning.

We offer a flexible, hybrid work arrangement once it is safe to return to the office. We are all 100% remote at this time.

About the Role:

DreamBox is currently seeking a Senior Site Reliability Engineer to join the SRE Team within the Technology Department. SREs at DreamBox take direct ownership of the platform, tooling, and automation that enable reliable deployment and reliable execution of all of DreamBox’s backend and frontend services. They are responsible not only for operating these foundational components but for their full SDLC, from initial idea to eventual retirement. The ideal candidate for this job hates toil and loves to build scalable, robust software that automates it away.

The ideal candidate is also customer-obsessed. This means that you think about system availability at every level and from every angle; that you live and breathe SLOs; and that you consult with your internal customers on how you can enable them to build, test, deploy, monitor, and operate their services with less friction and higher quality. You understand the leverage inherent in treating operations and processes as software problems, and you know how to communicate the value of that work.

If this sounds like you, we need you to help us build on the significant gains we’ve already achieved. Come join us!

What You'll Be Doing:

  • Work and collaborate with a team of engineers with diverse skill sets
  • Take part in the architecture and implementation of our container orchestration platform, CI/CD pipeline, and other internal tools
  • Automate manual processes through software, enabling other engineering teams to completely own all aspects of the SDLC of their application services
  • Research, evaluate, and work with cutting-edge technologies that are defining the future of the cloud
  • Collaborate with other engineers in the organization to foster solid engineering principles and drive improvement
  • Help us shape a DevOps culture and foster its adoption
  • Participate in the on-call rotation
  • Ensure that DreamBox production systems meet or exceed all SLAs
  • Ensure that DreamBox production systems can maintain SLA availability measures for the foreseeable future, allowing for planned or even likely change and growth
  • Work with other teams to ensure that non-production systems also meet or exceed SLAs. This will entail technical work (such as implementing redundancy or improving test automation) as well as non-technical work such as training, listening, and coaching
  • Help system owners and multiple levels of management understand the costs of availability decisions, not only in dollars but also in flexibility, transparency, and management overhead
  • Complete work in a timely manner and in accordance with DreamBox best practices and team working agreements
  • Meet or exceed personal goals, established with your direct manager, that demonstrably support team key results

About You:

  • 3 years or more experience writing software (preferably with Python and Go) and scripting (Bash)
  • 3 years or more experience supporting software at scale in a high-availability cloud environment, preferably with AWS
  • Relevant work experience with containers and container orchestration
  • Relevant work experience with the HashiCorp suite of tools (Nomad, Consul, Vault, Terraform, Packer)
  • Relevant work experience with Build/Test/Deployment Automation
  • Outstanding interpersonal and communication skills
  • Robust problem-solving skills
  • Significant experience with a variety of monitoring tools and technologies, such as Datadog, CloudWatch, New Relic, PagerDuty, and ELK
  • Able to clearly articulate the definition of PII and how to handle it properly
  • Advanced competency in source management in a Git environment, including branching, merging, and version difference comparisons
  • Advanced understanding of container orchestration, and containers in general. Able to discuss strengths and weaknesses of various container technologies
  • Proficiency with AWS cloud configuration, particularly networking and security components
  • Proficiency with Terraform
  • Proficiency in at least one of these development languages: Python, Go
  • Broad knowledge of Linux administration
  • Working knowledge of TCP/IP addressing and routing

At DreamBox, we are hooked on celebrating diversity & providing an inclusive workplace and it shows throughout our product, brand, and teams. We are proud to be an equal opportunity employer. Thanks for considering DreamBox Learning!


Experience Level

Senior Level
