BookNook is a comprehensive small-group tutoring intervention program that combines a synchronous online learning platform with virtual tutoring. Since 2016, our unwavering mission has been to ensure every student has an equal chance to excel through our research-backed instructional materials and consistent tutor relationships. Committed to continuous growth, we approach learning with innovation and equity, creating opportunities along each student's path to success.
We are seeking a skilled Site Reliability Engineer (SRE) to join our team. As an SRE, you will play a critical role in ensuring the reliability, scalability, and performance of our systems and applications. You will collaborate closely with development teams to design and implement robust infrastructure solutions, automate deployment processes, and maintain high availability across our cloud-based platform. The ideal candidate will have extensive experience with AWS, Terraform or AWS CloudFormation, observability tools, zero-downtime deployments, and deployment pipelines.
Responsibilities
- Infrastructure Automation: Design, implement, and manage infrastructure as code using Terraform or AWS CloudFormation to ensure scalability, reliability, and cost-effectiveness.
- Deployment Pipelines: Develop and maintain automated deployment pipelines to facilitate seamless and efficient delivery of software updates while minimizing downtime.
- Zero-Downtime Deployments: Implement strategies and tools to enable zero-downtime deployments, including blue-green deployments, canary releases, and feature toggles.
- Monitoring and Observability: Establish comprehensive monitoring and observability solutions using tools like Prometheus, Grafana, and AWS CloudWatch to proactively identify and address performance bottlenecks, errors, and other issues.
- Incident Response: Respond to and resolve incidents promptly to minimize service disruptions and ensure high availability.
- Capacity Planning: Conduct capacity planning exercises to anticipate and accommodate future growth and ensure optimal resource utilization.
- Security and Compliance: Implement security best practices and ensure compliance with relevant regulations and standards, such as FERPA, CCPA, and SOC 2.
- Collaboration: Work closely with cross-functional teams, including developers, DevOps engineers, and QA engineers, to drive continuous improvement and innovation.
Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Proven experience as a Site Reliability Engineer or in a similar role.
- Strong proficiency in AWS services, including EC2, S3, Lambda, Elastic Beanstalk, and RDS.
- Extensive experience with Terraform or similar infrastructure-as-code tools.
- In-depth knowledge of observability tools and techniques for monitoring and troubleshooting distributed systems.
- Hands-on experience with zero-downtime deployment strategies and deployment pipelines using tools like Jenkins, CircleCI, or GitHub Actions.
- Solid understanding of networking concepts, security best practices, and compliance requirements.
Perks and Benefits
- Competitive salary: For this role, the salary range is $120K-$135K. Exact compensation may vary based on skills and experience.
- Work remotely: Live and work wherever you like in the continental US.
- Health insurance: We offer medical, dental, vision, and pet insurance for all our team members.
- Time to recharge: We offer flexible PTO, 11 paid holidays, and one company-wide week off.
- 401(k): With a 3% company match.
- Home office setup: Get a laptop and a $130 monthly stipend for home expenses.
- Employee Recognition: We have built a culture of recognizing our coworkers' achievements that reflect our values: Mindfulness, Stewardship, Curiosity, Growth, and Achievement.