Job Description
Job Title: Site Reliability Engineer
Relevant Experience (in years): 6
Technical/Functional Skills
Power BI; SQL; Cosmos; Kubernetes; Azure Databricks
Experience Required: 5 years
Roles & Responsibilities
SRE Key Responsibilities:
- Collaborate with cross-functional teams to design, implement, and maintain highly available and scalable production systems.
- Monitor system performance, identify bottlenecks, and proactively take action to prevent downtime and ensure optimal user experience.
- Implement automation for provisioning, deployment, and configuration management to increase efficiency and reduce manual intervention.
- Participate in incident response and post-incident analysis, driving continuous improvement in system reliability and recovery processes.
- Conduct capacity planning and performance testing to ensure our systems can handle anticipated growth and unexpected traffic spikes.
- Troubleshoot complex technical issues across the entire technology stack, from application code to infrastructure.
- Drive the adoption of best practices in software development, system architecture, and infrastructure management.
- Collaborate with development teams to improve application reliability, performance, and observability through code reviews and guidance.
- Contribute to the on-call rotation and actively engage in identifying and addressing root causes of incidents.
- Stay up to date with industry trends, emerging technologies, and SRE best practices, and bring fresh ideas to the team.
Qualifications:
- Proficiency in at least one programming or scripting language, such as PowerShell or C#.
- Strong experience with cloud platforms (Azure) and containerization technologies (Docker, Kubernetes).
- Solid understanding of networking concepts, protocols, and security principles.
- Experience with configuration management tools and infrastructure-as-code practices.
- Familiarity with Azure monitoring and observability tools.
- Ability to analyze complex systems and troubleshoot issues systematically.
- Excellent communication skills and ability to work collaboratively in a team-oriented environment.
- Prior experience with incident response, on-call rotations, and incident management is a plus.
- Relevant certifications, such as DevOps Engineer, Azure Administration, or equivalent, are a plus.
Additional Skills
Microsoft Azure; Docker; PostgreSQL
Job Type: Full-time
Pay: $100,434.00 - $120,868.00 per year
Benefits:
- 401(k)
- Dental insurance
- Health insurance
Experience level:
- 7 years
- 8 years
- 9 years
- 10 years
- 11+ years
Experience:
- Azure: 1 year (Preferred)
- AWS: 1 year (Preferred)
- Kubernetes: 1 year (Preferred)
Ability to Commute:
- Bellevue, WA 98004 (Preferred)
Ability to Relocate:
- Bellevue, WA 98004: Relocate before starting work (Required)
Work Location: In person