We are seeking an experienced Site Reliability Engineer to own and optimize our cloud infrastructure, deployment pipelines, and platform reliability. The ideal candidate will combine deep infrastructure expertise with strong programming abilities to build and maintain robust, scalable systems.
Core Responsibilities
- Design, implement, and maintain our cloud infrastructure using Infrastructure as Code principles
- Own and enhance our CI/CD pipelines and deployment processes
- Manage and optimize Kubernetes clusters across multiple environments
- Develop automation tools and scripts to improve operational efficiency
- Monitor system performance and implement improvements
- Lead incident response and post-mortem analysis
- Support customer deployments across cloud and on-premises environments
Required Skills & Experience
- 5+ years of experience in Site Reliability Engineering or DevOps roles
- Strong expertise in Infrastructure as Code using Terraform
- Advanced knowledge of Kubernetes and container orchestration
- Experience with Helm charts and FluxCD for GitOps workflows
- Proficiency in multiple programming languages, particularly Python, Go, and Rust
- Strong understanding of CI/CD principles and tools
- Experience with monitoring, logging, and observability tools
Preferred Qualifications
- Experience with multiple cloud providers (AWS, GCP, Azure)
- Knowledge of on-premises infrastructure management
- Experience supporting enterprise customer deployments
- Background in security best practices and compliance
- Contributions to open-source projects
- Experience with service mesh technologies
- Database administration experience
Key Competencies
- Self-directed work ethic with excellent problem-solving skills
- Strong documentation and technical writing abilities
- Excellent communication skills for cross-team collaboration
- Ability to manage multiple priorities in a fast-paced environment
- Experience with on-call rotations and incident management
Technical Environment
- Infrastructure: Kubernetes, Terraform, Helm
- GitOps: FluxCD
- Programming: Python, Go, Rust
- Cloud Platforms: Multiple major cloud providers
- CI/CD: GitHub & self-developed tooling
- Monitoring: Grafana & Prometheus
What We Offer
- Opportunity to shape and improve critical infrastructure
- Work with modern cloud-native technologies
- Autonomy in technical decisions
- Collaborative environment with talented engineers
- Professional development opportunities
The ideal candidate will be passionate about infrastructure automation, system reliability, and building robust platforms. They should be comfortable working independently while maintaining strong relationships with team members and customers.