Pluriscale - Senior Site Reliability Engineer

We are seeking an experienced Site Reliability Engineer to own and optimize our cloud infrastructure, deployment pipelines, and platform reliability. The ideal candidate will combine deep infrastructure expertise with strong programming abilities to build and maintain robust, scalable systems.

Core Responsibilities

Design, implement, and maintain our cloud infrastructure using Infrastructure as Code principles
Own and enhance our CI/CD pipelines and deployment processes
Manage and optimize Kubernetes clusters across multiple environments
Develop automation tools and scripts to improve operational efficiency
Monitor system performance and reliability, and implement improvements based on what the data shows
Lead incident response and post-mortem analysis
Support customer deployments across cloud and on-premises environments

Required Skills & Experience

5+ years of experience in Site Reliability Engineering or DevOps roles
Strong expertise in Infrastructure as Code using Terraform
Advanced knowledge of Kubernetes and container orchestration
Experience with Helm charts and FluxCD for GitOps workflows
Proficiency in multiple programming languages, particularly Python, Go, and Rust
Strong understanding of CI/CD principles and tools
Experience with monitoring, logging, and observability tools

Preferred Qualifications

Experience with multiple cloud providers (AWS, GCP, Azure)
Knowledge of on-premises infrastructure management
Experience supporting enterprise customer deployments
Background in security best practices and compliance
Contributions to open-source projects
Experience with service mesh technologies
Database administration experience

Key Competencies

Self-directed work ethic with excellent problem-solving skills
Strong documentation and technical writing abilities
Excellent communication skills for cross-team collaboration
Ability to manage multiple priorities in a fast-paced environment
Experience with on-call rotations and incident management

Technical Environment

Infrastructure: Kubernetes, Terraform, Helm
GitOps: FluxCD
Programming: Python, Go, Rust
Cloud Platforms: Multiple major cloud providers
CI/CD: GitHub, self-developed tooling
Monitoring: Grafana, Prometheus

What We Offer

Opportunity to shape and improve critical infrastructure
Work with modern cloud-native technologies
Autonomy in technical decisions
Collaborative environment with talented engineers
Professional development opportunities

The ideal candidate will be passionate about infrastructure automation, system reliability, and building robust platforms. They should be comfortable working independently while maintaining strong relationships with team members and customers.