Note: This is a remote job open to candidates in the USA. GitLab is an open-core software company that develops a comprehensive AI-powered DevSecOps Platform used by over 100,000 organizations. The Staff Engineer role within the GitLab Operate team focuses on leading the technical direction for self-managed deployment strategies, emphasizing zero-downtime upgrades and operational excellence. This high-impact position involves architecting and implementing systems that enable organizations to deploy and operate GitLab reliably in their own infrastructure.
Responsibilities
- Define the technical vision for the future of GitLab's cloud-native deployments and upgrades, balancing operational simplicity, customer needs, and engineering constraints
- Lead the design and implementation of new tooling, including Operator(s), that enables automated lifecycle management and zero-downtime upgrades
- Architect upgrade orchestration systems that safely coordinate complex multi-component upgrades across databases, application services, and auxiliary components
- Establish operational maturity standards and guidance for new services being integrated into GitLab's deployment tooling, and empower development teams to take end-to-end ownership of their components
- Drive technical decisions around service integration patterns, deployment models, and operational interfaces
- Design production-grade Kubernetes Operators with reliable reconciliation logic for complex stateful applications
- Design and implement upgrade orchestration that handles database migrations, rolling deployments, compatibility checks, and rollback capabilities
- Develop tooling and automation to reduce the operational complexity of running GitLab at scale
- Create integration frameworks that enable development teams to ship new services with standardized deployment patterns
- Maintain and evolve GitLab Helm Charts to support both simple and complex deployment topologies
- Contribute to safe database migration strategies for zero-downtime upgrades across PostgreSQL and other stateful components
- Implement compatibility layers that enable incremental upgrades without requiring simultaneous updates across all components
- Design and contribute to build validation and pre-flight check systems that detect potential upgrade issues before they impact production
- Partner with development teams to define integration requirements for new services and features
- Collaborate with GitLab Dedicated and GitLab.com SRE teams to align deployment patterns and operational practices
- Work with Product Management to translate customer needs into technical requirements
- Mentor and guide other engineers on the team, establishing technical standards and best practices
- Create technical documentation and runbooks that enable customer success and support teams
- Define and implement observability standards for self-managed deployments, including metrics, logging, and alerting
- Build automated testing frameworks that validate deployment and upgrade scenarios across reference architectures
- Establish performance benchmarks and capacity planning guidance for different deployment scales
- Design resilience patterns for handling failures during upgrades and operations
- Contribute to incident response and post-mortems for self-managed deployment issues
Skills
- 8+ years of software engineering experience, with at least 3 years in platform engineering or infrastructure roles
- Expert-level Go proficiency (Ruby and Rails a plus) with demonstrated ability to work in large, complex codebases
- Production Kubernetes experience, including: building and maintaining Kubernetes Operators; designing Helm charts for complex stateful applications; understanding of Custom Resource Definitions (CRDs), admission controllers, and controller patterns; and experience with stateful workloads, persistent volumes, and storage classes
- Cloud-native architecture experience, including service mesh, observability stacks, and infrastructure as code
- Experience shipping production software that customers install and operate in their own infrastructure
- Understanding of Linux systems, including package management, systemd, and system-level debugging
- Experience building or maintaining Operators for complex stateful applications (databases, message queues, etc.)
- Ruby on Rails expertise and understanding of Rails application architecture
- Infrastructure automation using Terraform, Ansible, or similar tools
- Background in Site Reliability Engineering or DevOps with production on-call experience
- Understanding of compliance and security requirements for enterprise software deployments
- Experience with observability platforms
- Open source contribution history, particularly in infrastructure or deployment tooling
Benefits
- Flexible Paid Time Off
- Team Member Resource Groups
- Equity Compensation & Employee Stock Purchase Plan
- Growth and Development Fund
- Parental Leave
- Home Office Support
Company Overview
GitLab is a web-based Git repository manager that offers a variety of features for software development teams. It was founded in 2014, and is headquartered in San Francisco, California, USA, with a workforce of 1001-5000 employees. Its website is http://about.gitlab.com.