Thales is a leader in digital security, providing identity management and data protection solutions. The company is seeking a Site Reliability Engineer to ensure high service levels for its telecommunications solution deployed in the public cloud, with a focus on automation, reliability engineering, and incident management.
Responsibilities
- Design, build, and maintain scalable infrastructure using tools such as Terraform, Ansible, and Kubernetes
- Develop automated CI/CD pipelines via GitLab to reduce manual toil
- Define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
- Manage error budgets to balance the velocity of new features with the stability of the platform
- Participate in 24/7 on-call rotations to provide emergency response and perform deep-dive troubleshooting for production issues
- Conduct system performance analysis, identify bottlenecks, and perform capacity planning to ensure the infrastructure can handle growth and peak loads
- Implement and refine symptom-based alerting and comprehensive monitoring strategies using platforms like Datadog to ensure high visibility into system health
- Lead blameless postmortems after incidents to identify root causes and implement long-term technical fixes to prevent recurrence
- Partner with Cloud Security teams to implement security best practices, manage access controls, and respond to security breaches or vulnerabilities
- Work with other stakeholders to define the solution improvement plan
- Own the service availability of the solution
Skills
- Engineering degree or equivalent
- At least 1 year of experience
- Java development skills are required
- You are familiar with public cloud platforms (GCP, AWS), containers and microservices (Docker, Kubernetes, Java), CI/CD and automation (Jenkins, GitLab, Helm), and NoSQL databases
- Must have U.S. or dual citizenship and be able to obtain post-hire clearance from the Committee on Foreign Investment in the United States (CFIUS) and the Department of the Treasury
- GCP cloud architect certification is a plus
- You have already set up monitoring for a product and its underlying infrastructure
- You have development experience in a distributed systems and/or high availability context
- You are familiar with microservices development
- You have helped define architectures, data structures, and algorithms under performance, security, and reliability constraints
- Public cloud architect certification
- You are interested in the core aspects of Site Reliability Engineering: CI/CD, automation, monitoring and observability, and continuous improvement
- You are an accomplished, versatile developer who can handle multiple tasks at once
Benefits
- Elective Health, Dental, Vision, FSA/HSA, Voluntary Life and AD&D, Whole Group Life w/LTC, Critical Illness, Hospital Indemnity, Accident Insurance, Legal Plan, Identity Theft, and Pet Insurance
- Retirement Savings Plan available after 30 days of employment, with a company contribution, a match, and no vesting period
- Company paid holidays and Paid Time Off
- Company provided Life Insurance, AD&D, Disability, Employee Assistance Plan, and Well-being Program
Company Overview
- Thales (Euronext Paris: HO) is a global leader in advanced technologies for the Defence, Aerospace, and Cyber & Digital sectors. It was founded in 1893 and is headquartered in Paris, Île-de-France, France, with a workforce of more than 10,000 employees. Its website is http://www.thalesgroup.com.