Site Reliability Engineer (SRE) – FedRAMP

Location: New York, USA

Description

We’re growing and looking to hire a Site Reliability Engineer who embodies our core values: People First, Customer Obsession, Strive for Excellence, and Integrity.

Claroty’s Public Sector practice is rapidly expanding to secure the mission-critical systems that our society’s safety and stability depend on. We are looking for mission-driven professionals who want to join a high-growth team dedicated to protecting critical infrastructure and ensuring essential services remain resilient and uninterrupted.

Requirements:

About the Role: 

We are seeking a skilled Site Reliability Engineer (SRE) to support and maintain Claroty’s FedRAMP-compliant deployment in AWS GovCloud for public sector customers. The SRE will be responsible for ensuring high availability, security, and compliance of cloud-based environments while driving automation, monitoring, and incident response best practices.

As an SRE on this team, your impact will include:

  • AWS GovCloud Operations: Manage and optimize Claroty’s cloud-based infrastructure in AWS GovCloud, ensuring FedRAMP compliance and high availability.
  • Reliability & Performance: Monitor and enhance system performance, scalability, and reliability through observability tools, automation, and best practices.
  • Security & Compliance: Implement and maintain security controls aligned with FedRAMP, NIST 800-53, and other federal cybersecurity standards.
  • Infrastructure as Code (IaC): Develop and manage infrastructure automation using Terraform and Ansible.
  • CI/CD & Automation: Enhance DevSecOps pipelines, automate deployments, and improve system resilience through tools like GitLab CI/CD, Jenkins, and Kubernetes.
  • Incident Response & Monitoring: Implement and manage monitoring solutions (Prometheus, Grafana, ELK Stack), respond to incidents, and conduct post-mortems.
  • Networking & Security: Configure and maintain VPCs, VPNs, security groups, and firewalls in AWS GovCloud, ensuring compliance with FedRAMP requirements.
  • GOV Production Gatekeeper: Manage the rollout strategy for new technologies and oversee its execution to ensure minimal disruption to existing systems.
  • GOV Production On-Call: Act as the first line of response for critical incidents, assessing issues, triaging, and coordinating with the team to prevent further problems and swiftly restore services.
  • Monitor Production Performance and Degradation: Track system performance metrics closely and detect degradation early to prevent outages and disruptions.
  • Production Maintenance: Perform regular infrastructure upgrades to keep pace with platform changes, new developments, and evolving technologies.
  • Manage Release Flow: Oversee the release of updates and new functionality, ensuring smooth transitions and mitigating any negative impact on production.
  • Collaboration: Work closely with DevOps and security teams, developers, and federal stakeholders to maintain a compliant and secure cloud environment.