# SRE Manager / SRE Architect

> Qode · New York, United States (Hybrid) · Full-time · Posted 2026-06-03

**Workplace:** hybrid

## Description

**Job Description – SRE Manager / SRE Architect (Hands-on)**

**Location:** New York City, NY / Fort Mill, SC (Hybrid)

**Employment Type:** Full-Time / Contract

**Industry:** Financial Services

**Position Overview**

We are seeking a highly experienced and hands-on **Site Reliability Engineering (SRE) Manager / SRE Architect** to lead reliability, availability, performance, and release management initiatives across enterprise-scale applications and platforms. This role requires a strong blend of **SRE, DevOps, Release Management, Cloud Engineering, Automation, and Production Operations** expertise.

The ideal candidate will be deeply involved in designing and implementing reliability strategies, driving release governance, improving deployment processes, and ensuring operational excellence across cloud-native environments.

**LaunchDarkly experience is highly preferred but not mandatory.**

**Key Responsibilities**

**Site Reliability Engineering (SRE)**

-   Design and implement SRE best practices focused on reliability, scalability, performance, and availability.
-   Define and monitor SLIs, SLOs, and error budgets across critical applications and services.
-   Drive proactive monitoring, alerting, observability, and incident management processes.
-   Lead root cause analysis (RCA) efforts and implement preventive measures.
-   Improve system resiliency through automation, self-healing capabilities, and operational excellence.
-   Establish reliability standards across distributed systems and cloud platforms.

**Release Management**

-   Own and drive end-to-end release management processes across multiple environments.
-   Coordinate application releases across development, QA, UAT, staging, and production environments.
-   Develop release governance, release calendars, deployment strategies, rollback procedures, and change management processes.
-   Partner with development, QA, infrastructure, and business teams to ensure smooth production deployments.
-   Identify and mitigate release risks while minimizing downtime and business impact.
-   Implement deployment automation and continuous delivery best practices.

**DevOps & Automation**

-   Design and maintain CI/CD pipelines using modern DevOps tools.
-   Automate infrastructure provisioning, deployment, monitoring, and operational workflows.
-   Drive Infrastructure as Code (IaC) adoption using Terraform or similar technologies.
-   Support cloud-native architectures and containerized application deployments.
-   Partner with engineering teams to improve developer productivity and deployment velocity.

**Cloud & Platform Engineering**

-   Manage and optimize cloud infrastructure on AWS and/or Azure.
-   Support Kubernetes, container orchestration, and cloud-native application platforms.
-   Ensure platform scalability, security, compliance, and operational readiness.
-   Drive platform modernization initiatives and operational transformation efforts.

**Required Skills & Experience**

**Core SRE Skills**

-   15+ years of IT experience with a strong focus on SRE, DevOps, Platform Engineering, or Production Support.
-   Extensive hands-on experience implementing SRE practices in enterprise environments.
-   Strong understanding of:
    -   SLI/SLO/Error Budgets
    -   Incident Management
    -   Problem Management
    -   Capacity Planning
    -   Reliability Engineering
    -   Observability & Monitoring

**Release Management**

-   Proven experience managing large-scale production releases.
-   Strong expertise in:
    -   Release Planning
    -   Release Governance
    -   Change Management
    -   Deployment Automation
    -   Rollback Strategies
    -   Production Readiness Reviews

**DevOps & Cloud**

-   Hands-on experience with:
    -   AWS and/or Azure
    -   Kubernetes (EKS, AKS, OpenShift preferred)
    -   Docker
    -   Terraform
    -   GitHub Actions, Jenkins, Azure DevOps, GitLab CI/CD
-   Experience building and maintaining CI/CD pipelines.

**Monitoring & Observability**

-   Strong experience with:
    -   Dynatrace
    -   Datadog
    -   Splunk
    -   Prometheus
    -   Grafana
    -   ELK Stack
    -   CloudWatch

**Scripting & Automation**

-   Experience with Python, Bash, PowerShell, or similar scripting languages.
-   Strong automation mindset with a focus on operational efficiency.

**Nice to Have**

-   **LaunchDarkly end-to-end implementation experience**
-   Feature flag management and progressive delivery strategies.
-   Financial Services, Banking, or Wealth Management domain experience.
-   Experience leading SRE or DevOps transformation initiatives.
-   Cloud certifications (AWS, Azure, Kubernetes).

**Preferred Candidate Profile**

-   Strong hands-on SRE leader, not just a people manager.
-   Deep expertise in Release Management and Production Support.
-   Proven background in DevOps, Cloud Engineering, and Platform Reliability.
-   Ability to work with development, infrastructure, security, and business teams.

**Keywords**

**SRE, Site Reliability Engineering, Release Management, DevOps, Terraform, AWS, Azure, Kubernetes, Dynatrace, CI/CD, LaunchDarkly, Production Support, Incident Management, Reliability Engineering, Observability, Platform Engineering, Infrastructure Automation**

## Apply

[Apply at Qode](https://apply.workable.com/qodeworld/j/62BD73EC3E/apply)
