# Senior Site Reliability Engineer (SRE)

> Salla · Makkah, Saudi Arabia (Hybrid) · Full-time · Posted 2026-01-21

**Workplace:** hybrid

**Department:** Technology

## Description

As a Senior SRE at Salla, you will lead reliability initiatives, handle complex incidents, improve platform performance, and guide engineering teams toward building resilient systems. You will also participate in the **on-call rotation** as part of our commitment to platform reliability.

**Reliability & Incident Management**

-   Lead high-severity incident response and drive post-incident reviews.
-   Troubleshoot complex issues across applications, infrastructure, and networks.
-   Improve MTTR through better monitoring, alerts, and diagnostic tooling.
-   Participate in the **on-call rotation** supporting production systems.

**Performance & Scalability**

-   Identify and resolve performance bottlenecks and scaling challenges.
-   Conduct load testing and capacity planning for high-traffic scenarios.

**Infrastructure & Operations**  

-   Enhance cloud-native infrastructure, deployment processes, and automation.
-   Improve resilience, fault-tolerance, and recovery mechanisms across systems.

**Observability**

-   Build and refine dashboards, alerts, metrics, logs, and traces.
-   Define SLIs/SLOs and improve visibility into system behavior.

**Tooling & Automation**  

-   Develop tools that reduce operational toil and increase reliability.
-   Contribute to infrastructure-as-code, CI/CD pipelines, and GitOps workflows.

**Collaboration**  

-   Work closely with engineering teams to ensure services are robust and production-ready.
-   Mentor engineers on reliability, debugging, and operational best practices.

**Bonus Skills**  

-   Background in large-scale, high-traffic systems.
-   Experience with fault-tolerant design, DR, and HA patterns.
-   Familiarity with SLOs, SLIs, and error budgets.

**Location Preference**  

-   Candidates in time zones from **GMT+0 to GMT+6** are preferred, to align with team collaboration hours and on-call coverage.

## Requirements

-   Strong experience with **Kubernetes**, **service mesh technologies**, and cloud platforms (**AWS, GCP, or Azure**).
-   Deep understanding of **Linux**, **networking**, **distributed systems**, and **load balancing**.
-   Hands-on experience with **Terraform** or similar Infrastructure-as-Code tools.
-   Experience with observability platforms such as **Prometheus, Grafana, Loki, Mimir, Elastic**, or equivalent.
-   Proficiency in scripting or programming languages such as **Bash, Python, or Go**.
-   Experience with **CI/CD pipelines** and **GitOps** practices.
-   Strong debugging, incident response, and performance analysis skills.

## Apply

[Apply at Salla](https://apply.workable.com/salla/j/8BBEC70032/apply)

