# Site Reliability Engineer - Ads / Monetization Platform

> Two95 International Inc. · Kuala Lumpur, Malaysia · Posted 2026-04-20

**Workplace:** On-site

## Description

### Role Summary

As a Site Reliability Engineer (SRE), you will build and operate highly available, globally distributed advertising/monetization services. You will improve reliability, scalability, and operability through automation, observability, incident management, and sound engineering practices.

### Key Responsibilities

-   Own reliability across the service lifecycle: design reviews, capacity planning, launch, deployment, operations, and continuous improvement.
-   Build and operate highly available services across multiple regions/data centers; improve resilience, latency, and scalability.
-   Develop automation and tooling to reduce toil (deployment, remediation, runbooks, self-healing) using scripting and software engineering best practices.
-   Define and implement SLOs/SLIs/SLAs; create dashboards and alerting to track service health (availability, latency, errors, saturation).
-   Lead sustainable incident response: triage, mitigation, root-cause analysis (RCA), and blameless postmortems with actionable follow-ups.
-   Collaborate with software engineering, security, and compliance stakeholders to meet data governance and regulatory requirements.

### Must-have Qualifications

-   3+ years of experience in SRE, DevOps, systems engineering, or production operations for large-scale services.
-   Strong coding skills in one of the following languages: Python, Go, or C++ (Java is also acceptable).
-   Solid Linux/Unix fundamentals: processes, memory/CPU, filesystems, permissions, and troubleshooting.
-   Networking fundamentals in cloud environments: TCP/IP, DNS, HTTP/HTTPS, load balancing, basic security concepts.
-   SQL proficiency and experience with data workflows/ETL are a plus for ads/analytics-related systems.
-   Strong communication, ownership mindset, and ability to work effectively across global teams.

### Preferred Qualifications

-   Experience supporting advertising, recommendation, or high-traffic consumer internet platforms.
-   Hands-on experience with cloud platforms (AWS/GCP/Azure) and infrastructure-as-code (Terraform/Ansible).
-   Experience with containers and orchestration (Docker, Kubernetes).
-   Observability experience with tools such as Prometheus, Grafana, ELK/Splunk, OpenTelemetry.
-   Experience operating large data systems (streaming, distributed storage/compute) and performance tuning.

## Apply

[Apply at Two95 International Inc.](https://apply.workable.com/two95-international-inc-3/j/D3D2C6E214/apply)

