# Senior Operations Expert (Full-time), Shanghai

> Flowith · Shanghai, China · Full-time · Posted 2026-03-09

**Workplace:** On-site

**Department:** Development

## Description

**Role Overview**

You are the "architect" and "guardian" of Flowith’s global production environment. In this role, you are not just a firefighter putting out outages, but the cornerstone supporting exponential business growth. You will master the Cloudflare ecosystem and mainstream global cloud infrastructure to design and implement high-concurrency, low-latency distributed architectures. Through extreme performance optimization and a relentless pursuit of automation, you will ensure millions of global users always experience silky-smooth and stable AI interactions.

**Key Responsibilities**

-   Global Architecture Implementation: Design and manage cross-platform cloud-native architectures, driving multi-region deployment, elastic scaling, canary releases, and rapid rollbacks to ensure the efficient operation of global distributed applications.
-   Traffic & Performance Optimization: Lead the architectural design of managed caching and asynchronous messaging capabilities to handle hot-data caching, task decoupling, and traffic peak shaving.
-   High Availability & Continuity: Build and continuously optimize the observability system (SLI/SLO and alert governance). Develop and drill backup/recovery, disaster recovery failover, and emergency response mechanisms to safeguard business continuity.
-   Technical Vision & Empowerment: Participate in tech stack selection and architecture reviews for core business features, finding the optimal balance between reliability, security, cost, and maintainability.


## Requirements

-   You build systems that never sleep and automate everything you touch.
-   Hardcore Operations Foundation: 5+ years of SRE/DevOps/Operations experience, battle-tested on systems serving millions to tens of millions of users. Solid foundation in Linux and networking (TCP/IP, DNS, HTTP/HTTPS, TLS), and the ability to troubleshoot complex failures.
-   Cloud-Native & Edge Master: Deep understanding of and proficiency with the Cloudflare ecosystem (CDN/WAF/DNS/Edge Computing), plus experience governing resources on mainstream overseas cloud infrastructure (compute, network, load balancing, storage, managed databases).
-   Automation & Monitoring Enthusiast: Proficient in building and maintaining Prometheus + Grafana monitoring systems. Mastery of Terraform (or similar IaC) and mainstream CI/CD toolchains. Able to write handy operational tools in Shell/Python/Go.
-   Architectural Vision: Deep understanding of managed cloud caching and messaging systems (Serverless Redis, queues/event-driven architectures), and hands-on experience in security operations (least privilege, key management, access control, auditing).
-   Bonus: Experience in deploying underlying infrastructure for AI applications, or a strong passion for exploring how Agents/LLMs can empower intelligent operations (AIOps).


## Benefits

**Workspace, Culture & Lifestyle**

-   **Awesome Teammates:** Work alongside a kind, creative, and hardworking crew of occasional "geeks" and visionaries.
-   **Building the AGI Future:** Participate in the in-house development of rapidly evolving AI agents and explore the future of AGI interactive interfaces.
-   **Cool Offices in SH & SF:** Enjoy our ultra-open workspaces with the ultimate freedom to seamlessly switch between our Shanghai and San Francisco locations.
-   **Pet-Friendly Workplace:** Bring your furry friends to work! Come play with our resident Orange Tabby and Golden Retriever Mix, or bring your own pets to hang out.
-   **Island Hackathons:** Join our annual internal hackathons, where we select a new city or country each year for innovative coding sessions and team bonding.
-   **Free AI Tools & Tech Gear:** Enjoy free, unlimited access to cutting-edge AI tools, plus the latest tech equipment like Apple Vision Pro and FPV drones.
-   **Tech Events:** Regularly participate in top-tier global tech meetups and innovation showcases.
-   **Parties & Events:** Celebrate with monthly birthday bashes and annual milestone parties.
-   **Free Snacks & Drinks:** Stay fueled with an endless supply of your favorite beverages and unlimited complimentary snacks.

**Work Arrangements**

-   **Flexible Working Hours:** Customize your schedule by arriving at the office between 10 AM and 1 PM for a standard 8-hour workday, 5 days a week.
-   **Remote Work & Care:** Embrace a supportive hybrid work model, featuring 1 additional Work-From-Home (WFH) day per month exclusively for female employees.

**Comprehensive Benefits Package**

-   **Competitive Compensation:** Earn an above-market salary with an optional equity/stock options package.
-   **Wellness Program:** Take care of your body and mind with free gym access and monthly on-site professional massages.
-   **Exclusive Swag & Perks:** Receive holiday surprise gift boxes, premium custom company apparel (T-shirts, hoodies, and jackets), and occasional exclusive internal brand discounts.

## Apply

[Apply at Flowith](https://apply.workable.com/flowith/j/2DFB0C493D/apply)

