# Agentic Engineer - Voice AI

> WATI.io · Shenzhen, China · — · Posted 2026-04-30

**Workplace:** on_site

**Department:** Engineering

## Description

**About Wati** 

Wati started as a WhatsApp team inbox in 2020 and has since evolved into an AI-powered customer engagement platform that goes beyond a single channel. Designed for businesses that sell, support, and grow through conversations, Wati observes customer intent in real time, decides the next best revenue action, and executes it across marketing, sales, and support on WhatsApp, Instagram, Facebook, TikTok, SMS, and more.

Trusted by over 16,000 customers across 190+ countries, Wati simplifies complex operations and business conversations with a unified inbox, no-code automation, and our intelligent AI layer, Astra.

Wati is proudly backed by Tiger Global, Sequoia Capital, DST Global, and Shopify, and is recognised as a Premium Partner of Meta and Google.

### About the Role

We are looking for an Agentic Engineer – Voice AI to build and scale Wati's real-time voice AI capabilities on WhatsApp.

In this role, you will develop the systems that allow AI agents to listen, think, and speak in real time over WhatsApp voice calls. This includes building and optimizing the real-time media pipeline (WebRTC, LiveKit), integrating frontier AI models (OpenAI Realtime API, Google Gemini Live), and engineering the cascade architecture that connects speech recognition, language models, and speech synthesis into a seamless conversational experience.

You will work on latency-critical infrastructure where every millisecond matters — from audio transport and voice activity detection to model inference and text-to-speech delivery. You will also contribute to the broader AI agent stack, including tool calling, context management, and multi-turn conversation orchestration.

This role sits at the intersection of real-time communication systems, AI model integration, and conversational voice experiences.

**What You Will Do**

• Design, build, and optimize real-time voice AI pipelines — from WebRTC media transport to LLM inference and speech synthesis

• Integrate and orchestrate frontier AI models, including the OpenAI Realtime API and Google Gemini Live, as well as cascade architectures (ASR → LLM → TTS)

• Build and maintain the media infrastructure: LiveKit-based audio routing, Opus codec handling, RTP/RTCP transport, and voice activity detection

• Develop agent capabilities for voice interactions — tool calling, function execution, context engineering, and multi-turn conversation management

• Optimize end-to-end latency across the voice pipeline, from audio capture to AI response playback

• Collaborate with product and platform teams to deliver production-grade voice AI experiences on WhatsApp

• Ensure reliability, performance, and scalability of voice AI infrastructure serving customers across 190+ countries

## Requirements

• 3+ years of software engineering experience, with strong backend development skills (Go or Python preferred)

• Experience with real-time communication technologies: WebRTC, RTP/RTCP, audio codecs, or media server infrastructure

• Familiarity with AI/LLM integration — model APIs, tool calling, prompt engineering, or agent orchestration

• Experience with or strong interest in speech technologies: ASR, TTS, voice activity detection, or audio processing pipelines

• Understanding of distributed systems, microservices, and cloud-native architectures (GCP preferred)

• Comfortable working with PostgreSQL, Redis, and pub/sub messaging systems

• Strong problem-solving skills and the ability to work in fast-paced, ambiguous environments

• Ability to debug complex, latency-sensitive systems by reading code, traces, and real-time metrics

**Nice to Have**

• Hands-on experience with the OpenAI Realtime API, Google Gemini Live, or similar real-time AI model APIs

• Experience with AI agent frameworks (Dify, LangChain, CrewAI, etc.)

• Familiarity with MCP (Model Context Protocol) or other agent integration standards

• Contributions to open-source projects in the AI or real-time communication space

**Behavioural Expectations**

• Strong ownership and bias for action in a fast-moving environment

• Proactive, self-driven, and comfortable working across teams to drive outcomes

• AI-native mindset: actively uses AI tools in your daily engineering workflow and experiments with agent frameworks and emerging technologies

• Curiosity-driven — eager to push the boundaries of what voice AI can do in real business contexts

## Apply

[Apply at WATI.io](https://apply.workable.com/wati-dot-i-o/j/C0F841D372/apply)

