# Research Scientist, Reinforcement Learning

> Deeproute.ai · Fremont, United States · Posted 2026-04-27

**Workplace:** On-site

## Description

We are building next-generation **end-to-end autonomous driving systems** powered by reinforcement learning.

You will work on applying RL in **closed-loop, safety-critical environments**, leveraging large-scale simulation and real-world driving data to improve safety, comfort, and robustness.

-   Train and deploy RL policies in **closed-loop driving environments**
-   Scale RL training using **massively parallel simulation systems**
-   Design and optimize reward functions for complex driving behaviors
-   Improve **sim-to-real transfer** for real-world robustness
-   Collaborate with cross-functional teams to integrate models into production systems

## Requirements

**Core Technical Skills**

-   Proficiency in modern RL algorithms: DQN, PPO, SAC, TD3, etc.
-   Proficiency in modern RLHF algorithms: PPO, DPO, GRPO, etc.
-   Hands-on experience training reward models and fine-tuning LLMs, VLMs, and VLAs
-   Knowledge of distributed RL training at scale
-   Proficiency with massively parallel simulation environments
-   Knowledge of sim-to-real transfer techniques and domain randomization
-   Proficiency in Python and comfort with C++
-   Proficiency in deep learning frameworks such as PyTorch
-   Experience with distributed training frameworks (Ray, Horovod, etc.)
-   Knowledge of model optimization (quantization, pruning) and CUDA is a plus
-   Knowledge of traffic rules and driving behavior modeling

**Preferred Qualifications**

-   Publications in top-tier venues (ICML, NeurIPS, ICLR, CVPR, ICCV, ECCV, ICRA, IROS, etc.)
-   Open-source contributions to RL libraries or autonomous driving projects
-   Previous experience with LLM fine-tuning using RLHF
-   Knowledge of safe RL, interpretable AI, or robustness techniques
-   Familiarity with autonomous vehicle regulations and safety standards

## Apply

[Apply at Deeproute.ai](https://apply.workable.com/deeproute-dot-a-i/j/9AD9D79305/apply)
