# Backend Software Engineer (ML Infra)

> Rockstar · San Francisco, United States · — · Posted 2025-12-22

**Workplace:** On-site

## Description

Rockstar is recruiting for a fast-growing startup that is building the AI backbone for the next generation of intelligent products. They help AI startups design, fine-tune, evaluate, deploy, and maintain specialized models across text, vision, and embeddings. Think of them as “AWS for AI models”: not data or raw compute, but a full-stack backend for fine-tuning, reinforcement learning, inference, and long-term model maintenance. Their customers are Series A–C AI companies building enterprise-grade products. Their promise is simple: they make your AI system better.

They are hiring a Backend Software Engineer (ML Infrastructure) to help design, build, and scale the core systems that power large-scale model training and deployment.

The candidate will work on distributed training pipelines, cloud-native infrastructure, and internal developer platforms that support fine-tuning, reinforcement learning, and inference at scale. This role sits at the intersection of backend engineering and ML systems—the candidate will collaborate closely with ML engineers while owning production-grade infrastructure.

This is an ideal role for an early-career engineer who wants to work on real distributed systems, GPU workloads, and modern ML infrastructure—not dashboards or CRUD apps.

**What You’ll Do**

Build & Scale Core Infrastructure

- Design and implement backend systems that support large-scale ML workloads, including fine-tuning and reinforcement learning.

- Build distributed training and inference pipelines that are efficient, fault-tolerant, and observable.

- Develop internal developer tools and platforms that make it easier for ML engineers to train, evaluate, and deploy models.

Cloud & Systems Engineering

- Work on cloud-native systems using containers and orchestration (e.g., Kubernetes).

- Optimize systems for performance, reliability, and cost efficiency, especially for GPU-heavy workloads.

- Implement monitoring, logging, and observability for long-running training jobs and production services.

Collaborate with ML Engineers

- Partner closely with ML engineers to support evolving model architectures, training workflows, and evaluation needs.

- Translate ML requirements into scalable backend and infrastructure solutions.

**Who You Are**

Required

- 1–3 years of backend engineering experience, ideally working on production systems.

- Strong fundamentals in distributed systems, networking, and backend architecture.

- Experience building systems that scale under real load.

- Comfortable working in Python and/or Go (or similar backend languages).

- Excited to work on-site in San Francisco with a fast-moving early-stage team.

Strongly Preferred

- Experience with or exposure to ML infrastructure or ML platforms.

- Familiarity with GPU workloads, training pipelines, or inference systems.

- Experience with containerization and orchestration (Docker, Kubernetes).

- Contributions to or deep familiarity with ML infrastructure libraries such as:

  - Ray

  - vLLM

  - SGLang

  - or similar distributed ML systems

Bonus

- Computer science background from a top-tier program or equivalent demonstrated excellence.

- Open-source contributions, research projects, or side projects in systems or ML infrastructure.

- A track record of high ownership and technical curiosity.

## Apply

[Apply at Rockstar](https://apply.workable.com/rockstar-3/j/F8E3C78E76/apply)

