# Machine Learning Engineer (Model Efficiency & Interpretability)

> Deeproute.ai · Fremont, United States · Full-time · Posted 2026-04-27

**Workplace:** On-site

## Description

We are looking for engineers who go beyond “training bigger models.”  
You will focus on **understanding what happens inside models**, improving **efficiency, reliability, and interpretability**—often **without relying on massive compute**.

**1\. Model Efficiency & Edge Optimization**

-   Design and optimize lightweight neural networks (e.g., ShuffleNet, EfficientNet) for **high parameter efficiency and real-time performance**.
-   Improve **latency, memory footprint, and throughput** under real-world constraints (on-device / real-time systems).
-   Apply and extend techniques such as **quantization, pruning, distillation, and operator-level optimization**.

**2\. Model Introspection**

-   Analyze **model weights, activations, and internal representations** to understand decision mechanisms.
-   Investigate **failure cases and error patterns**, especially under distribution shift or long-tail scenarios.
-   Develop tools or methods to **attribute model behavior** (e.g., neuron-level analysis, feature attribution, representation probing).
-   Study and improve the robustness of models under transformations such as quantization or compression.

**3\. Quantization & Numerical Analysis**

-   Diagnose and mitigate performance degradation caused by **quantization or reduced precision**.
-   Analyze **weight/activation distributions** and sensitivity to precision changes.
-   Design improved quantization strategies to maintain accuracy under strict compute constraints.

**4\. Fine-grained Engineering & Debugging**

-   Dive deep into model execution to identify bottlenecks at the **kernel / operator / graph level**.
-   Build experiments to validate hypotheses about model behavior, rather than relying on brute-force scaling.
-   Maintain a strong focus on **measurable improvements** (latency, memory, stability, error rates).

## Requirements

**Core Requirements**

-   Strong foundation in deep learning and neural network architectures.
-   Hands-on experience with **model efficiency optimization** (quantization, pruning, distillation, etc.).
-   Experience working under **resource constraints** (edge devices, real-time systems, or low-latency services).

**Key Differentiator (Very Important)**

-   Demonstrated ability to **analyze model internals**, not just train models.
-   Experience with:
    -   Weight / activation distribution analysis
    -   Debugging model behavior beyond metrics
    -   Understanding _why_ a model works or fails

**Preferred**

-   Experience with:
    -   Model compression or deployment frameworks (TensorRT, ONNX, TVM, etc.)
    -   Numerical stability / low-precision training
    -   Interpretability or mechanistic analysis of neural networks

-   Prior work showing **deep investigation into model behavior**, not just scaling experiments.

## Apply

[Apply at Deeproute.ai](https://apply.workable.com/deeproute-dot-a-i/j/D1F667B1DA/apply)
