# Senior AI/ML Engineer - AI Systems Evaluation

> REAL DEV INC · Tel Aviv-Yafo, Israel · Full-time · Posted 2026-05-12

**Workplace:** On-site

**Department:** Engineering

## Description

**REAL** is building an AI Execution Platform for real estate organizations.

Today, the data required to run real estate is scattered across fragmented systems, leading to missed insights and preventable financial leakage.

**REAL** transforms this complexity into connected intelligence and automated execution, enabling enterprises to operate with greater precision and confidence.

**REAL Values**

-   **Ownership**: We take responsibility and move decisively.
-   **Clarity**: We simplify complexity to deliver meaningful impact.
-   **Accuracy**: Precision matters in everything we build.
-   **Velocity**: We work with urgency and intent.
-   **Partnership**: We collaborate closely with customers and teammates.

**Role Overview**

-   Own the systems that define, measure, and enforce AI quality at REAL.
-   Translate ambiguous model behavior into measurable signals, automated tests, and release gates.
-   Operate across evaluation design, tooling, and production integration.

**What You'll Do**

-   Design evaluation architectures (benchmarks, regression suites, coverage)
-   Build automated pipelines to run and score evals across models and prompts
-   Implement scoring systems (LLM-as-judge, rubrics, hybrid approaches)
-   Create and maintain golden datasets + edge-case suites
-   Develop internal tools for prompt testing, dataset generation, experiment tracking
-   Instrument systems for traces, outputs, and debugging
-   Detect regressions and enforce quality gates in CI/CD
-   Monitor model performance in production
-   Close the loop between eval insights and product improvements

## Requirements

**What We're Looking For**

-   3-6 years building production software, internal platforms, ML/data infrastructure, experimentation systems, or AI tooling
-   Strong backend and systems engineering fundamentals with hands-on applied AI experience
-   Strong Python, production-level systems experience
-   Built testing frameworks or validation systems end-to-end
-   Hands-on with LLMs / RAG / agent workflows
-   Understands eval methods (benchmarking, A/B testing, LLM-as-judge, human-in-the-loop)
-   Experience with observability / logging / experiment tracking
-   Strong systems thinking (coverage, reliability, reproducibility)
-   Comfort with non-deterministic systems

**Nice to Have**

-   Experience with eval, tracing, observability, or experimentation tooling (one or more of the following: LangSmith, Braintrust, Phoenix, MLflow, OpenTelemetry, PostHog, custom eval stacks)
-   Familiarity with dataset/versioning workflows, HITL systems, and production AI observability systems
-   CI/CD integration for model evaluation
-   Background in search, retrieval, or document systems
-   Built internal platforms or developer tools
-   Experience working in startups or other business-driven environments

## Apply

[Apply at REAL DEV INC](https://apply.workable.com/real-dev-inc/j/2A25A7E20F/apply)

