# AI Evaluation Engineer - Mathematics & Algorithms

> Gramian Consulting Group · Colombia (Remote) · Contract · Posted 2026-04-27

**Workplace:** remote

**Department:** Partnerships

## Description

**About Us**

Gramian Consultancy is a boutique consultancy specializing in IT professional services and engineering talent solutions. With a strong background in software engineering and leadership, we help companies build high-performing teams by matching them with professionals who truly fit their needs.

**Role overview**

We are looking for a highly analytical and computationally strong professional with a solid research background in mathematics or quantitative fields.

In this role, you will design **advanced benchmark tasks for multi-agent AI systems**, focusing on complex mathematical reasoning, algorithmic problem-solving, and verifiable computational outputs. You will contribute by crafting challenging problems, building validation systems, and structuring tasks that require decomposition into coordinated sub-solutions.

**Commitment required: 8 hours per day, with 4 hours of overlap with PST.**

**Employment type: Contractor assignment (no medical/paid leave)**

**Duration of contract: 4+ weeks**

**Location:** **Bangladesh, Brazil, Colombia, Egypt, Ghana, India, Indonesia, Kenya, Nigeria, Turkey, Vietnam**

**Interview: take-home assessment (60 min) + short interview**

### **Responsibilities**

-   Design and build **multi-agent benchmark tasks** requiring multi-step mathematical reasoning and algorithmic problem-solving
-   Create **complex, decomposable problems** across domains such as:
    -   Competition mathematics
    -   Numerical analysis
    -   Combinatorial optimization
    -   Statistical inference

-   Develop **verification scripts** to validate:
    -   Numerical outputs (with tolerance thresholds)
    -   Proof correctness and logical steps
    -   Algorithmic outputs and constraints

-   Write **clear, structured problem statements** with precise notation and defined outputs
-   Design **task decomposition strategies** for parallel or multi-agent execution
-   Implement computational solutions and validation pipelines using Python
-   Work with containerized environments (Docker) for reproducibility and evaluation
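
A verification script of the kind described in the responsibilities above might look like the following minimal Python sketch. It covers two of the listed checks: a numerical answer compared against a reference within tolerance thresholds, and an algorithmic output validated against its constraints. The function names, the default tolerances, and the use of subset sum as the example problem are illustrative assumptions, not the team's actual tooling.

```python
# Illustrative sketch only: function names, default tolerances, and the
# subset-sum example are hypothetical, not an actual grading harness.
import math
from typing import Sequence


def check_numeric(submitted: float, reference: float,
                  rel_tol: float = 1e-9, abs_tol: float = 1e-12) -> bool:
    """Accept a numerical answer if it matches the reference within tolerance."""
    # Reject NaN and infinities outright; math.isclose would otherwise
    # treat inf as close to inf.
    if not math.isfinite(submitted):
        return False
    # rel_tol scales with the magnitude of the values; abs_tol handles
    # reference values at or near zero.
    return math.isclose(submitted, reference, rel_tol=rel_tol, abs_tol=abs_tol)


def check_subset_sum(numbers: Sequence[int], chosen: Sequence[int],
                     target: int) -> bool:
    """Check a claimed subset-sum solution against the problem's constraints."""
    # Constraint 1: every selected index is distinct.
    if len(set(chosen)) != len(chosen):
        return False
    # Constraint 2: every selected index is in range.
    if any(i < 0 or i >= len(numbers) for i in chosen):
        return False
    # Constraint 3: the selected values sum exactly to the target.
    return sum(numbers[i] for i in chosen) == target


if __name__ == "__main__":
    print(check_numeric(3.14159265358979, math.pi))  # True
    print(check_numeric(3.14, math.pi))              # False
    values = [3, 34, 4, 12, 5, 2]
    print(check_subset_sum(values, [0, 2, 5], 9))    # True: 3 + 4 + 2 == 9
    print(check_subset_sum(values, [0, 0, 2], 10))   # False: repeated index
```

Combining a relative and an absolute tolerance is a common design choice: a purely relative check can never accept an answer whose reference value is exactly zero.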

## Requirements

-   5+ years of experience in mathematics, quantitative research, or computational science
-   Strong Python skills for scientific computing (NumPy, SciPy, SymPy or similar)
-   Experience solving or designing **complex mathematical / algorithmic problems**
-   Ability to create **precise, verifiable outputs** (no subjective problems)
-   Experience with **mathematical proofs or formal reasoning**
-   Familiarity with **AI benchmarks or evaluation frameworks** (e.g., SWE-bench)
-   Comfortable working with **Docker environments**
-   Solid understanding of **numerical methods** (precision, convergence, error bounds)

## Apply

[Apply at Gramian Consulting Group](https://apply.workable.com/gramian/j/E91CA39C78/apply)
