# CUDA Engineering Expert

> Weekday AI · United States (Remote) · Part-time · Posted 2026-06-24

**Workplace:** remote

**Department:** AI Training

## Description

**This role is for one of our clients**

**Compensation:** $80-$100 per hour

We are seeking GPU kernel optimization experts to contribute to a project with a leading AI lab. This opportunity is designed for freelancers with strong C++ skills, practical GPU programming experience, and the ability to improve kernel performance using profiler-guided analysis. You’ll help evaluate, optimize, and reason about GPU kernels across modern hardware environments. This is a contract-based opportunity for specialists who enjoy squeezing performance out of modern GPU architectures.

## Requirements

### **Key Responsibilities**

-   Analyze and optimize GPU kernels for performance, efficiency, and hardware utilization
-   Use profiler metrics such as L2 cache hit rate, L2 throughput, occupancy, and related signals to guide kernel improvements
-   Review GPU kernel implementations and identify bottlenecks without requiring extensive background in the underlying algorithms
-   Write, modify, and reason about C++17, Python, and GPU programming code
-   Apply CUDA, HIP, shader programming, or related kernel programming expertise to improve performance outcomes
-   Document optimization decisions clearly, including when specific profiler metrics are or are not useful

### **Ideal Qualifications**

-   Available to work at least 20 hrs/wk
-   Fluent in core C++ features through C++17
-   Working knowledge of Python and Git
-   Fluent in at least one GPU programming model, such as CUDA, HIP, Slang, HLSL, GLSL, or related kernel programming
-   At least 1 year of professional or graduate-level research experience working with GPUs
-   Strong understanding of GPU profiler performance metrics and how to use them to optimize kernels
-   Ability to optimize GPU kernels without needing deep prior context on every algorithm
-   Experience with CUDA, HIP, CUDA C++ Core Libraries, inline PTX assembly, or tensor core-level optimization is a plus
-   Experience optimizing kernels for NVIDIA Blackwell hardware is a plus
-   Familiarity with Nsight Compute is a plus
-   Prior experience with GPU hardware organizations such as NVIDIA, AMD, or Qualcomm is a plus
-   Open-source contributions related to GPU kernel optimization are a plus

### **Application Process**

-   Submit your resume or relevant technical background to get started
-   Qualified applicants may be asked to complete a brief technical assessment or submit additional information

We consider all qualified applicants without regard to legally protected characteristics and provide reasonable accommodations upon request.

### Contract and Payment Terms

-   You will be engaged as an independent contractor.
-   This is a fully remote role that can be completed on your own schedule.
-   Projects can be extended, shortened, or concluded early depending on needs and performance.
-   Your work will not involve access to confidential or proprietary information from any employer, client, or institution.
-   Payments are made weekly via Stripe or Wise, based on services rendered.
-   Please note: We are unable to support H-1B or STEM OPT candidates at this time.

## Apply

[Apply at Weekday AI](https://apply.workable.com/weekday-1/j/4157E8B4F6/apply)

