Mindrift

Freelance AI Evaluation Engineer (Python/Full-Stack)

Location: Remote
Job Type: Contract
Salary: $40 per hour
Posted: 4/9/2026
Career Level: Mid-Senior Level
Qualification: Degree in Computer Science, Software Engineering or related fields
Experience: 5+ years in software development

Job Description

What this opportunity involves

You’ll create challenging coding test cases that push AI coding systems to their limits:

  • Review and refine realistic coding tasks based on provided production codebases with realistic scope, requirements and information sources
  • Write comprehensive functional tests that validate actual end-to-end behavior and edge cases, not just superficial checks
  • Craft “fair but hard” challenges where the AI has all the context it needs, but has to work for it (information scattered across files and external sources, complex reasoning required)
  • Analyze AI failures to understand what the model struggles with vs. what it masters
  • Iterate based on feedback from expert QA reviewers who score your work on 7 quality criteria
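
To illustrate the difference between a superficial check and a functional test, here is a minimal sketch. Everything in it is hypothetical and not taken from the actual project: the tiny `SCRIPT` under test, the helper names `run_cli`, `superficial_check` and `functional_check`, and the chosen cases. In a real task these checks would more likely be written as pytest tests against the provided codebase.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical program under test: sums the integers in a file, one per line.
SCRIPT = """
import sys
total = sum(int(line) for line in open(sys.argv[1]) if line.strip())
print(total)
"""


def run_cli(script_path: Path, input_path: Path) -> subprocess.CompletedProcess:
    """Run the script the way a user would, as a separate process."""
    return subprocess.run(
        [sys.executable, str(script_path), str(input_path)],
        capture_output=True,
        text=True,
        timeout=30,
    )


def superficial_check(script_path: Path) -> bool:
    """Only confirms the file exists; says nothing about behavior."""
    return script_path.exists()


def functional_check(script_path: Path, workdir: Path) -> bool:
    """Exercises real end-to-end behavior, including edge cases."""
    cases = {
        "1\n2\n3\n": "6",      # typical input
        "": "0",               # edge case: empty file
        "\n-5\n\n5\n": "0",    # edge case: blank lines and negative numbers
    }
    for i, (content, expected) in enumerate(cases.items()):
        input_path = workdir / f"case_{i}.txt"
        input_path.write_text(content)
        result = run_cli(script_path, input_path)
        if result.returncode != 0 or result.stdout.strip() != expected:
            return False
    return True
```

A script that merely exists, or one that prints a constant, passes `superficial_check` but fails `functional_check`. That gap is what the functional tests described above are meant to close.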

What we look for

  • Degree in Computer Science, Software Engineering or related fields
  • 5+ years in software development, primarily Python (pytest, async/await, subprocess, file operations)
  • Background in full-stack development, with an equal focus on building React-based interfaces and robust back-end systems
  • Experience writing functional and integration tests, not just running them
  • Experience with Docker containers (running evaluations locally in containers)
  • CI/CD understanding (GitHub Actions as a user: triggers, labels, reading results)
  • English proficiency - B2

How it works

  • Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid

Effort estimate

Tasks for this project are estimated to take 20 hours to complete, depending on complexity. This is an estimate and not a schedule requirement; you choose when and how to work. Tasks must be submitted by the deadline and meet the listed acceptance criteria to be accepted.

Compensation

On this project, contributors can earn up to $40 per hour equivalent, depending on their level and pace of contribution. Compensation varies across projects depending on scope, complexity, and required expertise. Please note that other projects on the platform may offer different earning levels based on their requirements.

