Evaluation Scenario Writer - AI Agent Testing Specialist

Mindrift

Location

Kuwait, Kuwait

Job Type

Contract

Salary

Varies based on expertise, skills assessment, location, project needs, and other factors. Rates up to $40/hour. (Estimated)

Posted

1/23/2026

Career Level

Mid-Senior Level

Qualification

Bachelor's Degree in Computer Science or related field preferred

3+ years of software development experience

Job Description

Crafting Effective AI Agent Evaluation Scenarios

As an Evaluation Scenario Writer, you'll play a crucial role in assessing the performance of AI agents. While each project involves unique tasks, contributors may:

Create structured test cases that simulate complex human workflows
Define gold-standard behavior and scoring logic to evaluate agent actions
Analyze agent logs, failure modes, and decision paths
Work with code repositories and test frameworks to validate your scenarios
Iterate on prompts, instructions, and test cases to improve clarity and difficulty
Ensure that scenarios are production-ready, easy to run, and reusable
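
As an illustration of the first two items above (a structured test case plus gold-standard behavior and scoring logic), here is a minimal sketch in Python. The scenario format, every field name, and the score_run helper are hypothetical and chosen only for this example; actual projects define their own formats and acceptance criteria.

```python
import json

# Hypothetical scenario description. All field names are illustrative
# assumptions, not a format taken from any real project.
SCENARIO_JSON = """
{
  "id": "refund-request-001",
  "instruction": "Refund order 1234 and notify the customer.",
  "expected_actions": ["lookup_order", "issue_refund", "send_email"],
  "forbidden_actions": ["delete_order"]
}
"""


def score_run(scenario: dict, agent_actions: list[str]) -> float:
    """Score one agent run against the scenario's gold-standard actions.

    Returns 0.0 if the agent took any forbidden action. Otherwise returns
    the fraction of expected actions that appear, in order, in the log.
    """
    if any(a in scenario["forbidden_actions"] for a in agent_actions):
        return 0.0
    expected = scenario["expected_actions"]
    matched = 0
    for action in agent_actions:
        # Advance only when the next expected action is seen, so order matters.
        if matched < len(expected) and action == expected[matched]:
            matched += 1
    return matched / len(expected)


scenario = json.loads(SCENARIO_JSON)
full_run = ["lookup_order", "issue_refund", "send_email"]
partial_run = ["lookup_order", "send_email"]
unsafe_run = ["lookup_order", "delete_order"]

print(score_run(scenario, full_run))     # all expected steps, in order
print(score_run(scenario, partial_run))  # refund step skipped
print(score_run(scenario, unsafe_run))   # forbidden action taken
```

Keeping the scenario as data (JSON here) and the scoring as a small pure function is one way to make scenarios easy to run and reuse; a real scoring rubric would likely be richer than this ordered-match check.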

Essential Skills for AI Agent Evaluation Scenario Writers

This opportunity is a good fit for software engineers open to part-time, non-permanent projects. Ideally, contributors will have:

3+ years of software development experience with a strong Python focus
Experience with Git and code repositories
Comfort with structured formats like JSON/YAML for scenario description
Understanding of core LLM limitations (hallucinations, bias, context limits) and how these affect evaluation design
Familiarity with Docker
English proficiency: B2

How to Contribute to AI Agent Evaluation with Scenarios

Here’s how it works:

Apply
Pass qualification(s)
Join a project
Complete tasks
Get paid

Tasks for this project are estimated to take 6-10 hours to complete, depending on complexity. This is an estimate and not a schedule requirement; you choose when and how to work. Tasks must be submitted by the deadline and meet the listed acceptance criteria to be accepted.

Contributions are paid, with rates up to $40/hour*. Depending on the project, payment is a fixed project rate or an individual rate, and some projects include incentive payments.

*Note: Rates vary based on expertise, skills assessment, location, project needs, and other factors. Higher rates may be offered to highly specialized experts. Lower rates may apply during onboarding or non-core project phases. Payment details are shared per project.
