Evaluation Scenario Writer - AI Agent Testing Specialist
Mindrift
Job Description
Mindrift is looking for an Evaluation Scenario Writer to join our team as an AI Agent Testing Specialist. In this role, you'll design realistic, structured evaluation scenarios for LLM-based agents and help shape AI ethically. If you're passionate about AI and have a strong analytical mindset, this is an excellent opportunity to put your skills to use.
Crafting Effective AI Agent Testing Scenarios
As an Evaluation Scenario Writer, your primary responsibility will be creating test cases that simulate human-performed tasks. You'll define gold-standard behavior, ensuring each scenario is clearly defined, well-scored, and easy to execute and reuse. You will need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.
Key Responsibilities:
- Designing structured test scenarios based on real-world tasks for AI Agent Testing.
- Defining the golden path and acceptable agent behavior.
- Annotating task steps, expected outputs, and edge cases.
- Working with devs to test your scenarios and improve clarity.
- Reviewing agent outputs and adapting tests accordingly.
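To make these responsibilities concrete, here is a minimal sketch of how a single scenario could be written down with its golden path, expected output, and edge cases. The field names, the example task, and the `validate_scenario` helper are all hypothetical illustrations, not a schema used by Mindrift.

```python
import json

# Hypothetical scenario layout. Every field name and value here is an
# illustrative assumption, not an actual Mindrift schema.
REQUIRED_FIELDS = ("id", "task", "golden_path", "expected_output", "edge_cases")

scenario = {
    "id": "refund-request-001",
    "task": "Process a customer refund request for a damaged item.",
    # Ordered steps an ideal agent would take.
    "golden_path": [
        "look_up_order",
        "verify_damage_claim",
        "issue_refund",
        "notify_customer",
    ],
    "expected_output": {"refund_issued": True, "customer_notified": True},
    # Situations off the golden path, each with an acceptable response.
    "edge_cases": [
        {"condition": "order not found", "acceptable_behavior": "ask_for_order_number"},
        {"condition": "refund window expired", "acceptable_behavior": "escalate_to_human"},
    ],
}


def validate_scenario(s):
    """Return a list of problems found in a scenario dict (empty if none)."""
    problems = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in s]
    if "golden_path" in s and not s["golden_path"]:
        problems.append("golden_path must list at least one step")
    return problems


# Serialize to JSON so the scenario can be stored, shared, and reused.
serialized = json.dumps(scenario, indent=2)
```

A check like `validate_scenario` is one way to catch incomplete scenarios before they reach developers for testing.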
Ensuring Quality in AI Agent Testing
Your expertise as an Evaluation Scenario Writer will help ensure the quality and reliability of AI agents. You'll define the golden path and acceptable agent behavior, and annotate task steps to make expected outputs and edge cases explicit. Your work will contribute directly to refining model responses and improving overall AI performance.
Qualifications for the Evaluation Scenario Writer Role
- Bachelor's and/or Master's degree in Computer Science, Software Engineering, Data Science / Data Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / Natural Language Processing (NLP), Information Systems, or a related field.
- Background in QA, software testing, data analysis, or NLP annotation.
- Good understanding of test design principles (e.g., reproducibility, coverage, edge cases).
- Strong written communication skills in English.
- Comfortable with structured formats like JSON/YAML for scenario description.
- Ability to define expected agent behaviors (golden paths) and scoring logic.
- Basic experience with Python and JS.
- Curious and open to working with AI-generated content, agent logs, and prompt-based behavior.
- Readiness to learn new methods, switch between tasks and topics quickly, and sometimes work with challenging, complex guidelines.
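As a rough illustration of what "scoring logic" against a golden path can mean, the sketch below scores an agent's logged steps by how far along the golden path it got, in order. The function name, step names, and scoring rule are assumptions chosen for the example; real projects may score very differently.

```python
def score_run(golden_path, agent_steps):
    """Score a run as the fraction of the golden path completed in order.

    Walks the agent's steps once and advances through the golden path each
    time the next expected step appears. Extra steps are ignored; a skipped
    golden step blocks credit for the golden steps that follow it.
    """
    if not golden_path:
        raise ValueError("golden_path must not be empty")
    matched = 0
    for step in agent_steps:
        if matched < len(golden_path) and step == golden_path[matched]:
            matched += 1
    return matched / len(golden_path)


# Hypothetical step names, used only for this example.
golden = ["look_up_order", "verify_damage_claim", "issue_refund", "notify_customer"]

# The agent took one extra step but still followed the golden path.
full_run = ["look_up_order", "check_inventory", "verify_damage_claim",
            "issue_refund", "notify_customer"]

# The agent skipped verification, so only the first golden step counts.
skipped_run = ["look_up_order", "issue_refund", "notify_customer"]

full_score = score_run(golden, full_run)
skipped_score = score_run(golden, skipped_run)
```

This rule is deliberately strict about order. A scenario writer might instead award partial credit for out-of-order steps or penalize unnecessary ones, which is the kind of design decision the role involves.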
Mindrift provides a flexible, remote, freelance project that fits around your primary professional or academic commitments. This position as an Evaluation Scenario Writer lets you take part in an advanced AI project and gain valuable experience to enhance your portfolio. Influence how future AI models understand and communicate in your field of expertise.