DATA SCIENCE / AI

Chaos Testing for Chatbots: Simulating Customers to Evaluate AI Agents

📅 Thursday, April 16 🕐 12:30 - 13:00 (Santiago, GMT-4) 📍 Stream A 🌐 English
Most conversational AI demos look great in single-turn prompts. But real customers don’t behave like prompts: they interrupt, change goals midway, provide incomplete information, and ask follow-ups that force the system to stay consistent across multiple turns.

In this session, I’ll share how we built an AI Simulator to evaluate multi-turn conversational systems in a realistic way. Instead of testing a chatbot with isolated prompts, we simulate complete customer journeys (troubleshooting flows, account issues, configuration tasks) and automatically measure task completion, correctness, and recovery behavior when the agent makes mistakes.
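To make the idea concrete, here is a minimal sketch of such a simulation loop. Everything in it is illustrative: `agent_respond` is a hypothetical stand-in for the system under test, and the failure heuristic and success predicate are placeholders, not the actual implementation behind the talk.

```python
# Minimal sketch of a simulated customer journey driving an agent.
# `agent_respond` is a hypothetical stand-in for the chatbot under test;
# the error heuristic and success predicate are illustrative only.

from dataclasses import dataclass

@dataclass
class Turn:
    role: str  # "customer" or "agent"
    text: str

@dataclass
class JourneyResult:
    transcript: list      # full list of Turn objects
    task_completed: bool  # did the journey reach its goal?
    agent_errors: int     # crude count of visible failures

def agent_respond(history):
    """Hypothetical stand-in for the real system under test."""
    return "Could you share the account email on file?"  # placeholder reply

def simulate_journey(customer_turns, success_predicate):
    """Replay scripted customer turns against the agent and score the run."""
    transcript, errors = [], 0
    for customer_text in customer_turns:
        transcript.append(Turn("customer", customer_text))
        reply = agent_respond(transcript)
        transcript.append(Turn("agent", reply))
        if "i can't help" in reply.lower():  # toy failure signal
            errors += 1
    return JourneyResult(transcript, success_predicate(transcript), errors)

# A journey where the customer changes goals midway, then returns to the
# original problem: exactly the behavior single-turn tests never exercise.
journey = [
    "My router keeps dropping the connection.",
    "Actually, first: why was I billed twice this month?",
    "OK, back to the router. I already rebooted it.",
]
result = simulate_journey(
    journey,
    success_predicate=lambda t: any("refund" in turn.text.lower() for turn in t),
)
print(result.task_completed, result.agent_errors)
```

In a fuller version, the scripted turns would typically come from a persona-driven generator rather than a fixed list, but the control loop stays the same.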

You’ll learn how multi-turn simulation exposes failure modes that traditional evaluation misses (wrong tool usage, premature answers, policy violations, drift across turns, and “confidently wrong” resolutions). We’ll cover how to design customer personas, scenario templates, and success criteria, and how to turn simulation results into a production-grade metric suite that supports regression testing and reliable iteration.
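As a rough illustration of how per-scenario results can roll up into a regression-friendly metric suite, here is a hedged sketch. The field names, the tracked rates, and the baseline values are assumptions made for this example, not the exact schema presented in the talk.

```python
# Sketch: rolling per-scenario simulation results into a regression gate.
# Field names, metrics, and baseline values are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class ScenarioResult:
    scenario_id: str
    persona: str            # e.g. "impatient", "vague", "expert"
    task_completed: bool
    policy_violations: int
    wrong_tool_calls: int

def metric_suite(results):
    """Reduce a batch of scenario runs to a few tracked rates."""
    n = len(results)
    return {
        "task_completion_rate": sum(r.task_completed for r in results) / n,
        "policy_violation_rate": sum(r.policy_violations > 0 for r in results) / n,
        "wrong_tool_rate": sum(r.wrong_tool_calls > 0 for r in results) / n,
    }

def regression_gate(metrics, baseline):
    """Fail a release candidate if completion drops or error rates rise."""
    return (
        metrics["task_completion_rate"] >= baseline["task_completion_rate"]
        and metrics["policy_violation_rate"] <= baseline["policy_violation_rate"]
        and metrics["wrong_tool_rate"] <= baseline["wrong_tool_rate"]
    )

runs = [
    ScenarioResult("billing-01", "impatient", True, 0, 0),
    ScenarioResult("router-03", "vague", False, 0, 1),
]
baseline = {
    "task_completion_rate": 0.5,
    "policy_violation_rate": 0.1,
    "wrong_tool_rate": 0.6,
}
print(regression_gate(metric_suite(runs), baseline))  # True: no regression
```

The point of the gate is that simulation output becomes a pass/fail signal you can run on every change, the same way a unit-test suite guards ordinary code.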

If you're building agents, RAG assistants, or support chatbots, this talk will show you how to evaluate them like real systems, not like demos.