Real-world audio arena
A controlled benchmark and training environment for full-duplex voice agents operating in subways, cars, offices, cafes, streets, and homes.
Product
AudioWorld packages real-world audio scenes where voice agents must listen, track state, reject distractors, use tools, stop for barge-in, and answer safely under realistic acoustic stress.
Replayable benchmark scenes with gold behavior labels.
Audio-to-action items for full-duplex SLM improvement.
Agent behavior breakdowns tied to timestamps and tasks.
Real backgrounds with consented or safe synthetic speech.
Each scene includes audio, turn timing, expected agent state, ignored events, tool-call targets, latency budgets, and safety constraints.
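As an illustration only, the scene contents listed above could be carried in a small manifest like the one below. This is a sketch under assumptions, not AudioWorld's actual schema: the class names, field names, speaker labels, file path, and the 1000 ms default are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    speaker: str      # hypothetical labels, e.g. "user" or "distractor"
    start_s: float    # turn onset, seconds from scene start
    end_s: float      # turn offset, seconds from scene start


@dataclass
class Scene:
    scene_id: str
    audio_path: str
    turns: list[Turn]
    expected_state: dict[str, str]
    ignored_events: list[str] = field(default_factory=list)
    tool_call_targets: list[str] = field(default_factory=list)
    latency_budget_ms: int = 1000  # placeholder value, not an AudioWorld default
    safety_constraints: list[str] = field(default_factory=list)

    def problems(self) -> list[str]:
        """Return human-readable problems; an empty list means the scene is usable."""
        found = []
        for i, turn in enumerate(self.turns):
            # Overlapping turns are allowed on purpose: full-duplex scenes
            # contain barge-in and side conversations.
            if turn.start_s < 0 or turn.end_s <= turn.start_s:
                found.append(f"turn {i} has invalid timing")
        if self.latency_budget_ms <= 0:
            found.append("latency budget must be positive")
        return found


scene = Scene(
    scene_id="subway-001",
    audio_path="scenes/subway-001.wav",
    turns=[Turn("user", 1.2, 3.4), Turn("distractor", 2.8, 4.0)],
    expected_state={"destination": "Central"},
    ignored_events=["platform_announcement"],
    tool_call_targets=["route_lookup"],
    safety_constraints=["no_payment_without_confirmation"],
)
print(scene.problems())
```

Keeping timing, expected state, and constraints in one record is what would let the same scene be replayed and scored repeatably.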
Agent logs become labeled items for ASR under noise, turn boundaries, intent slots, state updates, tool decisions, barge-in policy, distractor rejection, and response policy.
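One way to picture that conversion is to pair what the agent logged with the gold label for each task. The sketch below assumes a hypothetical log format (a dict keyed by task name) and hypothetical task keys derived from the list above; the real item format is not specified here.

```python
# Task keys mirror the categories named above; the exact strings are assumptions.
TASKS = (
    "asr", "turn_boundary", "intent_slots", "state_update",
    "tool_decision", "barge_in", "distractor_rejection", "response_policy",
)


def to_labeled_items(scene_id, agent_log, gold):
    """Pair what the agent did with the gold label, one item per labeled task."""
    items = []
    for task in TASKS:
        if task not in gold:
            continue  # the scene carries no label for this task
        observed = agent_log.get(task)  # None if the agent logged nothing
        items.append({
            "scene_id": scene_id,
            "task": task,
            "expected": gold[task],
            "observed": observed,
            "correct": observed == gold[task],
        })
    return items


gold = {"tool_decision": "route_lookup", "distractor_rejection": True}
agent_log = {"tool_decision": "route_lookup", "distractor_rejection": False}
items = to_labeled_items("subway-001", agent_log, gold)
for item in items:
    print(item["task"], item["correct"])
```

Items where the observed value differs from the expected one are the natural candidates for both failure analysis and further training.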
Subway, car vibration, office, cafe, street, and home audio.
Separate stems for target speech and residual background, each with recorded lineage.
Timed multilingual turns over real acoustic backgrounds.
Replay scenes into voice agents and score behavior, not just transcribed words.
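To make "behavior, not just words" concrete, a scorer can check what the agent did and when, without looking at the transcript at all. The following is a minimal sketch; every key name, including the separate barge-in budget, is a hypothetical choice for illustration rather than a documented AudioWorld field.

```python
def score_behavior(expected, observed):
    """Return one pass/fail check per behavior, independent of transcript text."""
    return {
        # The agent called exactly the tools the scene targets, in order.
        "tool_calls": observed["tool_calls"] == expected["tool_call_targets"],
        # The agent reacted to none of the events it was meant to ignore.
        "distractors_ignored": set(expected["ignored_events"]).isdisjoint(
            observed["reacted_to"]
        ),
        # The first response arrived within the latency budget.
        "within_latency": (
            observed["response_latency_ms"] <= expected["latency_budget_ms"]
        ),
        # The agent stopped talking soon enough after the user barged in.
        "stopped_for_barge_in": (
            observed["stop_after_barge_in_ms"] <= expected["barge_in_budget_ms"]
        ),
    }


expected = {
    "tool_call_targets": ["route_lookup"],
    "ignored_events": ["platform_announcement"],
    "latency_budget_ms": 1000,
    "barge_in_budget_ms": 300,
}
observed = {
    "tool_calls": ["route_lookup"],
    "reacted_to": ["platform_announcement"],
    "response_latency_ms": 640,
    "stop_after_barge_in_ms": 450,
}
checks = score_behavior(expected, observed)
print([name for name, passed in checks.items() if not passed])
```

In this example the agent picks the right tool in time but still fails two checks, which a word-level metric would not reveal.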
AudioWorld is preparing pilot packs for voice-agent teams that need hard, realistic, repeatable audio interaction tests.
Arena model
AudioWorld turns real recordings into timed interaction scenes with expected state, ignored audio events, tool-use targets, and latency budgets. The same scene can evaluate agents built on OpenAI or Gemini models, local SLMs, or a customer's internal stack.
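Running one scene against several stacks suggests a thin adapter layer: each agent exposes the same method, and the harness does not care what sits behind it. The sketch below is an assumption about how that could look, not AudioWorld's interface. `VoiceAgent`, `StubAgent`, `run_scene`, and `compare` are hypothetical names, and the stub returns canned results rather than calling any vendor API.

```python
from typing import Protocol


class VoiceAgent(Protocol):
    """Anything with a name and a run_scene method can be evaluated."""

    name: str

    def run_scene(self, audio_path: str) -> dict: ...


class StubAgent:
    """Stand-in for a real adapter; returns a canned result, calls no model."""

    def __init__(self, name, canned):
        self.name = name
        self._canned = canned

    def run_scene(self, audio_path):
        return dict(self._canned)


def compare(audio_path, expected_tools, agents):
    """Replay the same scene into every agent and check the tool calls."""
    results = {}
    for agent in agents:
        observed = agent.run_scene(audio_path)
        results[agent.name] = observed.get("tool_calls") == expected_tools
    return results


agents = [
    StubAgent("hosted-model", {"tool_calls": ["route_lookup"]}),
    StubAgent("local-slm", {"tool_calls": []}),
]
results = compare("scenes/subway-001.wav", ["route_lookup"], agents)
print(results)
```

A real adapter would stream the scene audio into the agent and collect its events; only the adapter changes per stack, while the scene and the checks stay fixed.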
World coverage
Subway: announcements, crowd speech, tunnel rumble, station names.
Car: road noise, vibration, hands-free microphones, safety constraints.
Office: keyboards, room echo, side conversations, low-volume turns.
Cafe: music, dishes, overlapping speakers, privacy-sensitive audio.
Street: wind, sirens, traffic spikes, unstable turn boundaries.
Home: TV, family voices, appliances, command-vs-distractor decisions.