# Latency Metrics
Each conversation log includes three latency measurements:

| Metric | Description | What It Measures |
|---|---|---|
| `average_transcriber_latency` | ASR latency | Time to convert speech to text |
| `average_agent_latency` | LLM latency | Time for the AI to generate a response |
| `average_synthesizer_latency` | TTS latency | Time to convert text to speech |
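The three averages can be summed into a rough end-to-end figure. A minimal sketch, assuming the log is a flat dict; only the field names come from the table above:

```python
# Rough end-to-end latency estimate from the three per-conversation averages.
# The log structure here is an assumption; only the field names are from the docs.

def total_latency_ms(log: dict) -> float:
    """Sum the ASR, LLM, and TTS averages into one end-to-end figure."""
    return (
        log["average_transcriber_latency"]
        + log["average_agent_latency"]
        + log["average_synthesizer_latency"]
    )

example_log = {
    "average_transcriber_latency": 180.0,  # ASR, ms
    "average_agent_latency": 620.0,        # LLM, ms
    "average_synthesizer_latency": 150.0,  # TTS, ms
}

print(total_latency_ms(example_log))  # 950.0
```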
## Getting Latency Data
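Latency data arrives as part of each conversation log. The exact endpoint and response shape depend on your setup, so the sample payload below is hypothetical; only the three metric names come from the docs. Parsing looks like this:

```python
# Hypothetical sample: parse a conversation-log payload and read out the
# three latency fields. The payload shape is illustrative, not authoritative.
import json

sample_response = """
{
  "conversation_id": "conv_123",
  "average_transcriber_latency": 210.5,
  "average_agent_latency": 740.2,
  "average_synthesizer_latency": 165.0
}
"""

log = json.loads(sample_response)
for metric in ("average_transcriber_latency",
               "average_agent_latency",
               "average_synthesizer_latency"):
    print(metric, log[metric])
```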
## Performance Benchmarks
| Component | Good | Acceptable | Needs Work |
|---|---|---|---|
| ASR | < 200ms | 200-500ms | > 500ms |
| LLM | < 500ms | 500-1000ms | > 1000ms |
| TTS | < 200ms | 200-500ms | > 500ms |
| Total | < 900ms | 900-2000ms | > 2000ms |
## Analyzing Performance
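One way to analyze a log is to rate each component against the benchmark table. The thresholds below are taken directly from that table; the rating function itself is a sketch:

```python
# Rate a component's latency against the benchmark table.
# Thresholds (ms) come straight from the table; boundary values count as
# "acceptable", an assumption since the table leaves them ambiguous.

THRESHOLDS_MS = {
    "asr":   (200, 500),
    "llm":   (500, 1000),
    "tts":   (200, 500),
    "total": (900, 2000),
}

def rate_latency(component: str, latency_ms: float) -> str:
    good, acceptable = THRESHOLDS_MS[component]
    if latency_ms < good:
        return "good"
    if latency_ms <= acceptable:
        return "acceptable"
    return "needs work"

print(rate_latency("asr", 150))     # good
print(rate_latency("llm", 620))     # acceptable
print(rate_latency("total", 2400))  # needs work
```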
## Improving Performance
### Reduce LLM latency
The LLM is usually the slowest component. To speed it up, keep system prompts concise, use streaming responses (enabled by default), and consider faster models such as `gpt-4o-mini`.
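Streaming helps because synthesis can begin at the first token rather than waiting for the full completion. A toy illustration with made-up token arrival times:

```python
# Toy illustration of why streaming reduces perceived LLM latency: speech can
# start at the first token instead of after the last one. The token arrival
# times below are invented for the example.

def first_token_latency(token_times_ms):
    """Latency until speech can start when streaming token-by-token."""
    return token_times_ms[0]

def full_response_latency(token_times_ms):
    """Latency until speech can start when waiting for the whole response."""
    return token_times_ms[-1]

# Hypothetical arrival times (ms) of each token in one LLM response.
token_times = [120, 180, 240, 310, 380, 460, 550]

print(first_token_latency(token_times))   # 120
print(full_response_latency(token_times)) # 550
```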
### Optimize tools
Tool execution adds latency. Make API calls fast, cache where possible, and use parallel execution with `parallel=True`.
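The two ideas above can be sketched with the standard library: cache repeated lookups and run independent tool calls concurrently. The `parallel=True` flag in your tool config is the platform's equivalent of the thread pool here; `fetch_weather` and `fetch_calendar` are made-up stand-in tools:

```python
# Sketch: cache repeated tool results and run independent tools concurrently.
# fetch_weather / fetch_calendar are hypothetical stand-ins for real tools.
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache

@lru_cache(maxsize=256)
def fetch_weather(city: str) -> str:
    # Imagine a slow API call here; repeat calls for the same city hit the cache.
    return f"sunny in {city}"

def fetch_calendar(user: str) -> str:
    return f"{user}: 2 meetings today"

# Independent tool calls can run in parallel instead of back-to-back.
with ThreadPoolExecutor() as pool:
    weather = pool.submit(fetch_weather, "Berlin")
    calendar = pool.submit(fetch_calendar, "alice")
    results = [weather.result(), calendar.result()]

print(results)
```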
### Monitor trends
Track latency over time. Sudden increases may indicate API issues or prompt bloat.
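A simple trend check: compare the latest day's average agent latency against the mean of the preceding days and flag sudden jumps. The daily numbers and the 25% alert threshold are made up for the example:

```python
# Flag sudden latency increases by comparing the latest daily average against
# the recent baseline. Daily values (ms) and the 25% threshold are illustrative.
from statistics import mean

daily_llm_latency_ms = [610, 590, 605, 620, 598, 615, 602, 940]  # last value spikes

baseline = mean(daily_llm_latency_ms[:-1])
latest = daily_llm_latency_ms[-1]

# Alert on anything more than 25% above the recent baseline.
alert = latest > baseline * 1.25
print(round(baseline, 1), latest, alert)
```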

