Why Streaming Matters
| Approach | Time to First Audio | User Experience |
|---|---|---|
| Non-streaming | 800–1500ms | Awkward silence, then full response |
| Streaming | 200–400ms | Natural conversation flow |
Basic Streaming
Setstream=True and yield each chunk.content for instant TTS playback.
Streaming with Tools
Collect tool calls while streaming, execute them, then stream the follow-up response.Chunking Strategies
Word-by-Word (Default)
LLMs typically stream tokens, which map roughly to words or word fragments:Sentence Buffering
Buffer complete sentences for more natural speech boundaries:Phrase Buffering
Buffer by phrase for smoother speech rhythm:Intermediate Feedback
Provide feedback while processing long operations:Streaming Best Practices
Do's and Don'ts
Do's and Don'ts
Do: Always set
stream=True for LLM calls. Yield chunks as soon as they’re available. Provide intermediate feedback during long operations. Keep responses concise since shorter means faster to speak.Don’t: Buffer the entire response before yielding. Never leave users in silence for more than 2 seconds. Avoid yielding empty strings or whitespace-only chunks.
