`OpenAIClient` is the unified interface for calling LLMs. It works with OpenAI by default, but supports any OpenAI-compatible endpoint by changing the `base_url`.
## Basic Configuration
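A minimal setup looks like the following sketch. The import path is an assumption; the constructor parameters are the ones documented in the table below.

```python
from agents.llm import OpenAIClient  # import path is an assumption

# Defaults target OpenAI; only `model` is strictly required.
llm = OpenAIClient(
    model="gpt-4o-mini",
    api_key="sk-...",    # omit to fall back to the OPENAI_API_KEY env var
    temperature=0.7,     # default; see Temperature below
    max_tokens=1024,     # default; keep lower for voice (see Tips)
)
```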
### Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | str | — | The model identifier (e.g., `gpt-4o-mini`, `llama-3.1-70b-versatile`) |
| `api_key` | str | `OPENAI_API_KEY` env var | Your provider's API key |
| `base_url` | str | OpenAI's endpoint | Custom endpoint URL for other providers |
| `temperature` | float | 0.7 | Controls randomness. Lower = consistent, higher = creative |
| `max_tokens` | int | 1024 | Maximum tokens in the response |
### Temperature
Controls how “creative” vs. “predictable” the model behaves:

- 0.0–0.3: Consistent, factual. Best for support, FAQ bots.
- 0.4–0.6: Balanced. Good for general conversation.
- 0.7–1.0: Creative, varied. Better for sales, engagement.
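For example, the same agent code can be tuned per use case just by changing `temperature` (sketch, using the constructor from Basic Configuration):

```python
# Deterministic, factual answers for a support/FAQ bot.
support_llm = OpenAIClient(model="gpt-4o-mini", temperature=0.2)

# Varied, engaging phrasing for a sales agent.
sales_llm = OpenAIClient(model="gpt-4o-mini", temperature=0.9)
```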
## Using Other Providers
Any provider with an OpenAI-compatible API works by setting `base_url`. This includes Groq, Together.ai, Fireworks, Anyscale, OpenRouter, and Azure OpenAI. Only `base_url` and `api_key` change; your agent code stays the same.
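For example, pointing the client at Groq (sketch; the endpoint shown is Groq's standard OpenAI-compatible base URL):

```python
llm = OpenAIClient(
    model="llama-3.1-70b-versatile",
    api_key="gsk_...",                          # your Groq API key
    base_url="https://api.groq.com/openai/v1",  # OpenAI-compatible endpoint
)
```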
## Streaming
Voice agents must use streaming. Without it, users wait for the entire response before hearing anything.
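A sketch of consuming a streamed response; the `stream()` method and its chunk shape are assumptions, not confirmed API:

```python
async def speak_response(llm, tts, messages):
    # Assumed API: stream() yields text deltas as they arrive, so TTS can
    # start speaking the first sentence while the model is still generating.
    async for chunk in llm.stream(messages):
        await tts.speak(chunk)
```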
## Tool Calling

To enable function calling, pass tool schemas:
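A sketch using the standard OpenAI tool-schema format; passing `tools` to the constructor is an assumption:

```python
# Standard OpenAI-style function schema.
tools = [{
    "type": "function",
    "function": {
        "name": "lookup_order",
        "description": "Fetch an order's status by its ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "The order ID."},
            },
            "required": ["order_id"],
        },
    },
}]

llm = OpenAIClient(model="gpt-4o-mini", tools=tools)  # `tools=` is assumed
```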
## Error Handling
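A common pattern is retrying transient failures (rate limits, timeouts) with exponential backoff. This sketch uses a generic `Exception` and an assumed async `generate()` method rather than the client's documented error types:

```python
import asyncio

async def generate_with_retries(llm, messages, attempts=3):
    # Retry transient failures with exponential backoff: 1s, 2s, 4s, ...
    for attempt in range(attempts):
        try:
            return await llm.generate(messages)  # method name is assumed
        except Exception:  # narrow this to the client's real error types
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(2 ** attempt)
```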
## Voice Configuration

Agents also need a voice for text-to-speech. Waves is our recommended TTS engine: ultra-low latency, optimized for real-time telephony.

### Basic Voice Setup
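A sketch of attaching a Waves voice; the class name and import path are illustrative assumptions:

```python
from agents.tts import WavesTTS  # import path and class name are assumptions

tts = WavesTTS(
    voice_id="zorin",  # recommended: male, professional (see table below)
    api_key="...",     # your Waves API key
)
```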
### Waves Voice IDs
| Voice ID | Description |
|---|---|
| `zorin` | Male, professional (recommended) |
| `emily` | Female, warm |
| `raj` | Male, Indian English |
| `aria` | Female, neutral |
### Voice Cloning
Create custom voices from audio samples: → Waves Voice Cloning Guide

### Third-Party Providers
OpenAI and ElevenLabs voices are also supported. Set `provider` to `openai` or `elevenlabs` and use their respective voice IDs.
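For example (sketch; the `provider` switch comes from this page, while the `TTS` class is hypothetical; `alloy` is one of OpenAI's TTS voices):

```python
# Hypothetical generic TTS config; `provider` selects the backend.
openai_tts = TTS(provider="openai", voice_id="alloy")
eleven_tts = TTS(provider="elevenlabs", voice_id="<elevenlabs-voice-id>")
```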
## Tips
**Keep `max_tokens` low for voice.** Set `max_tokens` to 100–200. Shorter responses mean faster audio playback. Guide conciseness in your prompt too.
**Warm up connections on start.** The first LLM call has connection overhead. Send a tiny request in `start()` to warm up before the user speaks.
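A sketch of a warm-up call, assuming the framework exposes an `Agent` base class with `start()` and the assumed `generate()` method from above:

```python
class MyAgent(Agent):  # `Agent` base class is an assumption
    async def start(self):
        # Tiny throwaway request opens the HTTP/TLS connection before
        # the user speaks, hiding first-call latency.
        await self.llm.generate([{"role": "user", "content": "hi"}])
```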
**Use fallbacks for reliability.** Configure a backup provider (e.g., Groq primary, OpenAI fallback) to handle rate limits or outages gracefully.
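A minimal fallback wrapper (sketch; assumes two configured clients and the assumed `generate()` method):

```python
async def generate_with_fallback(primary, fallback, messages):
    # Try the fast/cheap primary first; on any failure, use the backup.
    try:
        return await primary.generate(messages)
    except Exception:
        return await fallback.generate(messages)
```

For example, pass a Groq-backed client as `primary` and an OpenAI-backed client as `fallback`, configured as shown in Using Other Providers.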

