`OpenAIClient` works with any endpoint that implements the OpenAI Chat Completions API.
## Complete Example
Here’s a full agent using a local Ollama model:
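(The framework-specific agent code isn’t reproduced here; as a minimal sketch, the loop below streams replies from Ollama’s OpenAI-compatible endpoint using the standard `openai` package, with `llama3.1` as an example model.)

```python
# Minimal agent loop against a local Ollama model.
# Assumes Ollama is serving on its default port (11434) and that
# `llama3.1` has been pulled; swap in any model you have locally.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # ignored by Ollama, but the client requires a value
)

messages = [{"role": "system", "content": "You are a helpful assistant."}]

while True:
    user_input = input("you> ")
    if user_input.strip().lower() in {"exit", "quit"}:
        break
    messages.append({"role": "user", "content": user_input})

    # Stream the reply token by token, then keep it in the history.
    stream = client.chat.completions.create(
        model="llama3.1",
        messages=messages,
        stream=True,
    )
    reply = ""
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        reply += delta
        print(delta, end="", flush=True)
    print()
    messages.append({"role": "assistant", "content": reply})
```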
## Requirements

Your model server must implement the OpenAI Chat Completions API:

| Feature | Required | Notes |
|---|---|---|
| `/chat/completions` endpoint | Yes | Standard OpenAI format |
| Streaming | Yes | `stream=True` must work |
| Tool calling | For tools | OpenAI-format function calling |
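If you’re unsure whether a server meets these requirements, a quick streaming request is an easy smoke test. A sketch with the standard `openai` package; adjust the base URL and model name to match your server:

```python
# Smoke test: confirms the /chat/completions endpoint answers and streams.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")  # any OpenAI-compatible server

stream = client.chat.completions.create(
    model="llama3.1",  # replace with a model your server actually hosts
    messages=[{"role": "user", "content": "Say hello in one word."}],
    stream=True,       # verifies the streaming requirement from the table above
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)
print()
```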
## Custom Endpoints
Connect to any custom model server:
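The exact arguments your framework’s `OpenAIClient` takes aren’t shown here, so this sketch uses the standard `openai` package with an overridden base URL; the URL, key, and model name are all placeholders:

```python
# Point an OpenAI-compatible client at any custom server by overriding the base URL.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("MODEL_BASE_URL", "http://my-model-server:8080/v1"),  # placeholder URL
    api_key=os.environ.get("MODEL_API_KEY", "not-needed"),                        # placeholder key
)

response = client.chat.completions.create(
    model="my-custom-model",  # whatever name your server registers
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)
```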
## Ollama

Ollama is the easiest way to run models locally. It handles model downloads and serving automatically.

### Setup
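Install Ollama from ollama.ai, then pull a model, e.g. `ollama pull llama3.1`. Running `ollama list` shows what’s installed. The Ollama server listens on `http://localhost:11434` by default and exposes an OpenAI-compatible API under `/v1`.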
### Usage
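A minimal sketch with the standard `openai` package; if your framework’s `OpenAIClient` accepts a base URL, point it at the same address:

```python
# Talk to a local Ollama model through its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's default address
    api_key="ollama",                      # ignored by Ollama, but required by the client
)

response = client.chat.completions.create(
    model="llama3.1",  # any model you've pulled with `ollama pull`
    messages=[{"role": "user", "content": "Summarize what Ollama does in one sentence."}],
)
print(response.choices[0].message.content)
```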
## vLLM
vLLM is a high-performance inference server for production workloads.

### Setup
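Install vLLM with `pip install vllm`, then start its OpenAI-compatible server, for example `python -m vllm.entrypoints.openai.api_server --model <huggingface-model-id>`. By default the server listens on port 8000 and exposes the API under `/v1`.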
### Usage
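Same pattern as Ollama, but against vLLM’s default port; a sketch with the standard `openai` package:

```python
# Talk to a vLLM server through its OpenAI-compatible endpoint (default port 8000).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",  # placeholder; only needed if you configured server-side auth
)

response = client.chat.completions.create(
    model="<huggingface-model-id>",  # must match the --model the server was launched with
    messages=[{"role": "user", "content": "Hello from vLLM"}],
)
print(response.choices[0].message.content)
```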
## LM Studio
LM Studio provides a desktop UI for running models locally.

- Download from lmstudio.ai
- Load a model
- Start the local server (Settings → Local Server), then connect as shown below
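Once the server is up, connect to it like any other OpenAI-compatible endpoint; LM Studio’s local server defaults to `http://localhost:1234/v1`. A minimal sketch with the standard `openai` package:

```python
# Connect to LM Studio's local server (default: http://localhost:1234/v1).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",
    api_key="lm-studio",   # LM Studio doesn't check the key, but the client needs one
)

response = client.chat.completions.create(
    model="local-model",   # the identifier of the model you loaded in LM Studio
    messages=[{"role": "user", "content": "Hello from LM Studio"}],
)
print(response.choices[0].message.content)
```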
## Troubleshooting
| Issue | Cause | Fix |
|---|---|---|
| Connection refused | Server not running | Start Ollama/vLLM |
| Model not found | Wrong name | Check `ollama list` or the server logs |
| No streaming | Server config | Ensure streaming is enabled |
| Tool calls ignored | Model limitation | Use a larger model or cloud fallback |
## Tips
**Use local LLMs for development, cloud/managed LLM providers for production**
Ollama is great for local development. For production, consider vLLM or a cloud provider for reliability.
**Set up a fallback LLM**
Local models can fail. Configure a cloud fallback (e.g., OpenAI) to catch errors and keep the conversation going.
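A minimal sketch of that pattern, assuming a local Ollama endpoint and the standard `openai` package; the model names are illustrative:

```python
# Try the local model first; fall back to a cloud provider if the request fails.
import os
from openai import OpenAI, OpenAIError

local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
cloud = OpenAI(api_key=os.environ["OPENAI_API_KEY"])  # standard OpenAI endpoint

def chat(messages):
    try:
        return local.chat.completions.create(model="llama3.1", messages=messages)
    except OpenAIError:
        # Local server down, model missing, or request failed: retry in the cloud.
        return cloud.chat.completions.create(model="gpt-4o-mini", messages=messages)

reply = chat([{"role": "user", "content": "Is the local model up?"}])
print(reply.choices[0].message.content)
```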
**Check tool calling support**
Not all local models support function calling. Test your tools or use a model known to support them.
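One quick check is to send a single tool definition and see whether the model returns a tool call. A sketch against a local Ollama endpoint; `get_weather` is a made-up tool used only as a probe:

```python
# Probe whether the local model emits OpenAI-format tool calls.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not a real API
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="llama3.1",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
message = response.choices[0].message
if message.tool_calls:
    print("Tool calling works:", message.tool_calls[0].function.name)
else:
    print("No tool call returned; this model may not support function calling.")
```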

