For privacy, cost control, or specialized models, you can run LLMs locally or on your own servers. The OpenAIClient works with any endpoint that implements the OpenAI Chat Completions API.

Complete Example

Here’s a full agent using a local Ollama model:
from smallestai.atoms.agent.nodes import OutputAgentNode
from smallestai.atoms.agent.clients.openai import OpenAIClient
from smallestai.atoms.agent.server import AtomsApp
from smallestai.atoms.agent.session import AgentSession

class LocalAgent(OutputAgentNode):
    def __init__(self):
        super().__init__(name="local-agent")
        
        # Connect to your local model
        self.llm = OpenAIClient(
            model="llama3",
            base_url="http://localhost:11434/v1",
            api_key="ollama"  # Not required for Ollama
        )
        
        self.context.add_message(
            "system",
            "You are a helpful assistant running on local hardware."
        )

    async def generate_response(self):
        response = await self.llm.chat(
            messages=self.context.messages,
            stream=True
        )
        async for chunk in response:
            if chunk.content:
                yield chunk.content

async def on_start(session: AgentSession):
    session.add_node(LocalAgent())
    await session.start()
    await session.wait_until_complete()

if __name__ == "__main__":
    app = AtomsApp(setup_handler=on_start)
    app.run()

Requirements

Your model server must implement the OpenAI Chat Completions API:
Feature                     | Required  | Notes
/chat/completions endpoint  | Yes       | Standard OpenAI format
Streaming                   | Yes       | stream=True must work
Tool calling                | For tools | OpenAI-format function calling
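
To confirm a server meets these requirements before wiring it into an agent, you can call the endpoint directly. Below is a minimal sketch using the requests library, assuming an Ollama-style server at http://localhost:11434/v1 serving llama3 and the standard OpenAI SSE streaming format; adjust the URL, key, and model name for your setup.
import json
import requests

BASE_URL = "http://localhost:11434/v1"  # your server's OpenAI-compatible base URL
API_KEY = "ollama"                      # placeholder; many local servers ignore it

# Streamed chat completion: the server should return Server-Sent Events
# ("data: {...}" lines) ending with "data: [DONE]".
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "llama3",
        "messages": [{"role": "user", "content": "Say hi in one word."}],
        "stream": True,
    },
    stream=True,
    timeout=30,
)
resp.raise_for_status()

for line in resp.iter_lines():
    if not line:
        continue
    payload = line.decode().removeprefix("data: ")
    if payload == "[DONE]":
        break
    delta = json.loads(payload)["choices"][0]["delta"]
    print(delta.get("content", ""), end="", flush=True)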

Custom Endpoints

Connect to any custom model server:
import os

from smallestai.atoms.agent.clients.openai import OpenAIClient

llm = OpenAIClient(
    model="your-model-name",
    base_url="https://your-server.example.com/v1",
    api_key=os.getenv("YOUR_API_KEY")
)
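
Before dropping a custom client into an agent node, a quick one-off call can confirm that the endpoint and credentials work. This is a minimal sketch, assuming the client accepts OpenAI-style role/content message dicts and streams chunks with a .content attribute, as in the complete example above:
import asyncio
import os

from smallestai.atoms.agent.clients.openai import OpenAIClient

async def smoke_test():
    llm = OpenAIClient(
        model="your-model-name",
        base_url="https://your-server.example.com/v1",
        api_key=os.getenv("YOUR_API_KEY")
    )
    # One streamed round trip; prints tokens as they arrive.
    response = await llm.chat(
        messages=[{"role": "user", "content": "Reply with a short greeting."}],
        stream=True
    )
    async for chunk in response:
        if chunk.content:
            print(chunk.content, end="", flush=True)

asyncio.run(smoke_test())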

Ollama

Ollama is the easiest way to run models locally. It handles model downloads and serving automatically.

Setup

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull llama3

# Start the server (runs on port 11434)
ollama serve

Usage

from smallestai.atoms.agent.clients.openai import OpenAIClient

llm = OpenAIClient(
    model="llama3",
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # Doesn't require a real key
)

vLLM

vLLM is a high-performance inference server for production workloads.

Setup

pip install vllm

python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Meta-Llama-3-8B-Instruct \
    --port 8000

Usage

from smallestai.atoms.agent.clients.openai import OpenAIClient

llm = OpenAIClient(
    model="meta-llama/Llama-3-8B-Instruct",
    base_url="http://localhost:8000/v1",
    api_key="vllm"
)

LM Studio

LM Studio provides a desktop UI for running models locally.
  1. Download from lmstudio.ai
  2. Load a model
  3. Start the local server (Settings → Local Server)
from smallestai.atoms.agent.clients.openai import OpenAIClient

llm = OpenAIClient(
    model="local-model",
    base_url="http://localhost:1234/v1",
    api_key="lmstudio"
)

Troubleshooting

Issue              | Cause              | Fix
Connection refused | Server not running | Start Ollama/vLLM
Model not found    | Wrong name         | Check ollama list or server logs
No streaming       | Server config      | Ensure streaming is enabled
Tool calls ignored | Model limitation   | Use a larger model or cloud fallback

Tips

Ollama is great for local development. For production, consider vLLM or a cloud provider for reliability.
Local models can fail. Configure a cloud fallback (e.g., OpenAI) to catch errors and keep the conversation going; a sketch is shown below.
Not all local models support function calling. Test your tools against the model you plan to use, or pick one known to support OpenAI-format tool calls.
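
One way to wire up that fallback is to keep two clients in the node and retry against the cloud client if the local one raises. The sketch below assumes the OpenAIClient defaults to the hosted OpenAI endpoint when no base_url is given, that an OPENAI_API_KEY environment variable is set, and that local failures surface as exceptions from chat or while iterating the stream:
import os

from smallestai.atoms.agent.clients.openai import OpenAIClient
from smallestai.atoms.agent.nodes import OutputAgentNode

class FallbackAgent(OutputAgentNode):
    def __init__(self):
        super().__init__(name="fallback-agent")

        # Primary: local Ollama model.
        self.local_llm = OpenAIClient(
            model="llama3",
            base_url="http://localhost:11434/v1",
            api_key="ollama"
        )
        # Fallback: hosted model (assumes OPENAI_API_KEY is set and that
        # omitting base_url targets the hosted OpenAI endpoint).
        self.cloud_llm = OpenAIClient(
            model="gpt-4o-mini",
            api_key=os.getenv("OPENAI_API_KEY")
        )

        self.context.add_message(
            "system",
            "You are a helpful assistant."
        )

    async def generate_response(self):
        for llm in (self.local_llm, self.cloud_llm):
            try:
                response = await llm.chat(
                    messages=self.context.messages,
                    stream=True
                )
                async for chunk in response:
                    if chunk.content:
                        yield chunk.content
                return  # stream finished; don't fall through to the next client
            except Exception:
                # If the local stream fails partway through, this simple version
                # restarts the reply on the cloud client.
                continue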