Configure the LLM client and response parameters for your agent.

OpenAI Client

The OpenAIClient is the primary way to connect to LLMs:
import os
from smallestai.atoms.agent.clients.openai import OpenAIClient

llm = OpenAIClient(
    model="gpt-4o-mini",
    temperature=0.7,
    api_key=os.getenv("OPENAI_API_KEY")
)

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str | required | Model name (e.g., "gpt-4o", "gpt-4o-mini") |
| temperature | float | 0.7 | Response randomness (0-2) |
| api_key | str | OPENAI_API_KEY env var | Your OpenAI API key |
| base_url | str | OpenAI's default endpoint | Custom endpoint for bring-your-own-model (BYOM) setups |
| max_tokens | int | None | Maximum response length in tokens |
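Because base_url accepts any OpenAI-compatible endpoint, the same client can point at a self-hosted or third-party model. A minimal sketch, assuming a hypothetical local server at http://localhost:8000/v1 that speaks the OpenAI chat API (the URL, model name, and environment variable are placeholders for your own deployment):

import os
from smallestai.atoms.agent.clients.openai import OpenAIClient

# Hypothetical BYOM setup: point base_url at any OpenAI-compatible endpoint.
# The URL, model name, and env var below are placeholders, not real defaults.
llm = OpenAIClient(
    model="llama-3.1-8b-instruct",
    temperature=0.7,
    api_key=os.getenv("CUSTOM_LLM_API_KEY"),
    base_url="http://localhost:8000/v1"
)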

Using the Client

In your agent’s generate_response method, call the LLM and yield the streamed chunks:
class MyAgent(OutputAgentNode):
    def __init__(self):
        super().__init__(name="my-agent")
        self.llm = OpenAIClient(
            model="gpt-4o-mini",
            temperature=0.7,
            api_key=os.getenv("OPENAI_API_KEY")
        )
        
        self.context.add_message({
            "role": "system",
            "content": "You are a helpful assistant."
        })

    async def generate_response(self):
        response = await self.llm.chat(
            messages=self.context.messages,
            stream=True
        )
        async for chunk in response:
            if chunk.content:
                yield chunk.content

Streaming vs Non-Streaming

For real-time voice agents, always use streaming:
# Streaming (recommended for voice)
response = await self.llm.chat(
    messages=self.context.messages,
    stream=True
)
async for chunk in response:
    if chunk.content:
        yield chunk.content

# Non-streaming (for internal processing)
response = await self.llm.chat(
    messages=self.context.messages,
    stream=False
)
result = response.content
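
As an example of internal processing, here is a hedged sketch of a helper that makes a one-shot, non-streaming call for routing. The method name and prompt are illustrative, and it assumes chat accepts a plain list of message dicts, as in the context examples above:

async def classify_intent(self, user_text: str) -> str:
    # Illustrative internal helper: the result feeds routing logic rather than
    # being spoken to the caller, so streaming adds nothing here.
    response = await self.llm.chat(
        messages=[
            {"role": "system", "content": "Classify the user's intent as 'sales' or 'support'. Reply with one word."},
            {"role": "user", "content": user_text}
        ],
        stream=False
    )
    return response.content.strip().lower()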

With Tool Calling

Pass tool schemas when using function calling:
async def generate_response(self):
    response = await self.llm.chat(
        messages=self.context.messages,
        stream=True,
        tools=self.tool_schemas  # From ToolRegistry
    )
    
    tool_calls = []
    async for chunk in response:
        if chunk.content:
            yield chunk.content
        if chunk.tool_calls:
            tool_calls.extend(chunk.tool_calls)
    
    if tool_calls:
        # Execute tools and continue conversation
        results = await self.tool_registry.execute(tool_calls)
        # Add results to context and call LLM again
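The elided follow-up might look like the sketch below. It assumes OpenAI-style "assistant" and "tool" messages, that tool_registry.execute returns one result per call in order, and that each tool call exposes an id; the exact message shapes may differ in your SDK version:

# Continuation of generate_response after the streaming loop above.
if tool_calls:
    # Record the assistant's tool calls so the model sees what it requested.
    self.context.add_message({
        "role": "assistant",
        "tool_calls": tool_calls
    })
    # Run the tools; assumes execute returns one result per call, in order.
    results = await self.tool_registry.execute(tool_calls)
    for call, result in zip(tool_calls, results):
        self.context.add_message({
            "role": "tool",
            "tool_call_id": call.id,  # assumes each tool call exposes an id
            "content": str(result)
        })
    # Ask the LLM again now that the tool results are in context.
    follow_up = await self.llm.chat(
        messages=self.context.messages,
        stream=True
    )
    async for chunk in follow_up:
        if chunk.content:
            yield chunk.content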

Tips

For most conversational use cases, gpt-4o-mini provides excellent quality at a fraction of the cost of gpt-4o.
If your agent uses many tools, a lower temperature (0.3-0.5) improves tool-selection reliability (see the sketch after these tips).
Add a system message to self.context in __init__ to define the agent's personality.
Voice & STT settings are configured at the platform level when you create or configure your agent in the Atoms dashboard, not in SDK code.
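
Putting the first three tips together, a sketch of a tool-heavy agent with a lower temperature and a system persona (the class name, agent name, and prompt are illustrative):

class SupportAgent(OutputAgentNode):
    def __init__(self):
        super().__init__(name="support-agent")
        # Lower temperature for more reliable tool selection.
        self.llm = OpenAIClient(
            model="gpt-4o-mini",
            temperature=0.4,
            api_key=os.getenv("OPENAI_API_KEY")
        )
        # System message set in __init__ defines the agent's personality.
        self.context.add_message({
            "role": "system",
            "content": "You are a concise, friendly support assistant."
        })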