Motus provides a unified BaseChatClient interface across four LLM providers. Switch providers by changing one import — your agent code stays the same.
import asyncio
from motus.agent import ReActAgent
from motus.models import OpenAIChatClient

client = OpenAIChatClient(api_key="sk-...")
agent = ReActAgent(client=client, model_name="gpt-4o")

async def main():
    print(await agent("Hello!"))

asyncio.run(main())

Supported providers

| Class | Provider | Package |
| --- | --- | --- |
| OpenAIChatClient | OpenAI, local models (Ollama, vLLM) | openai |
| AnthropicChatClient | Anthropic Claude | anthropic |
| GeminiChatClient | Google Gemini | google-genai |
| OpenRouterChatClient | OpenRouter (multi-provider) | openai (OpenAI-compatible) |

Creating a client

from motus.models import OpenAIChatClient

client = OpenAIChatClient(api_key="sk-...")
All clients accept **kwargs forwarded to the underlying SDK constructor. You can pass timeout, max_retries, default_headers, or any other SDK-level option.

Local models

OpenAIChatClient works with any OpenAI-compatible API. Point base_url at your local server:
from motus.models import OpenAIChatClient
from motus.agent import ReActAgent

# Ollama
client = OpenAIChatClient(base_url="http://localhost:11434/v1")

# vLLM
client = OpenAIChatClient(base_url="http://localhost:8000/v1")

agent = ReActAgent(client=client, model_name="llama3.1")
No API key is required when the server does not enforce authentication.

GeminiChatClient also supports Vertex AI. Pass vertexai=True along with your project and location:
from motus.models import GeminiChatClient

client = GeminiChatClient(vertexai=True, project="my-project", location="us-central1")

BaseChatClient interface

Every client implements two async methods. You do not call these directly when using ReActAgent — the agent manages the call loop. They are relevant if you build a custom agent or need raw completions.
| Method | Purpose |
| --- | --- |
| create(model, messages, tools, ...) | Standard chat completion. Returns ChatCompletion. |
| parse(model, messages, response_format, ...) | Structured output. Parses the response into a Pydantic model. |
import asyncio
from motus.models import OpenAIChatClient
from motus.models.base import ChatMessage

client = OpenAIChatClient(api_key="sk-...")

async def main():
    completion = await client.create(
        model="gpt-4o",
        messages=[ChatMessage.user_message("What is 2 + 2?")],
    )
    print(completion.content)

asyncio.run(main())

ChatMessage

ChatMessage is the unified message format used across all providers. Use the factory methods to create messages for each role:
from motus.models.base import ChatMessage

system  = ChatMessage.system_message("You are a helpful assistant.")
user    = ChatMessage.user_message("Hello!")
assist  = ChatMessage.assistant_message("Hi there!")
tool    = ChatMessage.tool_message(content="result", tool_call_id="call_123", name="my_tool")
| Method | Role | Required args |
| --- | --- | --- |
| system_message(content) | system | content |
| user_message(content, base64_image=None) | user | content |
| assistant_message(content, tool_calls=None) | assistant | content |
| tool_message(content, tool_call_id, name) | tool | content, tool_call_id, name |
user_message and assistant_message accept an optional base64_image parameter for vision inputs.

Prompt caching

Anthropic supports prompt caching to reduce latency and cost on repeated calls. Control this with CachePolicy:
| Policy | Behavior |
| --- | --- |
| NONE | No caching. Every request sends the full prompt. |
| STATIC | Cache the system prompt and tool definitions. After the first call, these are read from cache. |
| AUTO | STATIC plus the conversation-turn prefix. Prior turns are read from cache each step. |
Set the policy on the agent:
from motus.agent import ReActAgent
from motus.models import AnthropicChatClient

client = AnthropicChatClient(api_key="sk-ant-...")
agent = ReActAgent(
    client=client,
    model_name="claude-sonnet-4-20250514",
    cache_policy="auto",
)
AUTO is the default. For agents with large system prompts or many tools, caching significantly reduces per-step token costs.
cache_policy has no effect on providers that do not support prompt caching (OpenAI, Gemini, OpenRouter).

ChatCompletion

client.create() returns a ChatCompletion object with these fields:
| Field | Type | Description |
| --- | --- | --- |
| content | str \| None | Text response |
| tool_calls | list[ToolCall] \| None | Tool calls requested by the model |
| reasoning | str \| None | Chain-of-thought reasoning (if supported) |
| finish_reason | str | "stop", "tool_calls", or "length" |
| usage | dict | Token usage (prompt_tokens, completion_tokens) |
| parsed | Any \| None | Parsed Pydantic object (from parse()) |
Call completion.to_message() to convert a completion into a ChatMessage for appending to conversation history.