ReActAgent is one of the two programming models in Motus, alongside Workflow. It runs a ReAct loop (“reason, then act”): the agent sends your prompt to the model, runs any tools the model asks for, feeds the results back, and repeats until the model returns a final answer.
Reach for ReActAgent when the problem is open-ended and you want the model to decide what to do next. Research, debugging, coding agents, triage, customer support: anything where you cannot write the plan down in advance.
Minimal example
The minimal example has two parts: a tool (a weather lookup) defined by decorating a plain Python function, and the agent itself, which wires that tool into the loop.
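A sketch of what that minimal example plausibly looks like. The import paths, the tool decorator, and the client class below are assumptions inferred from the constructor reference later in this page, and get_weather is a hypothetical tool name; treat this as illustrative, not verified API.

```python
# Sketch only: import locations and client setup are assumptions.
import asyncio
from motus import ReActAgent, tool          # assumed import paths
from motus.clients import AnthropicClient   # assumed; any BaseChatClient works

@tool                                       # a tool is just a decorated function
def get_weather(city: str) -> str:          # hypothetical tool name
    """Look up the current weather for a city."""
    return f"It is sunny in {city}."

agent = ReActAgent(
    client=AnthropicClient(),
    model_name="claude-opus-4-6",
    tools=[get_weather],
)

async def main():
    answer = await agent("What's the weather in Paris?")
    print(answer)

asyncio.run(main())
```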
The examples below show only the lines that matter for each feature. Assume they run inside an async def main() wrapper like the one above.

How the loop runs
A single call to await agent(...) runs this loop until the model has nothing more to ask for.
The model gets called
The agent sends the full conversation and the tool schemas to the model and waits for a completion.
The assistant message lands in memory
The completion becomes an assistant message in the history, whether it contains tool calls, a final answer, or both.
If there are tool calls, run them
The agent calls each tool. Because every tool call goes through the runtime as a task, independent tool calls execute concurrently rather than one at a time. Each result is appended to memory as a tool message, keyed to its tool_call_id.
Repeat until the model stops asking
The loop returns to step 1. It ends when the model produces a completion with no tool calls, or when max_steps or timeout is exceeded.
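The loop above can be sketched in plain Python. This is a stand-in, not Motus internals: a scripted fake model requests two tool calls on the first step and answers on the second, and independent tool calls are dispatched concurrently with asyncio.gather.

```python
import asyncio

async def fake_model(history):
    """Stand-in model: request two tool calls first, then give a final answer."""
    if not any(m["role"] == "tool" for m in history):
        return {"role": "assistant", "tool_calls": [
            {"id": "c1", "name": "double", "args": {"x": 2}},
            {"id": "c2", "name": "double", "args": {"x": 5}},
        ]}
    return {"role": "assistant", "content": "Results: 4 and 10", "tool_calls": []}

async def double(x):
    return x * 2

TOOLS = {"double": double}

async def react_loop(prompt, max_steps=20):
    history = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):                   # cap on reason/act cycles
        msg = await fake_model(history)          # 1. call the model
        history.append(msg)                      # 2. assistant message into memory
        calls = msg.get("tool_calls") or []
        if not calls:                            # no tool calls: final answer
            return msg["content"]
        results = await asyncio.gather(          # 3. independent calls run concurrently
            *(TOOLS[c["name"]](**c["args"]) for c in calls)
        )
        for call, result in zip(calls, results): # tool messages keyed to tool_call_id
            history.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    raise RuntimeError("max_steps exceeded")

print(asyncio.run(react_loop("double 2 and 5")))  # Results: 4 and 10
```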
Multi-turn conversations
ReActAgent is stateful by default. Each call to the agent appends the user message and the assistant reply to its memory, so the next call sees the full history.
memory_type="basic" keeps every message in order. For long conversations that could run into tens of thousands of tokens, switch to memory_type="compact", which summarizes older turns once the token count crosses a threshold so the context window never overflows. See Memory for the full picture.
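The compaction trigger can be illustrated with a stand-in (not the real CompactionMemory, which asks the LLM to summarize): once a token count crosses the threshold, older turns collapse into a single summary message while recent turns stay verbatim.

```python
def compact(messages, token_count, threshold, keep_last=2):
    """Stand-in for compact memory: fold old turns once past a token threshold."""
    if token_count(messages) <= threshold:
        return messages                            # under budget: keep everything
    old, recent = messages[:-keep_last], messages[-keep_last:]
    summary = "Summary of %d earlier messages" % len(old)  # real impl summarizes via LLM
    return [{"role": "system", "content": summary}] + recent

msgs = [{"role": "user", "content": "a" * 50}] * 6
count = lambda ms: sum(len(m["content"]) for m in ms)      # crude token proxy
print(len(compact(msgs, count, threshold=1000)))  # 6: under threshold, untouched
print(len(compact(msgs, count, threshold=100)))   # 3: summary + last two messages
```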
Resetting and forking
Call agent.reset() to clear the conversation history and start fresh with the same configuration.
agent.fork() returns an independent copy of the agent with the same configuration and a forked copy of the current conversation. Changes to the fork do not affect the original, so you can branch off a checkpoint and explore alternatives.
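The fork semantics can be demonstrated with a minimal stand-in class (not the real implementation): the copy shares configuration but gets a deep-copied conversation, so appending to one side never shows up on the other.

```python
import copy

class MiniAgent:
    """Stand-in illustrating fork semantics: shared config, independent memory."""
    def __init__(self, system_prompt, memory=None):
        self.system_prompt = system_prompt
        self.memory = memory if memory is not None else []

    def fork(self):
        # Same configuration, deep-copied conversation state.
        return MiniAgent(self.system_prompt, copy.deepcopy(self.memory))

a = MiniAgent("You are helpful.")
a.memory.append({"role": "user", "content": "checkpoint"})
b = a.fork()
b.memory.append({"role": "user", "content": "explore an alternative"})
print(len(a.memory), len(b.memory))  # 1 2
```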
Structured output
Pass a Pydantic model as response_format and the agent returns a parsed instance instead of a string. Motus uses the provider’s strict structured-output mode to guarantee the JSON matches your schema.
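Conceptually, strict structured output means the model’s JSON is guaranteed to parse into your schema. A stdlib stand-in of that parse-and-validate step (Motus uses Pydantic and the provider’s strict mode; WeatherReport is a hypothetical schema):

```python
import json
from dataclasses import dataclass, fields

@dataclass
class WeatherReport:            # hypothetical schema; with Motus this is a Pydantic model
    city: str
    temperature_c: float

def parse_structured(raw: str, schema):
    """Parse model output into a typed instance, rejecting schema mismatches."""
    data = json.loads(raw)
    names = {f.name for f in fields(schema)}
    if set(data) != names:
        raise ValueError(f"schema mismatch: {set(data)} != {names}")
    return schema(**data)

report = parse_structured('{"city": "Paris", "temperature_c": 21.5}', WeatherReport)
print(report.temperature_c)  # 21.5
```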
Limits
Use max_steps to cap the number of reasoning-and-acting cycles. Use timeout to set a wall-clock deadline in seconds.
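The two limits behave differently: max_steps counts loop cycles, while timeout is wall-clock. A stand-in for the timeout behavior using asyncio.wait_for (the error type matches the TimeoutError listed in the constructor reference):

```python
import asyncio

async def slow_agent_run():
    await asyncio.sleep(10)     # pretend the loop is still going
    return "answer"

def run_with_timeout(timeout):
    try:
        # Motus raises TimeoutError past the deadline; wait_for does the same.
        return asyncio.run(asyncio.wait_for(slow_agent_run(), timeout=timeout))
    except asyncio.TimeoutError:
        return "deadline exceeded"

print(run_with_timeout(0.01))  # deadline exceeded
```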
Usage and cost
After any call to the agent, you can read token usage, estimated cost, and context window usage directly off the agent object. The counts accumulate across every call in the agent’s lifetime, not just the most recent one. The threshold in context_window_usage is whatever will trigger memory compaction (from CompactionMemory, or a default derived from the model’s context window if basic memory is used).
| Attribute | What you get |
|---|---|
| agent.usage | Accumulated token counts across every LLM call |
| agent.cost | Total cost in USD, or None if the model has no pricing entry |
| agent.context_window_usage | Current working-memory size relative to the compaction threshold |
| agent.get_execution_trace() | The memory trace as a dict, enriched with usage, model, and cost |
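A stand-in for the accumulation semantics (the class, model name, and per-token prices are made up): counts add up across calls, and cost is None when the model has no pricing entry.

```python
class UsageTracker:
    """Stand-in for agent.usage / agent.cost accumulation across LLM calls."""
    PRICES = {"example-model": (3e-6, 15e-6)}  # made-up USD per input/output token

    def __init__(self, model_name):
        self.model_name = model_name
        self.input_tokens = 0
        self.output_tokens = 0

    def record(self, input_tokens, output_tokens):
        self.input_tokens += input_tokens      # accumulates over the agent's lifetime
        self.output_tokens += output_tokens

    @property
    def cost(self):
        price = self.PRICES.get(self.model_name)
        if price is None:                      # no pricing entry -> None
            return None
        return self.input_tokens * price[0] + self.output_tokens * price[1]

u = UsageTracker("example-model")
u.record(1000, 200)
u.record(500, 100)
print(u.input_tokens, round(u.cost, 6))  # 1500 0.009
```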
Streaming intermediate state
Pass a step_callback to observe the agent in real time. The callback fires after every LLM step that has tool calls, before those tools run. It does not fire on the final step (the one without tool calls); the caller receives the final answer as the return value. This is how motus serve streams intermediate state to connected clients.
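A stand-in for the callback timing (scripted messages, not real model output): the callback fires only on steps that carry tool calls, and never on the final one.

```python
import asyncio

STEPS = [  # scripted assistant messages: two tool-call steps, then a final answer
    {"tool_calls": ["search"], "content": None},
    {"tool_calls": ["fetch"], "content": None},
    {"tool_calls": [], "content": "done"},
]

async def run(step_callback=None):
    for msg in STEPS:
        if msg["tool_calls"]:
            if step_callback:
                await step_callback(msg)   # fires before the tools run
            # ... tools would run here ...
        else:
            return msg["content"]          # final step: no callback, just return

seen = []
async def on_step(msg):
    seen.append(msg["tool_calls"])

print(asyncio.run(run(on_step)), seen)  # done [['search'], ['fetch']]
```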
Using an agent as a tool
agent.as_tool() wraps an agent so another agent can call it as a regular tool. The caller never knows it’s talking to another agent.
By default, each as_tool() invocation starts the inner agent with a fresh conversation. Pass stateful=True to preserve the inner agent’s memory across calls within the same parent run. Other options include overriding the inner agent’s name, description, max_steps, and per-call guardrails.
You can also pass an agent directly in tools=[...] without calling as_tool(). Motus wraps it automatically with default settings. See Multi-agent for the full composition guide, including output extractors and other advanced options.

Guardrails
Attach validation functions that run before the agent starts or after it returns. Input guardrails see the user prompt; output guardrails see the final result. A guardrail can do three things: return None to let the value through unchanged, return a replacement value, or raise to block the run entirely.
When response_format is set, output guardrails can declare individual Pydantic fields and rewrite them. See Guardrails for the full API, including tool-level guardrails.
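The three guardrail outcomes can be sketched with plain functions. This is a stand-in for the semantics, not the real hook API; redact_email and block_empty are hypothetical guardrails.

```python
def apply_guardrails(value, guardrails):
    """Stand-in: None passes the value through, a return replaces it, a raise blocks."""
    for guard in guardrails:
        replacement = guard(value)
        if replacement is not None:
            value = replacement              # replacement value
    return value

def redact_email(text):
    if "@" in text:
        return text.replace("@", "[at]")    # rewrite the value
    return None                             # let it through unchanged

def block_empty(text):
    if not text.strip():
        raise ValueError("empty output blocked")  # abort the run entirely
    return None

print(apply_guardrails("mail me at a@b.c", [block_empty, redact_email]))
# mail me at a[at]b.c
```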
Reasoning
The reasoning parameter controls extended thinking on models that support it. The default is ReasoningConfig.auto(), which enables adaptive thinking on Opus 4.6 and Sonnet 4.6. Adaptive models take effort="low" | "medium" | "high" | "max". Non-adaptive models use budget_tokens to set an explicit thinking budget. ReasoningConfig.disabled() turns thinking off on any model.
Prompt caching
On Anthropic models, the agent places cache breakpoints on the repeating part of your prompt so that the system prompt, tool definitions, and prior conversation turns are read from cache instead of billed as fresh input on every call. cache_policy controls how aggressive this is.
| Policy | Cache breakpoints | TTL |
|---|---|---|
| "none" | None | n/a |
| "static" | System prompt and tool definitions | 5 minutes |
| "auto" (default) | Static plus the end of the previous conversation turn | 5 minutes |
| "auto_1h" | Same as "auto" | 1 hour |
With "auto", Motus tags the second-to-last user or tool-result message with a cache breakpoint on every call. The net effect is that on step N+1, the entire prompt prefix up to and including turn N is a cache read, and only the latest turn is fresh tokens. "auto_1h" is the same strategy with a longer TTL, useful for long-lived agents where the prefix is reused over timescales greater than five minutes.
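A stand-in for the "auto" breakpoint placement (not Motus code): find the second-to-last user or tool-result message; everything up to and including it is the cached prefix, and only what follows is billed as fresh input.

```python
def auto_breakpoint(messages):
    """Stand-in: index of the second-to-last user/tool message, i.e. the cache breakpoint."""
    idxs = [i for i, m in enumerate(messages) if m["role"] in ("user", "tool")]
    return idxs[-2] if len(idxs) >= 2 else None

msgs = [
    {"role": "user", "content": "turn 1"},
    {"role": "assistant", "content": "answer 1"},
    {"role": "user", "content": "turn 2"},
    {"role": "assistant", "content": "answer 2"},
    {"role": "user", "content": "turn 3"},
]
bp = auto_breakpoint(msgs)
print(bp)                   # 2: the prefix through "turn 2" is a cache read
print(len(msgs[: bp + 1]))  # 3 messages cached; only "turn 3" is fresh tokens
```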
Constructor reference
| Parameter | Type | Default | Purpose |
|---|---|---|---|
| client | BaseChatClient | required | LLM provider client |
| model_name | str | required | Model identifier (e.g. "gpt-4o", "claude-opus-4-6") |
| name | str \| None | auto-inferred | Agent name, used in tracing and tool registration |
| system_prompt | str \| None | None | System prompt prepended to every LLM call |
| tools | list, dict, callable, or Tools | None | Tools available to the agent |
| response_format | type[BaseModel] \| None | None | Structured output via a Pydantic model |
| max_steps | int | 20 | Max loop cycles before the agent raises RuntimeError |
| timeout | float \| None | None | Wall-clock deadline in seconds; raises TimeoutError |
| memory_type | "basic" \| "compact" | "basic" | Memory strategy, ignored if memory is passed |
| memory | BaseMemory \| None | None | Custom memory instance, overrides memory_type |
| input_guardrails | list[Callable] | [] | Hooks on the user prompt before the agent runs |
| output_guardrails | list[Callable] | [] | Hooks on the final result |
| reasoning | ReasoningConfig | ReasoningConfig.auto() | Extended thinking configuration |
| cache_policy | CachePolicy \| str | "auto" | Prompt caching strategy (Anthropic only) |
| step_callback | Callable \| None | None | Async callback fired after each LLM step with tool calls |
If name is not passed, Motus infers it from the variable you assigned the agent to on first call, falling back to the class name.
