Memory manages the conversation context that gets sent to the LLM on each call. Without memory management, context grows unbounded until it exceeds the model’s context window and causes an API error. Motus handles this automatically with two built-in strategies.Documentation Index
Fetch the complete documentation index at: https://docs.motus.lithosai.com/llms.txt
Use this file to discover all available pages before exploring further.
Memory types
| Strategy | Token management | Persistence | Use case |
|---|---|---|---|
basic | None (grows unbounded) | In-memory only | Short conversations, testing |
compact | Auto-compacts at threshold | Optional log-based restore | Production agents, long sessions |
background | Auto-compacts + agent-managed memory | Cross-session persistence | Coming soon |
BaseMemory and share an async interface: add_message(), compact(), get_context(), and get_memory_trace().
Architecture
CompactionBase provides the core compaction logic shared by all compacting memory types: turn boundary detection, token threshold management, and LLM-based summarization. CompactionMemory adds conversation log persistence and session restore on top.
BasicMemory
BasicMemory is the default. Messages accumulate until the conversation ends. If the context window overflows, the model provider returns an API error.
memory_type or memory argument.
CompactionMemory
CompactionMemory monitors token count after every message. When the estimated token count exceeds a threshold and the conversation is at a turn boundary, it summarizes older turns into a continuation message. The agent loop continues without interruption.
Configuring CompactionMemory
For full control, instantiateCompactionMemory directly and pass it via the memory parameter:
CompactionMemoryConfig fields
| Field | Default | Description |
|---|---|---|
compact_model_name | Agent’s model | Model used for the compaction LLM call |
token_threshold | None | Explicit token threshold. When None, derived from the model’s context window times safety_ratio |
safety_ratio | 0.75 | Fraction of the context window that triggers compaction |
session_id | Auto UUID | Identifier for the conversation session |
log_base_path | None | Directory for JSONL conversation logs. None disables logging |
max_tool_result_tokens | 50000 | Maximum tokens per tool result before truncation |
Compaction only triggers at clean turn boundaries to avoid corrupting in-progress tool call sequences. A ReAct agent loop produces three types of turn units:
- Unit A:
[user message] - Unit B:
[assistant + tool_calls]followed by[tool_result x N] - Unit C:
[assistant, no tool calls](final response)
_pending_tool_calls, a counter incremented when the assistant issues tool calls and decremented as each result arrives. Compaction fires only when the counter reaches zero.Session save and restore
When you setlog_base_path, CompactionMemory writes every message and compaction event to a JSONL file. You can restore a previous session from this log:
restore_from_log replays all log entries (messages and compaction events) to rebuild the in-memory state. The restored instance appends to the same session log.
For programmatic session persistence without log files, use CompactionSessionState:
CompactionSessionState captures the current context window (messages + system prompt) along with session identity and log store location for cross-session continuity.
Custom compaction function
Replace the default LLM-based compaction with your own summarization logic:Custom memory
SubclassBaseMemory and implement compact() and reset() to build your own strategy:
CompactionBase instead. It provides boundary-aware auto-compaction, set_model(), and the default LLM summarization logic.
BackgroundMemory (coming soon)
A long-term memory solution that works both locally and on the cloud is under active development.BackgroundMemory will extend CompactionBase with agent-managed cross-session memory, allowing the main agent to remember facts, preferences, and context across conversations without distraction.
