How OpenClaw's Multi-Agent Architecture Actually Works Under the Hood
Most multi-agent AI frameworks are just API wrappers with a routing table bolted on.
They give every "agent" the same system prompt, the same memory, the same tool access. Then they call it a team. You end up with six instances of the same LLM answering six versions of the same question, with no real specialization, no persistent state, and no awareness of each other.
OpenClaw does something different. Each agent has its own identity, its own workspace, its own memory namespace, and its own Telegram bot. They share exactly one thing: a gateway.
Here's how it's actually built.
The Gateway: One Port, Six Agents
The entire system runs behind a single local gateway at localhost:18789 — a systemd user service (openclaw-gateway.service) that acts as the traffic controller.
"gateway": { "port": 18789, "mode": "local", "bind": "loopback", "auth": { "mode": "token", "token": "5c4850d38..." } }
The gateway does three things:
- Receives inbound messages from channels (Telegram, cron, webhooks)
- Routes them to the right agent based on binding rules
- Enforces security — a `denyCommands` list blocks camera capture, calendar writes, and contact creation from being triggered remotely
The `bind: "loopback"` setting means nothing leaves the machine without going through the agent first. No public endpoints, no ngrok tunnels. The gateway is intentionally offline-first.
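The deny-list gate can be sketched as a simple command check. The command identifiers below are hypothetical; they mirror the three blocked categories described, but the real names aren't shown in the config.

```python
# Hypothetical sketch of the gateway's denyCommands gate.
# The command identifiers are assumptions, not OpenClaw's actual names.
DENY_COMMANDS = ["camera.capture", "calendar.write", "contacts.create"]

def remote_allowed(command: str) -> bool:
    """Reject any remotely-triggered command that matches the deny list,
    including namespaced sub-commands (e.g. 'calendar.write.event')."""
    return not any(
        command == d or command.startswith(d + ".") for d in DENY_COMMANDS
    )
```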
What makes this non-trivial is the binding layer.
The Binding Layer: How a Telegram Message Finds Its Agent
Six agents. Five Telegram bots. One user. How does a message from @kairos_astra_bot end up in the calendar agent's context and not the main agent's?
Bindings.
"bindings": [ { "agentId": "main", "match": { "channel": "telegram", "accountId": "default" } }, { "agentId": "architect", "match": { "channel": "telegram", "accountId": "architect" } }, { "agentId": "learning", "match": { "channel": "telegram", "accountId": "athena" } }, { "agentId": "portfolio", "match": { "channel": "telegram", "accountId": "midas" } }, { "agentId": "calendar", "match": { "channel": "telegram", "accountId": "kairos" } } ]
Each Telegram account (`accountId`) maps to a separate bot token. The gateway matches incoming messages by which bot they arrived at, then routes to the corresponding agent. Send a message to Astra's bot and the main agent handles it; send to @kairos_astra_bot and the calendar agent does.
This is not just routing. It's namespace isolation at the channel level. The main agent (Astra) never sees messages intended for Athena or Kairos unless explicitly handed off.
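The matching logic itself is small. A minimal sketch of how a binding table like the one above resolves to an agent (the dispatch function is illustrative, not OpenClaw's code):

```python
# Binding table copied from the config above; route() is an illustrative
# sketch of the gateway's dispatch, not the actual implementation.
BINDINGS = [
    {"agentId": "main",      "match": {"channel": "telegram", "accountId": "default"}},
    {"agentId": "architect", "match": {"channel": "telegram", "accountId": "architect"}},
    {"agentId": "learning",  "match": {"channel": "telegram", "accountId": "athena"}},
    {"agentId": "portfolio", "match": {"channel": "telegram", "accountId": "midas"}},
    {"agentId": "calendar",  "match": {"channel": "telegram", "accountId": "kairos"}},
]

def route(channel: str, account_id: str):
    """Return the agentId bound to this (channel, accountId) pair."""
    for binding in BINDINGS:
        m = binding["match"]
        if m["channel"] == channel and m["accountId"] == account_id:
            return binding["agentId"]
    return None  # unmatched messages are dropped, not guessed
```

Because the match key is the bot the message arrived at, isolation falls out of the routing for free.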
The access policy adds another layer:
"dmPolicy": "pairing", "groupPolicy": "allowlist"
`dmPolicy: "pairing"` means the bot only responds to users who have explicitly paired with it — no open DMs. `groupPolicy: "allowlist"` locks group-chat access to approved chat IDs. The system is not chatty to strangers.
[IMAGE: diagram showing Telegram bots → gateway → agent routing with accountId mapping]
The Identity Layer: Why Each Agent Has a Soul
This is the part most multi-agent frameworks skip entirely.
Every agent in OpenClaw has a personality document — SOUL.md — that lives in its workspace. For the main agent (Astra), this is ~170 lines defining not just behavior but character:
"You are the chief of staff. Not an assistant — the person who holds the whole picture together when Ankit doesn't have time to."
This isn't window dressing. The SOUL.md contains:
- Operating modes: Chief of Staff / Strategic Advisor / Coordinator / Execution / Audit / Reflection
- Tone calibration table: "flat" vs "alive" response pairs for 8 different scenarios
- Proactivity rules: when to surface information unprompted vs. when to stay quiet
- Guardrails: external actions need sign-off, no noise-only messages, no dopamine optimization
The SUBAGENTS.md defines the team structure and routing rules — which questions route to Orion (architecture), Athena (DSA), Kairos (calendar), or Midas (portfolio), and critically, what stays with the main agent and doesn't get routed at all.
What makes this compound over time: the SOUL.md is behavioral spec, not memory. It doesn't change every session. It creates a consistent character that users can depend on across thousands of turns.
The workspace structure for each agent follows the same pattern:
```
workspace-{agent}/
  SOUL.md         ← character and operating rules
  IDENTITY.md     ← name, vibe, emoji
  AGENTS.md       ← session init protocol
  SUBAGENTS.md    ← team routing rules (main agent only)
  HEARTBEAT.md    ← how to behave on scheduled pings
  TOOLS.md        ← environment-specific tool notes
  memory/
    CORE/         ← stable, rarely changes
    ACTIVE/       ← current project state
    LOG/          ← daily append-only journals
    ARCHIVE/      ← monthly summaries after 30-day TTL
```
Each agent is a complete operating context. Not a function. Not a microservice. An agent that knows who it is, what it's for, and how it should behave even in edge cases.
The LLM Layer: Six Profiles, Automatic Fallback
OpenClaw doesn't commit to one provider. The model config at the agent defaults level:
"model": { "primary": "google/gemini-2.5-flash", "fallbacks": [ "google/gemini-2.5-pro", "openrouter/meta-llama/llama-3.3-70b-instruct:free", "openrouter/arcee-ai/trinity-large-preview:free", "openrouter/meta-llama/llama-3.2-3b-instruct:free" ] }
The auth layer has six rotating Google profiles (`google:main`, `google:project1` ... `google:project5`), each with its own API key. When one profile hits rate limits, the gateway rotates to the next. The fallback chain continues through OpenRouter's free tier if needed.
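The rotate-then-fall-back behavior can be sketched as two nested loops: exhaust every profile on the primary model before dropping to the next fallback. This is an illustrative sketch; `RateLimited` and the `call` signature are assumptions, not OpenClaw's API.

```python
# Sketch of profile rotation + model fallback. RateLimited and the
# call(model, profile, prompt) signature are illustrative assumptions.
class RateLimited(Exception):
    pass

PROFILES = ["google:main"] + [f"google:project{i}" for i in range(1, 6)]
MODELS = [
    "google/gemini-2.5-flash",
    "google/gemini-2.5-pro",
    "openrouter/meta-llama/llama-3.3-70b-instruct:free",
]

def complete(prompt, call):
    """Try every (model, profile) pair in order; fail only when all are exhausted."""
    for model in MODELS:
        for profile in PROFILES:
            try:
                return call(model, profile, prompt)
            except RateLimited:
                continue  # rotate to the next profile, then the next model
    raise RuntimeError("all providers exhausted")
```

The agent never sees any of this; by the time a turn reaches the SOUL.md-driven behavior, some provider has already answered.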
This is specifically why the Anthropic ban story from yesterday matters to builders here: the entire architecture is designed so no single provider can take down your agent fleet. You can swap the primary model without touching agent behavior. The SOUL.md doesn't care whether it's running on Gemini 2.5 Pro or Llama 3.3. The character spec is provider-agnostic.
The compaction setting is worth noting:
"compaction": { "reserveTokensFloor": 40000 }
The gateway reserves 40K tokens at the floor before triggering context compaction. Dense technical agents running long sessions don't silently lose context mid-task. The floor is explicit.
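The floor check itself is one comparison. A sketch, assuming compaction triggers when free context drops below the reserve (the field name is from the config; the trigger condition is an inference):

```python
# Inferred compaction trigger: compact when remaining context falls
# below the configured floor. The exact rule OpenClaw uses isn't shown.
RESERVE_TOKENS_FLOOR = 40_000

def should_compact(context_window: int, tokens_used: int) -> bool:
    """True when free context has dropped below the reserved floor."""
    return (context_window - tokens_used) < RESERVE_TOKENS_FLOOR
```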
The Memory Architecture: Three Tiers, One Truth
This is where most DIY multi-agent setups fall apart. Memory is hard. Not conceptually — architecturally. What do you persist? Where? Who can read it?
OpenClaw runs three memory tiers in parallel.
Tier 1: Structured Markdown (memory/CORE/, ACTIVE/, LOG/)
Session-readable documents organized by volatility:
- `CORE/` — stable facts: user profile, tool statuses, settled decisions. Changes rarely.
- `ACTIVE/` — current project state, cron health, ongoing work. Updated weekly.
- `LOG/` — append-only daily journals with 30-day TTL, then auto-archived to monthly summaries.
Tier 2: Ruflo Vector Store (Semantic Memory)
The real-time memory layer. Each agent queries ruflo at session start:
```bash
/root/.openclaw/scripts/agent-mem.sh search shared "user preferences decisions project context" 5
/root/.openclaw/scripts/agent-mem.sh search main "recent decisions findings context" 5
```
Namespaces matter here:
- `shared` — written by any agent, readable by all. User corrections, tool statuses, cross-agent decisions.
- `main`, `learning`, `architect`, etc. — agent-specific memories. Ops can't read Athena's learning logs.
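The read policy implied by those namespaces is small enough to state as code. A sketch of the assumed rule (inferred from the description, not taken from OpenClaw's source):

```python
# Assumed namespace read policy, inferred from the text: every agent
# sees 'shared' plus its own namespace, never a sibling's.
def allowed_namespaces(agent_id: str) -> set:
    return {"shared", agent_id}

def can_read(agent_id: str, namespace: str) -> bool:
    """True if this agent is permitted to query the given namespace."""
    return namespace in allowed_namespaces(agent_id)
```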
Tier 3: Automated Harvesting (The Ops Layer)
This is the part nobody else builds. `harvest-sessions.py` runs from the Ops agent heartbeat. It scans every agent's JSONL session files automatically — no agent cooperation needed.
Three pattern-matching passes per session:
```python
CORRECTION_PATTERNS = [
    re.compile(r'^(no|nope|wait|stop|actually|not that|don\'t)', re.I),
    re.compile(r'\balways\b', re.I),
    re.compile(r'\bnever\b', re.I),
    re.compile(r'\bfrom now on\b', re.I),
    re.compile(r'\bremember\b', re.I),
]
```
Pass 1: short user messages after long assistant turns → user corrections
Pass 2: assistant messages containing "fixed", "decided", or "root cause" → agent decisions
Pass 3: project or idea mentions in either role → project context
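Using the correction patterns above, pass 1 can be sketched as a length gate plus a pattern match. The length cutoffs here are illustrative; the real thresholds aren't shown in the script excerpt.

```python
import re

# Patterns copied from harvest-sessions.py; the cutoffs are assumptions.
CORRECTION_PATTERNS = [
    re.compile(r"^(no|nope|wait|stop|actually|not that|don't)", re.I),
    re.compile(r"\balways\b", re.I),
    re.compile(r"\bnever\b", re.I),
    re.compile(r"\bfrom now on\b", re.I),
    re.compile(r"\bremember\b", re.I),
]

def looks_like_correction(user_msg: str, prev_assistant_len: int,
                          short_cutoff: int = 200, long_cutoff: int = 800) -> bool:
    """Pass 1: a short user message right after a long assistant turn,
    matching a correction pattern, is flagged as a user correction."""
    if len(user_msg) > short_cutoff or prev_assistant_len < long_cutoff:
        return False
    return any(p.search(user_msg) for p in CORRECTION_PATTERNS)
```

This also makes the false-positive risk discussed later easy to see: the gate is purely lexical.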
Each extracted memory gets stored in ruflo with structured keys:
```
agent:main:correction:{md5hash}   # tags: correction,preference,main,auto-harvested
agent:main:session:{md5hash}      # tags: decision,main,auto-harvested
claude:project:{md5hash}          # tags: project,idea,main,auto-harvested
```
Cap is 15 memories per session to avoid noise. State is tracked by file mtime — sessions are only re-harvested if they've been modified.
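The mtime-based state tracking amounts to comparing a recorded timestamp against the file's current one. A minimal sketch (the state-file shape is an assumption):

```python
import os

# Sketch of mtime-based harvest state. The state dict maps session path
# to the mtime recorded at last harvest; its real storage format is unknown.
def needs_harvest(session_path: str, state: dict) -> bool:
    """Re-harvest a session only if it changed since the last recorded mtime."""
    return state.get(session_path, 0) < os.path.getmtime(session_path)

def mark_harvested(session_path: str, state: dict) -> None:
    """Record the current mtime so unchanged sessions are skipped next run."""
    state[session_path] = os.path.getmtime(session_path)
```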
The result: every correction you give any agent gets automatically extracted into the vector store and becomes available for the next session, without any agent having to explicitly remember it.
[IMAGE: diagram showing session JSONL → harvest-sessions.py → ruflo namespaces → agent session init]
The Cron Engine: Agents That Work While You Sleep
The cron system at /root/.openclaw/cron/jobs.json is more interesting than it looks. Each job is not a shell command — it's an agent turn.
{ "id": "800eedc0-...", "agentId": "calendar", "name": "Calendar Morning Briefing", "schedule": { "kind": "cron", "expr": "30 2 * * *", "tz": "UTC" }, "sessionTarget": "isolated", "wakeMode": "now", "payload": { "kind": "agentTurn", "message": "python3 /root/.openclaw/workspace-calendar/scripts/kairos.py briefing" }, "delivery": { "to": "5132880317", "mode": "announce", "channel": "telegram" } }
`sessionTarget: "isolated"` means the cron job gets its own fresh context — it doesn't bleed into the agent's conversational history. The agent wakes up, runs the task, delivers the result to Telegram chat ID 5132880317, and goes back to sleep.
The delivery system is channel-aware. `mode: "announce"` means the gateway pushes the result as a new message, not a reply to an existing conversation. The calendar briefing arrives at 8:00 AM IST (02:30 UTC) every day regardless of whether there's an active conversation.
State tracking on each job:
"state": { "nextRunAtMs": 1775442600000, "lastRunAtMs": 1775356200004, "lastStatus": "ok", "lastDurationMs": 11730, "consecutiveErrors": 0 }
`lastDurationMs: 11730` — the calendar briefing took 11.7 seconds last run. `consecutiveErrors: 0` — no failures. The gateway tracks this per job and can back off or alert on error accumulation.
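A plausible shape for that backoff is exponential in `consecutiveErrors`, capped so a flapping job doesn't push its retry out indefinitely. OpenClaw's actual policy isn't documented here, so the doubling rule and the one-hour cap below are assumptions:

```python
# Illustrative backoff on consecutiveErrors; the doubling rule and the
# one-hour cap are assumptions, not OpenClaw's documented behavior.
def next_delay_ms(base_ms: int, consecutive_errors: int,
                  cap_ms: int = 3_600_000) -> int:
    """Double the retry delay per consecutive failure, capped at one hour."""
    return min(base_ms * (2 ** consecutive_errors), cap_ms)
```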
The Heartbeat Pattern: Pre-Screening Before the LLM
Every agent runs on a heartbeat — every 60 minutes by default. A naive implementation would just wake the LLM on every ping. That's expensive and pointless if nothing has changed.
OpenClaw uses heartbeat-guard.js — a Node.js pre-screener that runs rule-based checks before the LLM is invoked:
```javascript
// Check LOG file retention (30-day TTL)
const expired = files.filter(f => fs.statSync(f).mtimeMs < cutoffMs);
if (expired.length > 0) {
  signals.push(`ARCHIVE: ${expired.length} LOG file(s) older than 30 days.`);
}

// Check known-broken tools for recovery
const knownBroken = { "agent-browser": { ... } };

// Exit with HEARTBEAT_OK if nothing needs attention
if (signals.length === 0) process.exit(0); // LLM never invoked
```
If nothing needs attention, the script exits with `HEARTBEAT_OK` via `process.exit(0)` and the LLM is never invoked. At 60-minute intervals across six agents, this gate saves substantial compute and API cost over weeks of continuous operation.
The HEARTBEAT.md for each agent adds behavioral rules on top — quiet hours (no outreach 23:00–08:00 IST), the daily memory write protocol, and the quality standard for what's worth writing at all:
"Do NOT write: 'Heartbeat check. Replied HEARTBEAT_OK.' Write only if it changes behavior next session."
This keeps the memory layer signal-dense. Empty pings don't generate empty memories.
Where This Falls Short
The architecture has real rough edges.
Kairos is half-broken. Google Calendar OAuth2 is not set up. The morning briefing works because `kairos.py` uses CalDAV directly, but the full calendar write path is a stub. The gateway logs show repeated `[tools] message failed: Unknown target "Ankit" for Telegram` errors from calendar sessions that tried to send to a name instead of a chat ID — the agent forgot the correct target.
Midas is blocked entirely. The portfolio agent can't do its job because the Notion database it reads from was never created. The agent exists, the bot exists, the bindings work — but the data layer is missing. An agent team where one agent is permanently blocked on external dependencies is a team that creates false confidence.
The session harvester is noisy at scale. The pattern-matching approach that extracts "corrections" from session JSONL is heuristic, not semantic. A message like "stop that, it's raining" would match the correction patterns. With 59 sessions in the main agent directory and growing, false positives in the vector store compound over time.
The ruflo query at session start is not ordered by recency. Vector similarity is not the same as temporal relevance. A user correction from 3 months ago can rank higher than a decision made yesterday if the embedding similarity is stronger. The search needs a recency weight that it doesn't currently have.
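The fix the article is pointing at is a blended score: similarity weighted against an exponential recency decay. A sketch of what that could look like, where the half-life and blend factor are illustrative choices, not anything ruflo implements today:

```python
import time

# Sketch of the recency weight the search currently lacks. The 30-day
# half-life and the 0.7 blend factor are illustrative assumptions.
def blended_score(similarity: float, created_at: float, now: float = None,
                  half_life_days: float = 30.0, alpha: float = 0.7) -> float:
    """Blend vector similarity with an exponential recency decay so a
    fresh memory can outrank a stale one with higher raw similarity."""
    now = time.time() if now is None else now
    age_days = max(0.0, (now - created_at) / 86_400)
    recency = 0.5 ** (age_days / half_life_days)
    return alpha * similarity + (1 - alpha) * recency
```

With these numbers, yesterday's 0.8-similarity decision outranks a 0.9-similarity correction from three months ago, which is the ordering the session-init query actually wants.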
The Actual Architecture in One Sentence
OpenClaw is a routing gateway with isolated agent contexts, provider-agnostic LLM fallback, and a memory layer that harvests corrections automatically — so the system improves over time without agents explicitly remembering anything.
What to do Monday: If you're building multi-agent systems and haven't separated your agents' memory namespaces yet — do that first. Shared memory between agents is the primary reason multi-agent systems become incoherent at scale. Namespace isolation is not a performance optimization. It's a correctness requirement.
Coming next: The content pipeline that runs on top of this infrastructure — how a Notion DB row becomes a Substack article, a LinkedIn post, an Instagram carousel, and a Telegram notification, with no human in the loop except for approval.
If this was useful, paid subscribers get this depth 2–3x per week — build logs, architecture teardowns, and behind-the-build posts on real systems.
[Subscribe — $8/month or $80/year (save 2 months)]