How I Set Up a 6-Agent AI System That Runs 24/7 on a Single VPS
A detailed walkthrough of configuring OpenClaw's multi-agent orchestration platform — from agent design and memory architecture to Telegram integration, voice pipelines, and the operational patterns that emerged from running it continuously.

Most people interact with AI through a single chat window. I wanted something different — a team of specialized agents, each with its own domain, running autonomously on a VPS, accessible through Telegram, and smart enough to remember context across days and weeks.
This post walks through exactly how I set that up using OpenClaw, an open-source multi-agent orchestration platform, on a single Hetzner CPX52 VPS with 24GB RAM. I'll cover the agent architecture, memory design, Telegram integration, voice pipeline, and the operational lessons from running it continuously since early 2025.
Why Multiple Agents Instead of One?
The obvious question: why not just use one powerful agent with a long system prompt?
I tried that first. The problems showed up fast:
- Context pollution — asking about system design would pull in unrelated portfolio data or calendar context. The agent couldn't maintain clean domain boundaries.
- Prompt bloat — cramming instructions for scheduling, code review, learning tracking, and portfolio analysis into one system prompt consumed 40-50% of the context window before the conversation even started.
- Inconsistent persona — one agent trying to be both a stern architecture reviewer and a friendly learning coach resulted in neither being good.
The multi-agent approach solved all three. Each agent gets a focused system prompt, its own memory scope, and a clear domain boundary.
The Agent Lineup
I configured six agents on OpenClaw, each bound to its own workspace directory and Telegram bot:
| Agent | Role | Telegram Bot | What It Does |
|-------|------|--------------|--------------|
| Astra | Main | @default_bot | General tasks, conversation, daily planning |
| Orion | Architect | @architect_bot | System design questions, code architecture review |
| Athena | Learning | @athena_bot | DSA prep tracking, Golang/Rust learning progress |
| Kairos | Calendar | @kairos_bot | Scheduling, time-blocking, reminders |
| Midas | Portfolio | @midas_bot | Stock monitoring, tax notes, financial tracking |
| Ops | Operations | headless | Server maintenance, deployment automation |
Each agent runs through a central gateway on localhost:18789, managed as a systemd user service. The gateway handles routing, rate limiting, and health checks.
The Configuration
OpenClaw uses a single JSON config file (openclaw.json) to define all agents. Here's the structure for one agent:
```json
{
  "id": "architect",
  "name": "Orion",
  "workspace": "/root/.openclaw/workspace-architect",
  "telegram": {
    "bot_token_env": "ARCHITECT_BOT_TOKEN",
    "bot_id": 8468670400
  },
  "llm": {
    "provider": "gemini",
    "model": "gemini-2.0-flash"
  },
  "memory": {
    "type": "hybrid",
    "redis_prefix": "orion:",
    "vector_collection": "orion_memory"
  }
}
```
The key insight: each agent gets its own workspace directory. This is where its system prompt, memory files, skills, and context live. Complete isolation. Orion can never accidentally read Athena's learning notes, and Midas can't see Ops' deployment logs.
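That isolation can be enforced at the filesystem layer with a small guard. This is a minimal sketch of the idea, not OpenClaw's actual implementation (which I haven't read); `resolve_in_workspace` is a hypothetical helper name.

```python
from pathlib import Path

def resolve_in_workspace(workspace: str, relative: str) -> Path:
    """Resolve a path and refuse anything that escapes the agent's workspace."""
    root = Path(workspace).resolve()
    target = (root / relative).resolve()
    # Path.relative_to raises ValueError when target is outside root
    try:
        target.relative_to(root)
    except ValueError:
        raise PermissionError(f"{relative!r} escapes workspace {root}")
    return target
```

With this in place, a request like `resolve_in_workspace("/root/.openclaw/workspace-architect", "../workspace-learning/notes.md")` fails loudly instead of silently crossing an agent boundary.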
Memory Architecture: The Part That Took the Longest
Getting memory right was the hardest part of the entire setup. The goal: agents should remember what you told them yesterday, last week, or last month — without you having to repeat yourself, and without blowing up the context window.
The Three-Tier Approach
After several iterations, I settled on a three-tier memory system:
Tier 1: Hot Memory (Redis)
- Session state, recent conversation context
- TTL-based expiry (24 hours for conversations, 7 days for preferences)
- Instant retrieval, no embedding cost
- Used for: "What were we just talking about?" / "Continue from where we left off"
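The hot tier boils down to namespaced keys plus a TTL policy. A sketch of how I think about it, with the `orion:` prefix taken from the config above; the exact key schema is my own convention, not OpenClaw's.

```python
# TTL policy mirroring the numbers above: conversations expire
# after 24 hours, preferences after 7 days.
TTLS = {"conversation": 24 * 3600, "preference": 7 * 24 * 3600}

def hot_key(agent_prefix: str, kind: str, session_id: str) -> str:
    """Build a namespaced Redis key, e.g. 'orion:conversation:abc123'."""
    return f"{agent_prefix}{kind}:{session_id}"

def ttl_for(kind: str) -> int:
    return TTLS[kind]

# Writing a value then becomes a single SETEX, e.g. with redis-py:
#   r.setex(hot_key("orion:", "conversation", sid),
#           ttl_for("conversation"), payload)
```

Letting Redis expire keys on its own means there is no cleanup job to run; stale session state simply ages out.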
Tier 2: Warm Memory (pgvector)
- Semantic search over past interactions and knowledge
- Embeddings generated by `nomic-embed-text` running locally via Ollama
- Zero API cost — the model runs on the VPS itself
- Used for: "What did I decide about the Redis architecture last week?"
Tier 3: Cold Memory (File-based)
- `.md` files injected into the system prompt at session start
- Stable facts: user preferences, project context, agent instructions
- Manually curated, rarely changes
- Used for: "Remember that I prefer Go over Python for CLI tools"
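Cold memory injection is just "read the curated files, prepend them to the prompt." A sketch under assumptions: the `memory/` subdirectory layout and `build_system_prompt` helper are mine, not part of OpenClaw.

```python
from pathlib import Path

def build_system_prompt(base_prompt: str, workspace: str) -> str:
    """Start from the agent's base prompt, then append every curated .md file.

    Sorting makes the injection order deterministic across restarts.
    """
    sections = [base_prompt]
    for md in sorted(Path(workspace).glob("memory/*.md")):
        sections.append(f"## {md.stem}\n{md.read_text()}")
    return "\n\n".join(sections)
```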
Why Hybrid Instead of Pure Vector Search?
I initially tried using pgvector for everything. The problem: vector search retrieves what's semantically similar, not what's contextually relevant.
Example: I asked Orion "How should I design the Redis job queue for async timeout mismatches?" Vector search retrieved a memory about "Redis pub/sub patterns" — semantically close, but not the specific conversation where I'd already decided on BLPOP with callback keys.
The hybrid approach fixes this:
- Redis handles exact lookups — session ID, user preferences, recent decisions
- pgvector handles fuzzy recall — "that conversation about deployment patterns from two weeks ago"
- File injection handles stable context — project facts that don't change
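The routing above fits in a few lines of pure Python. This is a sketch of the control flow, not OpenClaw's internals: `redis_get` and `vector_search` are injected callables standing in for the real clients, so the logic stays testable without live services.

```python
def recall(query_key, query_text, redis_get, vector_search, static_facts):
    """Layered recall: exact hit first, semantic fallback, stable facts always.

    Tier 3 (cold) context is always included; tier 1 (Redis) is tried for
    an exact hit; only on a miss does tier 2 (pgvector) do fuzzy search.
    """
    memories = list(static_facts)               # tier 3: always-on context
    exact = redis_get(query_key)                # tier 1: exact lookup
    if exact is not None:
        memories.append(exact)
    else:
        memories.extend(vector_search(query_text, top_k=3))  # tier 2: fuzzy
    return memories
```

The ordering is the whole point: an exact recent decision beats a semantically similar but stale memory, which is exactly the BLPOP-vs-pub/sub failure mode described above.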
The Impact
Before the hybrid setup, agents would lose context after ~3 conversation turns on complex topics. After:
- Cross-session recall accuracy went from ~30% to ~85% — agents could reference decisions made days ago without re-explanation
- Context window usage dropped by ~35% — by pulling only relevant memories instead of dumping everything, conversations had more room for actual reasoning
- Zero embedding API cost — `nomic-embed-text` on Ollama handles all embeddings locally. At ~200 messages/day across all agents, this would cost $15-20/month on OpenAI. Now it's $0.
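Calling the local embedding model is one HTTP request to Ollama's default port. A minimal stdlib-only sketch; the `/api/embeddings` endpoint and `{"model", "prompt"}` payload shape follow Ollama's REST API as I understand it, so verify against the version you run.

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # Ollama's default port

def embed_request(text: str, model: str = "nomic-embed-text") -> request.Request:
    """Build the HTTP request for a local embedding; nothing leaves the VPS."""
    payload = json.dumps({"model": model, "prompt": text}).encode()
    return request.Request(OLLAMA_URL, data=payload,
                           headers={"Content-Type": "application/json"})

def embed(text: str) -> list[float]:
    """POST the text and return the embedding vector from the response."""
    with request.urlopen(embed_request(text)) as resp:
        return json.loads(resp.read())["embedding"]
```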
Telegram Integration: The Interface Layer
I chose Telegram as the primary interface for a few reasons:
- Available on every device (phone, tablet, desktop)
- Push notifications for agent responses
- Native support for voice messages (important for the TTS/STT pipeline)
- Bot API is simple and well-documented
Each agent gets its own Telegram bot. When I message @architect_bot, it goes directly to Orion. When I message @athena_bot, it goes to Athena. No routing confusion, no "which agent should handle this?" ambiguity.
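Because the bot-to-agent binding is one-to-one, "routing" degenerates to a dictionary lookup. A sketch of that idea: the agent ids besides `architect` (which appears in the config above) are my guesses.

```python
AGENT_BY_BOT = {
    "@default_bot": "main",
    "@architect_bot": "architect",
    "@athena_bot": "learning",
    "@kairos_bot": "calendar",
    "@midas_bot": "portfolio",
}

def route(bot_username: str) -> str:
    """One bot, one agent: routing is a plain lookup, never a classifier."""
    try:
        return AGENT_BY_BOT[bot_username]
    except KeyError:
        raise ValueError(f"no agent bound to {bot_username}")
```

The design choice here is that ambiguity is resolved by the human picking a chat window, not by an LLM guessing intent, which removes an entire class of misrouting bugs.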
Voice Pipeline
The voice setup was a later addition but turned out to be one of the most useful features:
- STT (Speech-to-Text): Gemini 2.0 Flash — auto-transcribes inbound voice notes from Telegram
- TTS (Text-to-Speech): Edge TTS with the `en-US-AriaNeural` voice
- Trigger: `auto: "inbound"` — agents only reply with voice when I send a voice message
This means I can have a full conversation with Orion about system design patterns while driving, without touching the keyboard. The transcription quality from Gemini is noticeably better than Whisper for Indian-accented English.
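The `auto: "inbound"` rule is a tiny decision function. A sketch of the behavior as I observe it; the `"always"` trigger value is a hypothetical alternative for illustration, not a confirmed OpenClaw option.

```python
def reply_mode(inbound_is_voice: bool, trigger: str = "inbound") -> str:
    """Mirror the auto:"inbound" rule: speak only when spoken to by voice."""
    if trigger == "always":
        return "voice"                 # hypothetical: voice every reply
    if trigger == "inbound" and inbound_is_voice:
        return "voice"                 # voice note in, voice note out
    return "text"
```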
LLM Strategy: Rotating Gemini Profiles
Instead of using a single API key, I configured OpenClaw with 6 rotating Google profiles for Gemini API access. This gives:
- Higher effective rate limits (spread across profiles)
- Redundancy if one key hits quota
- Zero cost — Gemini's free tier is generous enough for personal use
The gateway rotates through profiles round-robin, with automatic fallback if one returns a 429.
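Round-robin with 429 fallback is simple enough to sketch in full. This is my reimplementation of the pattern, not the gateway's code; `send(profile)` stands in for whatever actually performs the Gemini call.

```python
class RateLimited(Exception):
    """Raised by send() when a profile returns HTTP 429."""

class ProfilePool:
    """Round-robin over API profiles, skipping ones that are rate-limited."""

    def __init__(self, profiles):
        self.profiles = list(profiles)
        self.i = 0  # rotation pointer persists across calls

    def call(self, send):
        """Try each profile at most once, starting from the rotation pointer."""
        for _ in range(len(self.profiles)):
            profile = self.profiles[self.i]
            self.i = (self.i + 1) % len(self.profiles)
            try:
                return send(profile)
            except RateLimited:
                continue  # 429: fall through to the next profile
        raise RuntimeError("all profiles rate-limited")
```

Because the pointer advances even on success, load spreads evenly across profiles instead of hammering the first healthy one.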
Operational Patterns That Emerged
After running this setup for several months, some patterns emerged that I didn't plan for:
1. The Morning Briefing
Astra (main agent) automatically surfaces:
- Unresolved items from yesterday
- Calendar events for today (via Kairos)
- Learning goals for the week (via Athena)
This wasn't a designed feature — it emerged from configuring the right memory triggers and morning prompt templates.
2. Cross-Agent Context Sharing (Manual)
OpenClaw isolates agent memories by design, but sometimes I need to share context. For example, when Orion reviews a system design, the decision might be relevant to Athena's learning plan.
Currently, I do this manually by referencing the decision in a message to Athena. I'm exploring whether a controlled "memory bridge" — where I can explicitly share a specific memory between agents — would be useful without breaking isolation.
3. Restart Resilience
The system runs as a systemd service:
```shell
systemctl --user restart openclaw-gateway.service
```
On restart, each agent:
- Reloads its cold memory (`.md` files)
- Reconnects to Redis for hot session state
- Finds pgvector still available (PostgreSQL runs as its own service)
No conversation is lost. The agent picks up exactly where it left off.
4. Monitoring
- Gateway health: `curl localhost:18789/health`
- Agent status: checked via the config's agent registry
- Memory usage: Redis `INFO memory` + PostgreSQL `pg_stat_user_tables`
I don't have a dashboard yet — it's all CLI checks. For a personal system, this is enough.
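Even without a dashboard, a few lines can turn the health check into a one-line summary. A sketch only: the `{"status": ..., "agents": {...}}` payload shape is an assumption about what the gateway returns, so adapt the parsing to the real response.

```python
import json

def summarize_health(raw: str) -> str:
    """Condense a gateway /health JSON blob into a one-line status.

    Assumed payload shape: {"status": "ok", "agents": {"orion": true, ...}}.
    """
    data = json.loads(raw)
    down = [name for name, ok in data.get("agents", {}).items() if not ok]
    if data.get("status") == "ok" and not down:
        return "all agents healthy"
    return "degraded: " + ", ".join(down or ["gateway"])
```

Piping `curl -s localhost:18789/health` into a script like this is enough for a cron-driven alert.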
What I'd Do Differently
- Start with fewer agents — I launched with 6 agents on day one. It would have been smarter to start with 2-3 and add agents only when I felt a clear domain boundary forming.
- Invest in memory curation earlier — the cold memory (`.md` files) is the highest-leverage part of the system. Well-curated context files make every agent interaction better. I underinvested in this initially.
- Set up voice from day one — the voice pipeline removed so much friction that I wish I'd configured it immediately instead of treating it as a "nice to have."
The Numbers
- Uptime: 99.7% since February 2025 (two outages, both from VPS maintenance)
- Messages processed: ~200/day across all agents
- LLM cost: $0 (Gemini free tier + local embeddings)
- Server cost: ~$30/month (Hetzner CPX52)
- Memory storage: ~2GB Redis, ~500MB pgvector, ~50MB cold files
- Average response time: 2-4 seconds (Gemini latency)
Bottom Line
Running a multi-agent system isn't about having the latest model or the biggest context window. It's about domain isolation, memory architecture, and operational reliability. OpenClaw gave me the orchestration layer; the work was in designing the agent boundaries, curating the memory, and building the habits to actually use it daily.
The system isn't perfect — calendar integration is still half-built, cross-agent collaboration is manual, and I want better search over historical conversations. But it's running, it's useful every single day, and it's taught me more about production AI systems than any course or tutorial ever could.