How I Set Up a 6-Agent AI System That Runs 24/7 on a Single VPS
A detailed walkthrough of configuring OpenClaw's multi-agent orchestration platform — from agent design and memory architecture to Telegram integration, voice pipelines, and the operational patterns that emerged from running it continuously.

Most people interact with AI through a single chat window. I wanted something different — a team of specialized agents, each with its own domain, running autonomously on a VPS, accessible through Telegram, and smart enough to remember context across days and weeks.
This post walks through exactly how I set that up using OpenClaw, an open-source multi-agent orchestration platform, on a single Hetzner CPX52 VPS with 24GB RAM. I'll cover the agent architecture, memory design, Telegram integration, voice pipeline, and the operational lessons from running it continuously since early 2025.
Why Multiple Agents Instead of One?
The obvious question: why not just use one powerful agent with a long system prompt?
I tried that first. The problems showed up fast:
- Context pollution — asking about system design would pull in unrelated portfolio data or calendar context. The agent couldn't maintain clean domain boundaries.
- Prompt bloat — cramming instructions for scheduling, code review, learning tracking, and portfolio analysis into one system prompt consumed 40-50% of the context window before the conversation even started.
- Inconsistent persona — one agent trying to be both a stern architecture reviewer and a friendly learning coach resulted in neither being good.
The multi-agent approach solved all three. Each agent gets a focused system prompt, its own memory scope, and a clear domain boundary.
The Agent Lineup
I configured six agents on OpenClaw, each bound to its own workspace directory and Telegram bot:
| Agent | Role | Telegram Bot | What It Does |
|-------|------|--------------|--------------|
| Astra | Main | @default_bot | General tasks, conversation, daily planning |
| Orion | Architect | @architect_bot | System design questions, code architecture review |
| Athena | Learning | @athena_bot | DSA prep tracking, Golang/Rust learning progress |
| Kairos | Calendar | @kairos_bot | Scheduling, time-blocking, reminders |
| Midas | Portfolio | @midas_bot | Stock monitoring, tax notes, financial tracking |
| Ops | Operations | headless | Server maintenance, deployment automation |
Each agent runs through a central gateway on localhost:18789, managed as a systemd user service. The gateway handles routing, rate limiting, and health checks.
The Configuration
OpenClaw uses a single JSON config file (openclaw.json) to define all agents. Here's the structure for one agent:
```json
{
  "id": "architect",
  "name": "Orion",
  "workspace": "/root/.openclaw/workspace-architect",
  "telegram": {
    "bot_token_env": "ARCHITECT_BOT_TOKEN",
    "bot_id": 8468670400
  },
  "llm": {
    "provider": "gemini",
    "model": "gemini-2.0-flash"
  },
  "memory": {
    "type": "hybrid",
    "redis_prefix": "orion:",
    "vector_collection": "orion_memory"
  }
}
```
The key insight: each agent gets its own workspace directory. This is where its system prompt, memory files, skills, and context live. Complete isolation. Orion can never accidentally read Athena's learning notes, and Midas can't see Ops' deployment logs.
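That isolation can be enforced at the filesystem layer with a small guard. This is a minimal sketch of the idea, not OpenClaw's actual implementation (which I haven't read); `resolve_in_workspace` is a hypothetical helper name.

```python
from pathlib import Path

def resolve_in_workspace(workspace: str, relative: str) -> Path:
    """Resolve a path and refuse anything that escapes the agent's workspace."""
    root = Path(workspace).resolve()
    target = (root / relative).resolve()
    # Path.relative_to raises ValueError when target is outside root
    try:
        target.relative_to(root)
    except ValueError:
        raise PermissionError(f"{relative!r} escapes workspace {root}")
    return target
```

With this in place, a request like `resolve_in_workspace("/root/.openclaw/workspace-architect", "../workspace-learning/notes.md")` fails loudly instead of silently crossing an agent boundary.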
Memory Architecture: The Part That Took the Longest
Getting memory right was the hardest part of the entire setup. The goal: agents should remember what you told them yesterday, last week, or last month — without you having to repeat yourself, and without blowing up the context window.
The Three-Tier Approach
After several iterations, I settled on a three-tier memory system:
Tier 1: Hot Memory (Redis)
- Session state, recent conversation context
- TTL-based expiry (24 hours for conversations, 7 days for preferences)
- Instant retrieval, no embedding cost
- Used for: "What were we just talking about?" / "Continue from where we left off"
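The hot tier boils down to namespaced keys plus a TTL policy. A sketch of how I think about it, with the `orion:` prefix taken from the config above; the exact key schema is my own convention, not OpenClaw's.

```python
# TTL policy mirroring the numbers above: conversations expire
# after 24 hours, preferences after 7 days.
TTLS = {"conversation": 24 * 3600, "preference": 7 * 24 * 3600}

def hot_key(agent_prefix: str, kind: str, session_id: str) -> str:
    """Build a namespaced Redis key, e.g. 'orion:conversation:abc123'."""
    return f"{agent_prefix}{kind}:{session_id}"

def ttl_for(kind: str) -> int:
    return TTLS[kind]

# Writing a value then becomes a single SETEX, e.g. with redis-py:
#   r.setex(hot_key("orion:", "conversation", sid),
#           ttl_for("conversation"), payload)
```

Letting Redis expire keys on its own means there is no cleanup job to run; stale session state simply ages out.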
Tier 2: Warm Memory (pgvector)
- Semantic search over past interactions and knowledge
- Embeddings generated by `nomic-embed-text` running locally via Ollama
- Zero API cost — the model runs on the VPS itself
- Used for: "What did I decide about the Redis architecture last week?"
Tier 3: Cold Memory (File-based)
- `.md` files injected into the system prompt at session start
- Stable facts: user preferences, project context, agent instructions
- Manually curated, rarely changes
- Used for: "Remember that I prefer Go over Python for CLI tools"
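Cold memory injection is just "read the curated files, prepend them to the prompt." A sketch under assumptions: the `memory/` subdirectory layout and `build_system_prompt` helper are mine, not part of OpenClaw.

```python
from pathlib import Path

def build_system_prompt(base_prompt: str, workspace: str) -> str:
    """Start from the agent's base prompt, then append every curated .md file.

    Sorting makes the injection order deterministic across restarts.
    """
    sections = [base_prompt]
    for md in sorted(Path(workspace).glob("memory/*.md")):
        sections.append(f"## {md.stem}\n{md.read_text()}")
    return "\n\n".join(sections)
```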
Why Hybrid Instead of Pure Vector Search?
I initially tried using pgvector for everything. The problem: vector search retrieves what's semantically similar, not what's contextually relevant.
Example: I asked Orion "How should I design the Redis job queue for async timeout mismatches?" Vector search retrieved a memory about "Redis pub/sub patterns" — semantically close, but not the specific conversation where I'd already decided on BLPOP with callback keys.
The hybrid approach fixes this:
- Redis handles exact lookups — session ID, user preferences, recent decisions
- pgvector handles fuzzy recall — "that conversation about deployment patterns from two weeks ago"
- File injection handles stable context — project facts that don't change
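The routing above fits in a few lines of pure Python. This is a sketch of the control flow, not OpenClaw's internals: `redis_get` and `vector_search` are injected callables standing in for the real clients, so the logic stays testable without live services.

```python
def recall(query_key, query_text, redis_get, vector_search, static_facts):
    """Layered recall: exact hit first, semantic fallback, stable facts always.

    Tier 3 (cold) context is always included; tier 1 (Redis) is tried for
    an exact hit; only on a miss does tier 2 (pgvector) do fuzzy search.
    """
    memories = list(static_facts)               # tier 3: always-on context
    exact = redis_get(query_key)                # tier 1: exact lookup
    if exact is not None:
        memories.append(exact)
    else:
        memories.extend(vector_search(query_text, top_k=3))  # tier 2: fuzzy
    return memories
```

The ordering is the whole point: an exact recent decision beats a semantically similar but stale memory, which is exactly the BLPOP-vs-pub/sub failure mode described above.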
The Impact
Before the hybrid setup, agents would lose context after ~3 conversation turns on complex topics. After:
- Cross-session recall accuracy went from ~30% to ~85% — agents could reference decisions made days ago without re-explanation
- Context window usage dropped by ~35% — by pulling only relevant memories instead of dumping everything, conversations had more room for actual reasoning
- Zero embedding API cost — `nomic-embed-text` on Ollama handles all embeddings locally. At ~200 messages/day across all agents, this would cost $15-20/month on OpenAI. Now it's $0.
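Calling the local embedding model is one HTTP request to Ollama's default port. A minimal stdlib-only sketch; the `/api/embeddings` endpoint and `{"model", "prompt"}` payload shape follow Ollama's REST API as I understand it, so verify against the version you run.

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/embeddings"  # Ollama's default port

def embed_request(text: str, model: str = "nomic-embed-text") -> request.Request:
    """Build the HTTP request for a local embedding; nothing leaves the VPS."""
    payload = json.dumps({"model": model, "prompt": text}).encode()
    return request.Request(OLLAMA_URL, data=payload,
                           headers={"Content-Type": "application/json"})

def embed(text: str) -> list[float]:
    """POST the text and return the embedding vector from the response."""
    with request.urlopen(embed_request(text)) as resp:
        return json.loads(resp.read())["embedding"]
```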
Telegram Integration: The Interface Layer
I chose Telegram as the primary interface for a few reasons:
- Available on every device (phone, tablet, desktop)
- Push notifications for agent responses
- Native support for voice messages (important for the TTS/STT pipeline)
- Bot API is simple and well-documented
Each agent gets its own Telegram bot. When I message @architect_bot, it goes directly to Orion. When I message @athena_bot, it goes to Athena. No routing confusion, no "which agent should handle this?" ambiguity.
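Because the bot-to-agent binding is one-to-one, "routing" degenerates to a dictionary lookup. A sketch of that idea: the agent ids besides `architect` (which appears in the config above) are my guesses.

```python
AGENT_BY_BOT = {
    "@default_bot": "main",
    "@architect_bot": "architect",
    "@athena_bot": "learning",
    "@kairos_bot": "calendar",
    "@midas_bot": "portfolio",
}

def route(bot_username: str) -> str:
    """One bot, one agent: routing is a plain lookup, never a classifier."""
    try:
        return AGENT_BY_BOT[bot_username]
    except KeyError:
        raise ValueError(f"no agent bound to {bot_username}")
```

The design choice here is that ambiguity is resolved by the human picking a chat window, not by an LLM guessing intent, which removes an entire class of misrouting bugs.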
Voice Pipeline
The voice setup was a later addition but turned out to be one of the most useful features:
- STT (Speech-to-Text): Gemini 2.0 Flash — auto-transcribes inbound voice notes from Telegram
- TTS (Text-to-Speech): Edge TTS with the `en-US-AriaNeural` voice
- Trigger: `auto: "inbound"` — agents only reply with voice when I send a voice message
This means I can have a full conversation with Orion about system design patterns while driving, without touching the keyboard. The transcription quality from Gemini is noticeably better than Whisper for Indian-accented English.
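The `auto: "inbound"` rule is a tiny decision function. A sketch of the behavior as I observe it; the `"always"` trigger value is a hypothetical alternative for illustration, not a confirmed OpenClaw option.

```python
def reply_mode(inbound_is_voice: bool, trigger: str = "inbound") -> str:
    """Mirror the auto:"inbound" rule: speak only when spoken to by voice."""
    if trigger == "always":
        return "voice"                 # hypothetical: voice every reply
    if trigger == "inbound" and inbound_is_voice:
        return "voice"                 # voice note in, voice note out
    return "text"
```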
LLM Strategy: Rotating Gemini Profiles
Instead of using a single API key, I configured OpenClaw with 6 rotating Google profiles for Gemini API access. This gives:
- Higher effective rate limits (spread across profiles)
- Redundancy if one key hits quota
- Zero cost — Gemini's free tier is generous enough for personal use
The gateway rotates through profiles round-robin, with automatic fallback if one returns a 429.
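Round-robin with 429 fallback is simple enough to sketch in full. This is my reimplementation of the pattern, not the gateway's code; `send(profile)` stands in for whatever actually performs the Gemini call.

```python
class RateLimited(Exception):
    """Raised by send() when a profile returns HTTP 429."""

class ProfilePool:
    """Round-robin over API profiles, skipping ones that are rate-limited."""

    def __init__(self, profiles):
        self.profiles = list(profiles)
        self.i = 0  # rotation pointer persists across calls

    def call(self, send):
        """Try each profile at most once, starting from the rotation pointer."""
        for _ in range(len(self.profiles)):
            profile = self.profiles[self.i]
            self.i = (self.i + 1) % len(self.profiles)
            try:
                return send(profile)
            except RateLimited:
                continue  # 429: fall through to the next profile
        raise RuntimeError("all profiles rate-limited")
```

Because the pointer advances even on success, load spreads evenly across profiles instead of hammering the first healthy one.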
Operational Patterns That Emerged
After running this setup for several months, some patterns emerged that I didn't plan for:
1. The Morning Briefing
Astra (main agent) automatically surfaces:
- Unresolved items from yesterday
- Calendar events for today (via Kairos)
- Learning goals for the week (via Athena)
This wasn't a designed feature — it emerged from configuring the right memory triggers and morning prompt templates.
2. Cross-Agent Context Sharing (Manual)
OpenClaw isolates agent memories by design, but sometimes I need to share context. For example, when Orion reviews a system design, the decision might be relevant to Athena's learning plan.
Currently, I do this manually by referencing the decision in a message to Athena. I'm exploring whether a controlled "memory bridge" — where I can explicitly share a specific memory between agents — would be useful without breaking isolation.
3. Restart Resilience
The system runs as a systemd service:
```shell
systemctl --user restart openclaw-gateway.service
```
On restart, each agent:
- Reloads its cold memory (`.md` files)
- Reconnects to Redis for hot session state
- Finds pgvector still available (PostgreSQL runs as its own service)
No conversation is lost. The agent picks up exactly where it left off.
4. Monitoring
- Gateway health: `curl localhost:18789/health`
- Agent status: checked via the config's agent registry
- Memory usage: Redis `INFO memory` + PostgreSQL `pg_stat_user_tables`
I don't have a dashboard yet — it's all CLI checks. For a personal system, this is enough.
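Even without a dashboard, a few lines can turn the health check into a one-line summary. A sketch only: the `{"status": ..., "agents": {...}}` payload shape is an assumption about what the gateway returns, so adapt the parsing to the real response.

```python
import json

def summarize_health(raw: str) -> str:
    """Condense a gateway /health JSON blob into a one-line status.

    Assumed payload shape: {"status": "ok", "agents": {"orion": true, ...}}.
    """
    data = json.loads(raw)
    down = [name for name, ok in data.get("agents", {}).items() if not ok]
    if data.get("status") == "ok" and not down:
        return "all agents healthy"
    return "degraded: " + ", ".join(down or ["gateway"])
```

Piping `curl -s localhost:18789/health` into a script like this is enough for a cron-driven alert.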
What I'd Do Differently
- Start with fewer agents — I launched with 6 agents on day one. It would have been smarter to start with 2-3 and add agents only when I felt a clear domain boundary forming.
- Invest in memory curation earlier — the cold memory (`.md` files) is the highest-leverage part of the system. Well-curated context files make every agent interaction better. I underinvested in this initially.
- Set up voice from day one — the voice pipeline removed so much friction that I wish I'd configured it immediately instead of treating it as a "nice to have."
The Numbers
- Uptime: 99.7% since February 2025 (two outages, both from VPS maintenance)
- Messages processed: ~200/day across all agents
- LLM cost: $0 (Gemini free tier + local embeddings)
- Server cost: ~$30/month (Hetzner CPX52)
- Memory storage: ~2GB Redis, ~500MB pgvector, ~50MB cold files
- Average response time: 2-4 seconds (Gemini latency)
Bottom Line
Running a multi-agent system isn't about having the latest model or the biggest context window. It's about domain isolation, memory architecture, and operational reliability. OpenClaw gave me the orchestration layer; the work was in designing the agent boundaries, curating the memory, and building the habits to actually use it daily.
The system isn't perfect — calendar integration is still half-built, cross-agent collaboration is manual, and I want better search over historical conversations. But it's running, it's useful every single day, and it's taught me more about production AI systems than any course or tutorial ever could.