AI agent orchestration in practice: a control plane for your local coding agents
Running five AI coding agents across tmux tabs is chaos. A control-plane design for orchestrating them in parallel — and how to know when an agent is actually done.
Open Claude Code, Codex, and Qwen in three terminal tabs, point them at the same repo, and within ten minutes you're lost. Which one finished? What did it change? Did the second one just stomp on the first one's work? That's the daily reality of running multiple AI coding agents, and developers have started naming the missing piece out loud: a Mission Control, a Jira for AI agents, a control plane for local agents. This is a system-design walk-through of building that control plane — the architecture, the genuinely hard sub-problems, and the parts we got wrong — using our own tool, Jerico, as the worked example.
Why is running multiple AI coding agents so chaotic?
The pain is consistent enough that developers keep describing it the same way. In an Ask HN thread asking for a "control plane for local AI agents," one developer described hitting a wall once several agents work sub-tasks in a single repo: "terminal logs become unmanageable," and you need to track progress and intervene "without context-switching between 5 terminal tabs." A reply was blunter — there's "no off-the-shelf 'Jira for AI' product out there right now," so teams "either hack together thin SQLite + WebSockets dashboards, or just suck it up and stare at raw stdout across 5 tmux panes."
That's the gap, exactly. tmux gives you panes; it doesn't give you task dispatch, dependency tracking, completion signals, or review. And the moment two agents share a working directory, they start stepping on each other. What's missing isn't another agent framework — it's an operational layer over the agents you already run. The recurring r/LocalLLaMA refrain says it best: local coding agents are good now, but only if you babysit them. The babysitting is the problem worth solving.
What pattern does an agent control plane actually implement?
It has a canonical name. In Building Effective Agents, Anthropic describes the orchestrator-workers pattern: "a central LLM dynamically breaks down tasks, delegates them to worker LLMs, and synthesizes their results." That's precisely the job. The catch is that the canonical examples — and the subagents you get from the Claude Agent SDK — assume worker LLMs you call through an API, inside one process. The version developers are missing is the same pattern aimed at the real CLI tools already installed on their machine: Claude Code, Codex, Qwen, Aider — different vendors, each in its own terminal. Jerico is a reference implementation of orchestrator-workers for local AI coding CLIs.
What architecture works for a local-agent control plane?
Three tiers plus a structured back-channel. The data path is browser → server → daemon → the AI CLI, and a separate MCP server gives the agents a way to talk back in structured tool calls instead of being screen-scraped.
- The browser is the control room — each agent is a live terminal panel in a grid; you watch, and you intervene when you need to.
- The server is a WebSocket relay and the orchestration brain — it holds run state in SQLite, decomposes specs into tasks, and schedules the work.
- The daemon runs on your machine and spawns real PTY (pseudo-terminal) sessions of whatever CLIs you have installed. One browser can drive daemons on several machines at once.
- The MCP server is the structured channel — agents call tools (complete a task, spawn a worker, read another panel) rather than the system guessing from raw output.
The reason for the daemon split is the whole point: the agents run where your code is (your files, your git branches, your installed tools), not in a cloud sandbox. The relay just moves bytes between the browser and the machine. We leaned hard on the recovery paths of this design (the end-to-end suite for the disconnect-and-reconnect milestone passes 21 of 21 tests), because a daemon dropping mid-task is not an edge case here: it is the central failure mode the architecture has to absorb.
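To make the structured channel concrete, here is a minimal sketch of what a completion call might look like on the wire. MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`, and the tool name `bridge_complete_task` comes from the detector table later in this post. The argument names (`todo_id`, `summary`) and both helper functions are assumptions made for illustration, not Jerico's actual schema.

```python
import json


def complete_task_call(request_id, todo_id, summary):
    """Build the JSON-RPC request an agent would send to report completion.

    The argument names are hypothetical; only the envelope follows MCP.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "bridge_complete_task",
            "arguments": {"todo_id": todo_id, "summary": summary},
        },
    }


def parse_completion(raw):
    """Return the completed todo id, or None if the message is something else."""
    msg = json.loads(raw)
    if msg.get("method") != "tools/call":
        return None
    params = msg.get("params", {})
    if params.get("name") != "bridge_complete_task":
        return None
    return params.get("arguments", {}).get("todo_id")


wire = json.dumps(complete_task_call(1, "todo-42", "added retry logic"))
print(parse_completion(wire))
```

The point of the sketch is the contrast with screen-scraping: the server reads a typed field instead of guessing from terminal output.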
How do you dispatch one spec to many agents in parallel?
You don't write an agent graph in code. You write a spec in plain English — or point Jerico at a repo and let it read the README and CLAUDE.md — and a Claude process decomposes it into typed todos with explicit dependency links. From there it's deterministic graph machinery, which is the point: a soft, LLM-generated plan gets run through hard scheduling.
- Cycle detection — Kahn's algorithm runs before anything dispatches and rejects circular dependencies up front.
- Parallel layers — todos sort into topological layers (layer 0 has no dependencies, layer 1 depends only on layer 0, and so on); everything in a layer runs at once, capped at a max-parallel limit (10 by default) so you don't spawn a thundering herd.
- Role routing — implementation todos go to a developer panel, review to a reviewer, infra to a shell or executor, with a fallback chain when the exact role isn't free.
- Context chaining — when a dependency finishes, the tail of its transcript is injected into the dependent task's prompt, so a reviewer sees what the implementer actually did.
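The cycle check and the layering above can be sketched with one pass of Kahn's algorithm. This is a minimal illustration, not Jerico's code: the function names, the shape of the `deps` mapping, and the `batches` helper are assumptions. The only value taken from the text is the default cap of 10.

```python
from collections import defaultdict


def topological_layers(deps):
    """Group todos into layers using Kahn's algorithm.

    deps maps each todo id to the set of todo ids it depends on.
    Raises ValueError if a dependency is unknown or the graph has a cycle.
    """
    indegree = {todo: len(parents) for todo, parents in deps.items()}
    children = defaultdict(list)
    for todo, parents in deps.items():
        for parent in parents:
            if parent not in deps:
                raise ValueError(f"unknown dependency: {parent}")
            children[parent].append(todo)

    # Layer 0 is every todo with no dependencies.
    layer = sorted(t for t, degree in indegree.items() if degree == 0)
    layers, placed = [], 0
    while layer:
        layers.append(layer)
        placed += len(layer)
        next_layer = []
        for todo in layer:
            for child in children[todo]:
                indegree[child] -= 1
                if indegree[child] == 0:  # all of its dependencies are placed
                    next_layer.append(child)
        layer = sorted(next_layer)

    # Any todo never placed is part of (or downstream of) a cycle.
    if placed != len(deps):
        raise ValueError("circular dependency detected")
    return layers


def batches(layer, max_parallel=10):
    """Split one layer into dispatch batches no larger than max_parallel."""
    return [layer[i:i + max_parallel] for i in range(0, len(layer), max_parallel)]


plan = {"schema": set(), "api": {"schema"}, "ui": {"schema"}, "review": {"api", "ui"}}
print(topological_layers(plan))
```

Running the check before dispatch means a bad plan fails with zero agents spawned, which is cheaper than discovering a deadlock halfway through a run.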
How do you know when an AI coding agent is done?
This is the unsolved one, and it's worth dwelling on because everyone building in this space hits it. There is no shared "task complete" signal across Claude, Qwen, Codex and shell — and the obvious move, scanning terminal output for a keyword, is sabotaged by the terminal itself, which echoes back the instruction you typed. So Jerico runs three detectors at once and picks the right one per agent based on what that agent can actually do.
| Detector | Best for | How it fires | Trade-off |
|---|---|---|---|
| MCP tool call | Claude panels with MCP configured | The agent calls a structured bridge_complete_task tool | Fully reliable, no parsing — but only where MCP is wired up |
| PTY sentinel | Any terminal agent without MCP | The agent prints a unique JERICO_DONE_ token on its own line; an output watcher matches it | Works with any CLI; the cost is that it's heuristic and fragile |
| Idle detection | Shell commands and silent agents | No new output for a timeout window, then a light poll | Needs no cooperation; the cost is that timeouts are guesses |
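As a rough sketch of the third row and of the per-agent choice, the snippet below shows an idle timer and a selection rule. The class, the function, the capability flags, and the 30-second timeout are all assumptions for illustration; the table only says that idle detection fires after a quiet window and that the detector depends on what the agent can do. The clock is passed in explicitly so the logic can be exercised without sleeping.

```python
class IdleDetector:
    """Flags a panel as idle once no output has arrived for timeout_s seconds."""

    def __init__(self, timeout_s, now):
        self.timeout_s = timeout_s
        self.last_output_at = now

    def on_output(self, now):
        # Any new bytes from the PTY reset the quiet window.
        self.last_output_at = now

    def is_idle(self, now):
        return now - self.last_output_at >= self.timeout_s


def choose_detector(has_mcp, is_shell):
    """Pick the primary completion detector from hypothetical capability flags."""
    if has_mcp:
        return "mcp"        # structured tool call, no parsing
    if is_shell:
        return "idle"       # a shell command cannot be asked to print a token
    return "sentinel"       # any other terminal agent


detector = IdleDetector(timeout_s=30, now=0)
detector.on_output(now=25)
print(detector.is_idle(now=50), detector.is_idle(now=55))
print(choose_detector(has_mcp=False, is_shell=False))
```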
The sentinel path hides a subtle trap, and the fix is the part I'm most fond of. Because a PTY echoes the instruction you inject, if the completion token ever appeared verbatim in the prompt text, the watcher would match your own instruction and fire instantly. Jerico avoids this by never writing the whole token in the injection: the prompt describes it in split form, as the prefix JERICO_DONE_ followed, separately, by a hex suffix, and the agent assembles the full literal only when it genuinely finishes. The watcher matches that full literal, which by construction never appears in the instruction. A sanitisation pass also strips stray JERICO_DONE_ fragments out of injected context (previous transcripts, retry notes, sibling task titles) so echoed context can't false-trigger either.
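The split-token idea can be sketched in a few lines. Everything here other than the `JERICO_DONE_` prefix is an assumption for illustration: the function names, the wording of the instruction, the 8-character suffix, and the replacement marker used by the sanitiser.

```python
import re
import secrets

PREFIX = "JERICO_DONE_"


def new_suffix():
    """Random hex suffix, unique per task (the length is an arbitrary choice)."""
    return secrets.token_hex(4)


def build_instruction(suffix):
    """Describe the token in two pieces so the full literal is never echoed."""
    return (
        "When the task is finished, print on its own line the prefix "
        f"{PREFIX} immediately followed by the hex suffix {suffix}, "
        "with no space between them."
    )


def make_watcher(suffix):
    """Return a predicate that fires only on the assembled token alone on a line."""
    pattern = re.compile(rf"^\s*{re.escape(PREFIX + suffix)}\s*$", re.MULTILINE)
    return lambda output: pattern.search(output) is not None


def sanitise(context):
    """Strip stray token fragments from text that will be injected into a prompt."""
    return re.sub(rf"{re.escape(PREFIX)}[0-9a-fA-F]*", "[token removed]", context)


suffix = new_suffix()
watcher = make_watcher(suffix)
print(watcher(build_instruction(suffix)))                 # the echoed prompt
print(watcher(f"tests passed\r\n{PREFIX}{suffix}\r\n"))   # the agent's real signal
```

The own-line anchor in the watcher is a second guard: even if an agent quotes the token mid-sentence while explaining what it will do, the pattern does not match.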
How do you stop agent review loops from going infinite?
Orchestration without review just produces unreviewed code faster. So every implementation todo that nobody is reviewing gets a review todo auto-generated for it and routed to a reviewer agent, which scans across a few lenses: correctness, simplicity, security, observability. The hard part isn't running the review; it's stopping it, because a reviewer can reject forever. The guardrails are blunt on purpose: a 2-retry reviewer cap (on the final pass it must approve, noting caveats, or the implementation is capped and marked failed), a separate 3-retry implementation cap, and a cascade-block so dependent reviews don't spin on work that already gave up. These caps weren't designed in the abstract. They came from a real run where a scripted reviewer rejected indefinitely, and the caps shipped as the fix for that wall.
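One way to read those guardrails is as a small state machine. The sketch below is an interpretation under stated assumptions, not Jerico's implementation: the `Todo` fields, the action strings, and the exact counting (a rejection after the second retry fails the todo) are my reading of "2-retry" and "3-retry". The cascade-block is modelled as marking every transitive dependent of a failed todo as blocked.

```python
from dataclasses import dataclass

MAX_REVIEW_RETRIES = 2   # reviewer cap from the text
MAX_IMPL_RETRIES = 3     # implementation cap from the text


@dataclass
class Todo:
    id: str
    review_rejections: int = 0
    impl_failures: int = 0
    status: str = "pending"


def on_review_rejected(todo):
    """Decide what happens after a reviewer rejects the implementation."""
    todo.review_rejections += 1
    if todo.review_rejections > MAX_REVIEW_RETRIES:
        todo.status = "failed"      # retries exhausted: capped and marked failed
        return "fail"
    if todo.review_rejections == MAX_REVIEW_RETRIES:
        return "retry_final"        # next review is the approve-with-caveats pass
    return "retry"


def on_impl_failed(todo):
    """Decide what happens after the implementing agent fails."""
    todo.impl_failures += 1
    if todo.impl_failures > MAX_IMPL_RETRIES:
        todo.status = "failed"
        return "fail"
    return "retry"


def cascade_block(status, dependents, failed_id):
    """Mark every transitive dependent of failed_id as blocked; return them in order."""
    blocked, stack = [], [failed_id]
    while stack:
        current = stack.pop()
        for child in dependents.get(current, []):
            if status.get(child) not in ("blocked", "failed"):
                status[child] = "blocked"
                blocked.append(child)
                stack.append(child)
    return blocked


todo = Todo("impl-1")
print([on_review_rejected(todo) for _ in range(3)], todo.status)
```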
What happens when a daemon disconnects mid-orchestration?
A long run will hit failure — that's the operating condition, not the exception — so the patterns that earn their keep are the ones that assume things break.
- State rehydration from disk — failure counts, retry context, the originating todo and transcripts are persisted, so killing the server mid-run isn't fatal: on restart it reloads paused and running sessions, and a resumed task is told to first check whether it already finished before redoing work.
- Zombie watchdog — a poller fails todos stuck past a 10-minute timeout, using an entropy-based check to tell "still working" from "wedged."
- Circuit breaker — after 3 consecutive failures it stops the cascade instead of burning down the whole run.
- Rate-limit auto-pause and resume — Claude usage is tracked from local logs; near the quota ceiling the run auto-pauses, and a watchdog resumes it once the window resets.
- Session resume — when a daemon reconnects, agents that support it (today, Claude and Qwen) pick up their prior session instead of starting cold.
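Of the patterns above, the circuit breaker is the simplest to show in code. This is a minimal sketch assuming a sequential dispatch loop; the class, the `run_with_breaker` helper, and the `execute` callback are hypothetical names. Only the threshold of 3 consecutive failures comes from the list.

```python
class CircuitBreaker:
    """Opens after `threshold` consecutive failures; any success resets the count."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.consecutive_failures = 0

    @property
    def open(self):
        return self.consecutive_failures >= self.threshold

    def record(self, succeeded):
        if succeeded:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.open


def run_with_breaker(todos, execute, breaker):
    """Run todos in order; once the breaker opens, skip the rest instead of failing them."""
    finished, skipped = [], []
    for todo in todos:
        if breaker.open:
            skipped.append(todo)
            continue
        ok = execute(todo)
        breaker.record(ok)
        if ok:
            finished.append(todo)
    return finished, skipped


outcomes = {"a": True, "b": False, "c": False, "d": False, "e": True}
print(run_with_breaker(list(outcomes), outcomes.get, CircuitBreaker()))
```

Skipping rather than failing the remaining todos keeps them eligible for a later resume, which fits the assume-it-will-crash posture of the rest of the design.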
Why build a system instead of using a library or raw tmux?
The agent-framework world — LangGraph, CrewAI, AutoGen — is libraries you program against: you write code to define state, nodes, edges and handoffs. That's the right tool for building a bespoke agent application. Raw tmux is the other extreme — total manual control, zero coordination. A control plane sits in the gap: you express work as a spec, and a running system dispatches it across the real CLI agents already on your machine.
| | Agent frameworks (LangGraph, CrewAI, AutoGen) | Raw tmux + tabs | Control plane (Jerico) |
|---|---|---|---|
| How you express work | Write code: state, nodes, edges, handoffs | Type commands by hand, per pane | Write a natural-language spec |
| What runs the agents | Library calls, usually one vendor's API | Whatever you launch in a pane | Real PTY sessions of any installed CLI |
| Vendor mix | Usually one provider at a time | Manual, per pane | Claude, Codex, Qwen, Gemini, Aider, shell together |
| Knows when a task is done | You code the state transition | You eyeball it | Three detectors: MCP, sentinel, idle |
| Survives a crash | Depends on your checkpointing | No — you re-run by hand | State rehydrates from disk |
| Multi-machine | Out of scope | No | One browser drives several daemons |
What we got wrong, and what's still missing
Every pattern above has a cost, and some parts are genuinely unfinished. I'd rather say so plainly than imply a polished platform.
- Sentinel completion is heuristic and fragile by design — a fallback, not something to trust the way you trust the MCP path.
- Claude quota tracking is approximate — Anthropic's exact sliding-window algorithm isn't public, so a local count from log files is a good estimate, not ground truth.
- Decomposition quality is LLM-dependent — the graph machinery guards against cycles and unreviewed merges, not against a poorly conceived plan.
- It's single-node and single-user today — server state lives in one process (no Redis or multi-node yet), and while tenants are isolated there's no shared team workspace.
- Session resume is partial — Claude and Qwen resume; other agents auto-respawn without session continuity.
- Auth is email-only — no OAuth, no billing, no mobile UI; the desktop shell and auto-updater aren't generally available.
Try it — and tell us where it breaks
If you're already juggling Claude Code, Codex and Qwen and the tmux-tab dance is wearing you down, Jerico is open to try. It's beta — a local daemon you install on your machine plus a browser dashboard — and it's honest about the rough edges above. Start from the product page at appnova.io/products/jerico, or go straight to the public repo at github.com/Appnova-EU-OU/jerico for releases and install steps. If you build orchestration tooling yourself, steal the two things that'll save you the most pain: the three-detector completion model, and the assume-it-will-crash posture. And if you do try it, tell us where it breaks — most of the patterns in this post exist because an earlier version did.