relay / choosing-swarm-patterns (community skill)

Use when coordinating multiple AI agents and you need to pick the right orchestration pattern. Covers 10 patterns (fan-out, pipeline, hub-spoke, consensus, mesh, handoff, cascade, dag, debate, hierarchical) with a decision framework and reflection protocol.

Install: /plugin install relay

Choosing Swarm Patterns

Overview

10 orchestration patterns for multi-agent workflows. Pick the simplest pattern that solves the problem — add complexity only when the simpler pattern proves insufficient.

Quick Decision Framework

Is the task independent per agent?
  YES → fan-out (parallel workers)

Does each step need the previous step's output?
  YES → Is it strictly linear?
    YES → pipeline
    NO  → dag (parallel where possible)

Does a coordinator need to stay alive and adapt?
  YES → Is there one level of management?
    YES → hub-spoke
    NO  → hierarchical (multi-level)

Is the task about making a decision?
  YES → Do agents need to argue opposing sides?
    YES → debate (adversarial)
    NO  → consensus (cooperative voting)

Does the right specialist emerge during processing?
  YES → handoff (dynamic routing)

Do all agents need to freely collaborate?
  YES → mesh (peer-to-peer)

Is cost the primary concern?
  YES → cascade (cheap model first, escalate if needed)
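As a compact restatement, the framework above can be encoded as a pure function. The trait names below are illustrative bookkeeping, not part of the relay API:

```typescript
// Illustrative trait checklist — these names are not part of the relay API.
interface TaskTraits {
  independentSubtasks: boolean;
  linearStages: boolean;
  mixedDependencies: boolean;
  needsLiveCoordinator: boolean;
  multiLevel: boolean;
  isDecision: boolean;
  adversarial: boolean;
  specialistEmerges: boolean;
  freeCollaboration: boolean;
  costSensitive: boolean;
}

// Walks the decision tree top to bottom and returns the first match.
function choosePattern(t: TaskTraits): string {
  if (t.independentSubtasks) return "fan-out";
  if (t.linearStages) return "pipeline";
  if (t.mixedDependencies) return "dag";
  if (t.needsLiveCoordinator) return t.multiLevel ? "hierarchical" : "hub-spoke";
  if (t.isDecision) return t.adversarial ? "debate" : "consensus";
  if (t.specialistEmerges) return "handoff";
  if (t.freeCollaboration) return "mesh";
  if (t.costSensitive) return "cascade";
  return "fan-out"; // simplest default
}
```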

Pattern Reference

| # | Pattern | Topology | Agents | Best For |
|---|---------|----------|--------|----------|
| 1 | fan-out | Star (SDK center) | N parallel | Independent subtasks (reviews, research, tests) |
| 2 | pipeline | Linear chain | Sequential | Ordered stages (design → implement → test) |
| 3 | hub-spoke | Star (live hub) | 1 lead + N workers | Dynamic coordination, lead reviews/adjusts |
| 4 | consensus | Broadcast + vote | N voters | Architecture decisions, approval gates |
| 5 | mesh | Fully connected | N peers | Brainstorming, collaborative debugging |
| 6 | handoff | Routing chain | 1 active at a time | Triage, specialist routing, support flows |
| 7 | cascade | Tiered escalation | Cheapest → most capable | Cost optimization, production workloads |
| 8 | dag | Dependency graph | Parallel + joins | Complex projects with mixed dependencies |
| 9 | debate | Adversarial rounds | 2+ debaters + judge | Rigorous evaluation, architecture trade-offs |
| 10 | hierarchical | Tree (multi-level) | Lead → coordinators → workers | Large teams, domain separation |

Pattern Details

1. fan-out — Parallel Workers

fanOut([
  { task: "Review auth.ts", name: "AuthReviewer" },
  { task: "Review db.ts", name: "DbReviewer" },
], { cli: "claude" });
  • Workers run independently, no inter-agent communication
  • SDK collects all DONE messages
  • Use when: tasks are embarrassingly parallel

2. pipeline — Sequential Stages

pipeline([
  { task: "Design the API schema", name: "Designer" },
  { task: "Implement the endpoints", name: "Implementer" },
  { task: "Write integration tests", name: "Tester" },
]);
  • Stage N+1 receives Stage N's DONE summary as context
  • Pipeline halts on failure
  • Use when: clear linear dependency chain

3. hub-spoke — Persistent Coordinator

hubAndSpoke({
  hub: { task: "Coordinate building a REST API", name: "Lead" },
  workers: [
    { task: "Build database models", name: "DbWorker" },
    { task: "Build route handlers", name: "ApiWorker" },
  ],
});
  • Hub stays alive, receives ACK/DONE from workers
  • Hub can spawn additional workers dynamically
  • Use when: lead needs to review, adjust, and make decisions

4. consensus — Cooperative Voting

consensus({
  proposal: "Should we migrate to Fastify?",
  voters: [
    { task: "Evaluate performance", name: "PerfExpert" },
    { task: "Evaluate DX", name: "DxExpert" },
  ],
  consensusType: "majority",
});
  • Agents independently evaluate, then VOTE: approve/reject
  • Supports majority, supermajority, unanimous, weighted, quorum
  • Use when: need a decision with diverse perspectives
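Three of the listed consensus types reduce to simple threshold checks over the collected votes. A sketch under assumed semantics (the relay implementation may tally weighted and quorum variants differently):

```typescript
type Vote = "approve" | "reject";

// Threshold checks for three of the consensus types listed above.
// Assumed semantics, not relay's actual tallying code.
function passes(votes: Vote[], type: "majority" | "supermajority" | "unanimous"): boolean {
  const approvals = votes.filter((v) => v === "approve").length;
  switch (type) {
    case "majority":      return approvals > votes.length / 2;          // strictly more than half
    case "supermajority": return approvals >= Math.ceil(votes.length * 2 / 3); // two-thirds
    case "unanimous":     return approvals === votes.length;
  }
}
```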

5. mesh — Peer Collaboration

mesh({
  goal: "Debug the auth flow returning 500",
  agents: [
    { task: "Check server logs", name: "LogAnalyst" },
    { task: "Review auth code", name: "CodeReviewer" },
    { task: "Write repro test", name: "Tester" },
  ],
});
  • All agents on same channel, free communication
  • Round tracking detects stalls
  • Use when: collaborative exploration without hierarchy

6. handoff — Dynamic Routing

handoff({
  entryPoint: { task: "Triage the request", name: "Triage" },
  routes: [
    { agent: { task: "Handle billing", name: "Billing" }, condition: "billing, payment" },
    { agent: { task: "Handle tech issues", name: "TechSupport" }, condition: "error, bug" },
  ],
  maxHandoffs: 3,
});
  • One active agent at a time; transfers control dynamically
  • Circuit breaker prevents infinite routing loops
  • Use when: right specialist isn't known upfront

7. cascade — Cost-Aware Escalation

cascade({
  tiers: [
    { agent: { task: "Answer this", cli: "claude" }, confidenceThreshold: 0.7, costWeight: 1 },
    { agent: { task: "Answer this", cli: "claude" }, confidenceThreshold: 0.85, costWeight: 5 },
    { agent: { task: "Answer this", cli: "claude" }, costWeight: 20 },
  ],
});
  • Start cheap, escalate if confidence < threshold
  • Agent reports: DONE [confidence=0.4]: <answer>
  • Use when: most tasks are simple, some need heavy reasoning
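The DONE [confidence=0.4]: <answer> convention can be parsed with a small regex. This is an illustrative sketch of the escalation check, not relay's parser:

```typescript
// Parses the "DONE [confidence=0.4]: <answer>" convention shown above.
// Illustrative sketch, not relay's actual parser.
function parseDone(line: string): { confidence: number; answer: string } | null {
  const m = line.match(/^DONE \[confidence=([\d.]+)\]:\s*(.*)$/);
  if (!m) return null;
  return { confidence: parseFloat(m[1]), answer: m[2] };
}

// Escalate to the next tier when the report is unparsable
// or confidence falls below the tier's threshold.
function shouldEscalate(line: string, threshold: number): boolean {
  const parsed = parseDone(line);
  return parsed === null || parsed.confidence < threshold;
}
```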

8. dag — Directed Acyclic Graph

dag({
  nodes: [
    { id: "scaffold", task: "Create project scaffold" },
    { id: "frontend", task: "Build React UI", dependsOn: ["scaffold"] },
    { id: "backend", task: "Build API", dependsOn: ["scaffold"] },
    { id: "integrate", task: "Wire together", dependsOn: ["frontend", "backend"] },
  ],
  maxConcurrency: 3,
});
  • Topological sort determines execution order
  • Independent nodes run in parallel
  • Use when: pipeline is too linear, fan-out is too flat
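The wave scheduling a DAG executor performs can be sketched as a Kahn-style loop over the dependsOn lists (illustrative, not the relay executor itself):

```typescript
interface Node { id: string; dependsOn?: string[] }

// Kahn-style wave scheduling: each wave contains every node whose
// dependencies are already complete, so nodes in a wave run in parallel.
function waves(nodes: Node[]): string[][] {
  const done = new Set<string>();
  const out: string[][] = [];
  let remaining = [...nodes];
  while (remaining.length > 0) {
    const ready = remaining.filter((n) => (n.dependsOn ?? []).every((d) => done.has(d)));
    if (ready.length === 0) throw new Error("cycle or missing dependency");
    out.push(ready.map((n) => n.id));
    ready.forEach((n) => done.add(n.id));
    remaining = remaining.filter((n) => !ready.includes(n));
  }
  return out;
}
```

Running this on the four-node example above yields three waves: scaffold, then frontend with backend in parallel, then integrate.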

9. debate — Adversarial Refinement

debate({
  topic: "Monorepo vs polyrepo for the new platform?",
  debaters: [
    { task: "Argue for monorepo", position: "monorepo" },
    { task: "Argue for polyrepo", position: "polyrepo" },
  ],
  judge: { task: "Judge and decide", name: "ArchJudge" },
  maxRounds: 3,
});
  • Structured rounds: ARGUMENT → counterargument → VERDICT
  • Optional judge; without judge, agents self-converge or split
  • Use when: need rigorous adversarial examination

10. hierarchical — Multi-Level Delegation

hierarchical({
  agents: [
    { id: "lead", task: "Coordinate full-stack app", role: "lead" },
    { id: "fe-coord", task: "Manage frontend", role: "coordinator", reportsTo: "lead" },
    { id: "be-coord", task: "Manage backend", role: "coordinator", reportsTo: "lead" },
    { id: "fe-dev", task: "Build components", role: "worker", reportsTo: "fe-coord" },
    { id: "be-dev", task: "Build API", role: "worker", reportsTo: "be-coord" },
  ],
});
  • Workers → coordinators → lead (multi-level reporting)
  • Coordinators synthesize sub-team output
  • Use when: too many workers for one hub to manage

Reflection Protocol

All patterns support reflection — periodic synthesis that enables course correction. Enabled via reflectionThreshold on WorkflowOptions.

{
  reflectionThreshold: 10, // trigger after 10 agent messages
  onReflect: async (ctx) => {
    // Examine ctx.recentMessages, ctx.agentStatuses
    // Return adjustments or null
  },
}

Reflection is event-driven (importance-weighted accumulation), not timer-based. See WORKFLOWS_SPEC.md for full details.
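One way to read "importance-weighted accumulation" is an accumulator that fires once the summed importance of recent messages crosses the threshold. A sketch under that assumption (not necessarily relay's internals):

```typescript
// Assumed semantics of importance-weighted reflection triggering:
// each message contributes its importance weight, and reflection
// fires when the accumulated total crosses the threshold.
class ReflectionGate {
  private accumulated = 0;
  constructor(private threshold: number) {}

  // Returns true when reflection should run; resets the accumulator.
  record(importance: number): boolean {
    this.accumulated += importance;
    if (this.accumulated >= this.threshold) {
      this.accumulated = 0;
      return true;
    }
    return false;
  }
}
```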

Common Mistakes

| Mistake | Why It Fails | Fix |
|---------|--------------|-----|
| Using mesh for everything | O(n^2) communication, debugging nightmare | Use hub-spoke for most tasks |
| Pipeline for independent work | Sequential bottleneck | Use fan-out or dag |
| Hub-spoke for simple parallel tasks | Hub is unnecessary overhead | Use fan-out |
| Consensus for non-decisions | Voting on implementation tasks wastes time | Use hub-spoke, let lead decide |
| No circuit breaker on handoff | Infinite routing loops | Always set maxHandoffs |
| Cascade without confidence parsing | Agents don't report confidence | Convention injection handles this |
| Hierarchical for 3 agents | Management overhead exceeds benefit | Use hub-spoke for small teams |

DAG Executor — Proven Pattern

The recommended architecture for DAG workflow execution, validated on a 9-node / 5-wave production run.

Agent Completion: Detect → Release → Collect

This is the critical pattern. Claude Code agents don't auto-exit — the orchestrator must detect completion and release them.

Agent writes summary file → Orchestrator polls (5s) → Detects new mtime →
  Reads summary → Calls client.release(agent) → agent_exited fires → Node marked complete

Implementation:

import { readFileSync, statSync } from "node:fs";

// Track initial mtime to distinguish new writes from stale files
let initialMtime = 0;
try { initialMtime = statSync(summaryPath).mtimeMs; } catch {}

// Poll for the summary file every 5s (callback is async so release can be awaited)
const poll = setInterval(async () => {
  let stat;
  try { stat = statSync(summaryPath); } catch { return; } // not written yet
  if (stat.mtimeMs > initialMtime) {
    clearInterval(poll); // stop polling before releasing the agent
    const content = readFileSync(summaryPath, "utf-8").trim();
    await client.release(agentName); // triggers agent_exited
    finish("completed", content);
  }
}, 5_000);

Convention injection tells agents to:

  1. Send summary via Relaycast MCP (mcp__relaycast__send to channel) for inter-agent communication
  2. Write summary to .relay/summaries/{nodeId}.md as the completion signal
  3. Include file paths, type names, method signatures — downstream agents depend on this
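A hypothetical builder for that injected convention text (the real injected wording lives in relay; channel and path layout here follow the conventions above):

```typescript
// Hypothetical sketch of the convention text injected into each node's
// prompt — the actual wording is relay's, not this.
function conventionBlock(nodeId: string, channel: string): string {
  return [
    "When finished:",
    `1. Send a summary to the "${channel}" channel via mcp__relaycast__send.`,
    `2. Write the same summary to .relay/summaries/${nodeId}.md (completion signal).`,
    "3. Include file paths, type names, and method signatures; downstream agents depend on them.",
  ].join("\n");
}
```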

Communication: Relaycast MCP

Agents communicate through the Relaycast MCP, not file-based protocols:

  • Channel messages: mcp__relaycast__send with channel name
  • Direct messages: mcp__relaycast__dm with agent name
  • Claude Code agents inherit .mcp.json config and have full MCP access
  • Other CLIs (codex, aider) may not have MCP — use summary files as fallback

State & Resume

Persist state after every node completion for crash recovery:

saveState(completed, depsOutput, results, startTime);
// Restart with --resume to skip completed nodes

Pitfall: When resuming, only load completed nodes — never load failed entries, or downstream will be permanently blocked.
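The resume filter can be sketched as a pure function over the persisted state (the state shape below is assumed from the saveState call above, not relay's actual schema):

```typescript
// Hypothetical state shape; field names follow the saveState call above.
interface RunState {
  completed: string[];                  // node ids recorded as finished
  depsOutput: Record<string, string>;   // DONE summaries keyed by node id
  results: Record<string, { status: "completed" | "failed" }>;
}

// Resume helper: keep only nodes whose recorded status is "completed".
// Loading a failed entry as done would permanently block its dependents.
function resumableNodes(raw: string): Set<string> {
  const state: RunState = JSON.parse(raw);
  return new Set(
    state.completed.filter((id) => state.results[id]?.status === "completed"),
  );
}
```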

Pitfalls Reference

| Category | Pitfall | Fix |
|----------|---------|-----|
| Completion | Waiting for agent_exited without releasing — agents idle until timeout | Poll for summary file, release agent when detected |
| Completion | No resolved guard — poll interval and timeout both fire, double-resolve | resolved boolean flag checked before every resolve |
| Signals | PTY prompt echo matches signal keywords (DONE:, ERROR:) causing false completion | Never put signal keywords in task prompts; use file-based signals |
| Summaries | Thin summaries ("Created types") useless for downstream agents | Convention injection requires file paths, signatures, key exports |
| Execution | Promise.race in batch — one success masks later failures | Promise.allSettled for each batch |
| Resilience | No --resume — orchestrator crash loses all progress | Persist completed set + depsOutput after each node |
| Resilience | No downstream failure propagation — dependents stuck in limbo | Mark all transitive dependents as "blocked" on failure |
| Convention | Agents don't read existing code — output doesn't match project patterns | readFirst field per node, included in convention injection |
| Capabilities | Assuming all CLIs have MCP tools — codex/aider may not | Check CLI capabilities; use summary files as fallback for non-Claude CLIs |
| Infrastructure | Rust broker vs Node.js CLI binary confusion (same name, different behavior) | Always set explicit binaryPath; use unique broker names to avoid 409 conflicts |
| Infrastructure | getLogs() assumes Node.js daemon log files — Rust broker doesn't write them | Use broker events or summary files, not log file polling |
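The downstream-failure-propagation fix from the table can be sketched as a fixed-point walk over the dependency lists (illustrative, not the relay executor's code):

```typescript
interface Node { id: string; dependsOn?: string[] }

// Marks every transitive dependent of a failed node as blocked,
// so no dependent sits in limbo waiting on output that will never arrive.
function blockDependents(nodes: Node[], failedId: string): Set<string> {
  const blocked = new Set<string>();
  let changed = true;
  while (changed) {
    changed = false;
    for (const n of nodes) {
      if (blocked.has(n.id)) continue;
      const deps = n.dependsOn ?? [];
      if (deps.some((d) => d === failedId || blocked.has(d))) {
        blocked.add(n.id);
        changed = true;
      }
    }
  }
  return blocked;
}
```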

YAML Workflow Definition

Any pattern can be defined in YAML for portability:

version: "1.0"
name: feature-dev
pattern: hub-spoke
agents:
  - id: lead
    role: lead
    cli: claude
  - id: developer
    role: worker
    cli: codex
    reportsTo: lead
steps:
  - id: plan
    agent: lead
    prompt: "Create a development plan for: {{task}}"
    expects: "PLAN_COMPLETE"
  - id: implement
    agent: developer
    dependsOn: [plan]
    prompt: "Implement: {{steps.plan.output}}"
    expects: "DONE"
reflection:
  enabled: true
  threshold: 10
trajectory:
  enabled: true

Store in .relay/workflows/ and run with:

const workflow = await loadWorkflow(".relay/workflows/feature-dev.yaml");
const run = runWorkflow(workflow, "Add user authentication");

Source: AgentWorkforce/relay on GitHub (Apache-2.0, 628 stars, 14 contributors, last commit 2026-04-21)
File: .claude/skills/choosing-swarm-patterns/SKILL.md