multi-agent · orchestration · workflow

Agent Orchestration Architecture

How zenflow turns a YAML agent workflow into a running plan. An LLM coordinator routes events between agents through hub-and-spoke mailboxes; the executor schedules a DAG of steps; goai handles the tool loop. Every box, arrow, and label below is a real component you can grep for in the source.

Legend. Edges: function call · tool loop / llm call · message delivery (mailbox) · data / setup. Nodes: input · coordinator · executor · agent · mailbox · provider · llm

runtime topology

Figure: zenflow runtime topology. User input flows into the orchestrator, which splits into the coordinator and executor; the executor fans out to three agents, each calling the provider for its tool loop. Function calls are solid green; messages through the mailbox are dashed cyan (animated); the LLM tool loop is solid amber.

INPUT -- YAML · Goal · Agent
ORCHESTRATOR -- zenflow.New(opts...); holds: model · sinks · storage · coord runner; entry points: RunFlow · RunGoal · RunAgent; loads + validates, hosts the coordinator, spawns steps via RunFlow(ctx)
COORDINATOR -- LLM hub; tools: forward_to_agent · narrate · finalize
EXECUTOR -- DAG scheduler; topo sort · loops · includes · forEach · maxConcurrency · retries · cancellation
AGENT · STEP -- pro (arguer, mailbox unread 0/10000) · con (arguer, mailbox unread 1/10000) · judge (summarizer, mailbox unread 0/10000); tools: send_message
PROVIDER · GOAI SDK -- GenerateText · StreamText; google · bedrock · azure · openai · anthropic · vertex

Hub-and-spoke. Solid green = function calls inside the process. Dashed cyan = messages routed through the per-agent mailbox (animated to convey flow direction). Solid amber = the LLM tool loop running through the goai SDK. Peer agents never address each other directly -- every cross-agent message goes through the coordinator's forward_to_agent tool.

Three roles you keep meeting

Orchestrator · Coordinator · Executor

the owner

Orchestrator

The public Go type. One instance per process. Holds your model, progress sinks, storage, the coordinator runner, and the orchestrator-wide options. It doesn't run agents itself -- it constructs an Executor on each Run* call and exposes the entry points.

lifetime
long-lived (per process)
api
zenflow.New(opts...)
entry
RunFlow · RunGoal · RunAgent
source
zenflow.go
the hub

Coordinator

An LLM-backed AgentRunner the orchestrator hosts. Receives every cross-agent message in its own mailbox, decides where each goes (forward_to_agent), narrates progress (narrate), and may finalize the run (finalize). The hub of hub-and-spoke.

lifetime
shared across runs of one Orchestrator
api
WithCoordinator(runner)
factory
NewDefaultCoordRunner(...)
source
coord_factory.go · internal/coord/coord_tools.go · agent_runner.go
the scheduler

Executor

A per-run internal struct. Walks the workflow's DAG topologically, spawns one AgentRunner per step, owns the MessageRouter / Mailbox / delivery-engine lifecycle, and assembles the WorkflowResult. Lives only for the duration of one Run* call.

lifetime
per Run* call (transient)
api
internal -- not user-constructed
owns
MessageRouter · Mailbox · deliveryEngine
source
executor.go

Three modes, one engine

How a workflow starts

flow

Run a YAML DAG

Run a fully-declared .yaml workflow end to end. The agents, dependencies, and conditions are all explicit before the first LLM call -- diffable, reviewable in a PR.

cli
zenflow flow workflow.yaml
library
orch.RunFlow(ctx, wf)
good for
known plans, repeated runs
goal

Decompose a goal

Hand a single sentence to the coordinator and let it generate a workflow on the fly. The decomposition is auditable -- you can --plan it before approving.

cli
zenflow goal "ship the launch"
library
orch.RunGoal(ctx, goal)
good for
exploratory, ad-hoc work
agent

Single agent chat

One agent, one task, one tool loop. Zero coordinator. Useful for embedded chat surfaces, quick one-shots, and the foundation that the other modes are built on.

cli
zenflow agent "review the diff"
library
orch.RunAgent(ctx, cfg)
good for
chat, scripts, embedders

Hub-side primitives

Coordinator message tools

The coordinator wires three tools: forward_to_agent for routing, narrate for progress events, and finalize for terminal synthesis. Its agent-side counterpart, send_message, is documented in the next section.

tool

forward_to_agent

{ target_step_id, text, kind? }

Routes a message to a step's mailbox via MessageRouter.Send. Wrapper steps (loop · include) are rejected at the tool boundary so the coordinator can't address a container that has no agent.
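The wrapper-rejection guard can be sketched as a small check at the tool boundary. This is an illustrative stand-in, not zenflow's actual code: the step-kind names and function are assumptions.

```go
package main

import "fmt"

// stepKind models the distinction the card above draws: agent steps own a
// mailbox, while wrapper steps (loop, include) only contain other steps.
// These names are illustrative, not zenflow's.
type stepKind int

const (
	kindAgent   stepKind = iota
	kindLoop             // wrapper: contains an inner DAG
	kindInclude          // wrapper: imports a sub-workflow
)

// validateForwardTarget rejects wrappers before any mailbox send happens,
// so the coordinator can never address a container that has no agent.
func validateForwardTarget(kind stepKind) error {
	switch kind {
	case kindLoop, kindInclude:
		return fmt.Errorf("forward_to_agent: target is a wrapper step with no agent")
	default:
		return nil
	}
}

func main() {
	fmt.Println(validateForwardTarget(kindAgent)) // <nil>
	fmt.Println(validateForwardTarget(kindLoop) != nil)
}
```

Doing the check at the tool boundary keeps the error in the coordinator's own tool-result, where the LLM can see it and pick a different target.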

tool

narrate

{ text }

Pushes an EventCoordinatorNarration onto the progress sink. The stdout sink renders it inline; the JSON sink emits a structured event for downstream observability.

tool

finalize

{ summary? }

Signals "the workflow is done, here's my synthesis." CLI treats it as advisory -- DAG completion is authoritative. Embedders may select on runner.Done() to honour it.
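For embedders, honouring an advisory finalize looks like a select on a done channel. A minimal sketch, assuming the runner exposes `Done() <-chan struct{}` as the card suggests; the `runner` type here is a stand-in, not zenflow's:

```go
package main

import (
	"fmt"
	"time"
)

// runner is a stand-in for an AgentRunner whose Done channel closes when
// the coordinator calls finalize.
type runner struct{ done chan struct{} }

func (r *runner) Done() <-chan struct{} { return r.done }

func main() {
	r := &runner{done: make(chan struct{})}
	go func() { close(r.done) }() // coordinator called finalize

	// Treat finalize as advisory: stop early if it fires, but fall back to
	// waiting for DAG completion (modelled here as a timeout) otherwise.
	select {
	case <-r.Done():
		fmt.Println("coordinator finalized; stop early")
	case <-time.After(2 * time.Second):
		fmt.Println("DAG completion remains authoritative")
	}
}
```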

Auto-injected on the spoke

Agent-side tools

tool

send_message

{ text }

Auto-injected on every non-coordinator step runner that has a MessageRouter wired. Routes the message back to the coordinator's mailbox -- the spoke-to-hub leg of hub-and-spoke.

tool

submit_result

{ ...schema fields }

Auto-injected when the step's cfg.ResultSchema is non-nil. Returns the structured output that ends up in StepResult.Output; goai validates the payload against the schema before the runner accepts it.
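The validate-before-accept gate can be sketched with a required-fields check. Real validation goes through goai's JSON-schema support; this function is an illustrative stand-in, and the field names are invented for the example:

```go
package main

import "fmt"

// validate is a stand-in for schema validation: submit_result's payload is
// checked before the runner accepts it, so malformed output is bounced back
// into the tool loop instead of landing in StepResult.Output.
func validate(required []string, payload map[string]any) error {
	for _, f := range required {
		if _, ok := payload[f]; !ok {
			return fmt.Errorf("submit_result: missing required field %q", f)
		}
	}
	return nil
}

func main() {
	schema := []string{"verdict", "confidence"}
	// Missing "confidence" -> rejected; the agent sees the error and retries.
	fmt.Println(validate(schema, map[string]any{"verdict": "pro wins"}))
	fmt.Println(validate(schema, map[string]any{"verdict": "pro wins", "confidence": 0.8}))
}
```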

tool

agent

{ name, prompt, ... }

Auto-injected when a child spawner is wired (AgentToolDef). Spawns a child AgentRunner for hierarchical decomposition -- the parent step waits on the child's result before continuing its own loop.

Delivery contract

Race-safe Mailbox + Wake

guarantee

Per-agent mailbox · per-step lock

Each step has its own InMemoryMailboxStore and an RWMutex the executor takes when transitioning lifecycle (start, idle, terminal). The delivery engine ticks every 500 ms and signals an idle agent's Wake channel only when its inbox has grown, so we never enter goai.GenerateText (or goai.StreamText when streaming is enabled) on a still-empty queue.

store
MailboxStore (interface)
delivery
deliveryEngine.tick(500ms)
wake
chan struct{} (cap 1)
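The capacity-1 wake channel is a standard Go idiom worth seeing in isolation: any number of "inbox grew" signals coalesce into at most one pending wake. A self-contained sketch of the pattern (the `mailbox` type is illustrative, not zenflow's):

```go
package main

import "fmt"

// mailbox pairs a message queue with a cap-1 wake channel, mirroring the
// contract above: deliveries never block, and redundant wakes coalesce.
type mailbox struct {
	msgs []string
	wake chan struct{} // cap 1, as in the card above
}

func newMailbox() *mailbox {
	return &mailbox{wake: make(chan struct{}, 1)}
}

// deliver appends a message and signals the wake channel without blocking;
// if a wake is already pending, the extra signal is dropped (coalesced).
func (m *mailbox) deliver(msg string) {
	m.msgs = append(m.msgs, msg)
	select {
	case m.wake <- struct{}{}:
	default: // a wake is already queued; coalesce
	}
}

func main() {
	m := newMailbox()
	m.deliver("a")
	m.deliver("b") // second signal coalesces into the pending one
	<-m.wake       // one wake; the agent then drains both messages
	fmt.Println(len(m.msgs), len(m.wake)) // 2 0
}
```

Because the agent drains the whole inbox per wake, losing the coalesced signals is safe; what matters is that an idle agent is never woken onto an empty queue.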
contract

Zero silent drops · typed reasons

Every dropped message produces exactly one EventMessageDropped with a typed DropReason. The reasons cover every observable outcome: workflow-cancelled, target-terminal, unknown-step, mailbox-closed-by-finalize, max-wake-cycles, hold-timeout, mailbox-full, no-transcript, transcript-too-large, resume-shutdown, resolver-error. No message disappears without a verdict.

event
EventMessageDropped
reasons
11 typed drop categories
error
*DropError (errors.As)
limits

Hard caps

The engine enforces a small set of safety ceilings so a runaway DAG, an infinitely chatty coordinator, or a flooded mailbox can't blow past process bounds. Each cap is overridable via the matching With* option where the use case justifies it.

steps / workflow
MaxStepsPerWorkflow = 100
include depth
MaxIncludeDepth = 5
nesting depth
MaxNestingDepth = 20
mailbox size
DefaultMaxMailboxSize = 10000
coord wake cycles
coordDefaultMaxWakeCycles = 100
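The defaults-plus-With* shape is the functional-options pattern. A sketch of how such caps are typically wired; the constants mirror the table above, but the option and config names are assumptions, not zenflow's:

```go
package main

import "fmt"

// Defaults mirroring the documented caps.
const (
	MaxStepsPerWorkflow   = 100
	DefaultMaxMailboxSize = 10000
)

// config holds the effective ceilings for one orchestrator.
type config struct {
	maxSteps   int
	maxMailbox int
}

// Option mutates a config; each With* function returns one.
type Option func(*config)

func WithMaxMailboxSize(n int) Option {
	return func(c *config) { c.maxMailbox = n }
}

// newConfig applies defaults first, then any overrides, so unset caps keep
// their safe values.
func newConfig(opts ...Option) config {
	c := config{maxSteps: MaxStepsPerWorkflow, maxMailbox: DefaultMaxMailboxSize}
	for _, o := range opts {
		o(&c)
	}
	return c
}

func main() {
	c := newConfig(WithMaxMailboxSize(50000))
	fmt.Println(c.maxSteps, c.maxMailbox) // 100 50000
}
```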

Lifecycle & recovery

From plan to result

executor

DAG scheduler · loops · includes

Kahn-style topological scheduling over the workflow's steps, respecting dependsOn, condition (CEL), and maxConcurrency. Loops resolve their inner DAGs lazily; includes import sub-workflows; forEach spawns one inner DAG per item, capped at the loop's maxConcurrency.

schedule
topoSort(steps)
parallel
sem ← chan struct{}
loop
repeatUntil · forEach
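Kahn-style scheduling over `dependsOn` edges can be sketched in a few lines: repeatedly release steps whose dependencies have all finished, and treat leftovers as a cycle. This is the textbook algorithm, not zenflow's internal code:

```go
package main

import "fmt"

// topoSort takes step -> dependsOn edges and returns a valid execution
// order. A leftover step after the queue drains means a cycle (invalid DAG).
func topoSort(dependsOn map[string][]string) ([]string, error) {
	indegree := map[string]int{}
	dependents := map[string][]string{}
	for step, deps := range dependsOn {
		indegree[step] = len(deps)
		for _, d := range deps {
			dependents[d] = append(dependents[d], step)
			if _, ok := indegree[d]; !ok {
				indegree[d] = 0
			}
		}
	}
	var ready, order []string
	for s, n := range indegree {
		if n == 0 {
			ready = append(ready, s) // no unmet dependencies: runnable now
		}
	}
	for len(ready) > 0 {
		s := ready[0]
		ready = ready[1:]
		order = append(order, s)
		for _, d := range dependents[s] { // s finished: release its dependents
			if indegree[d]--; indegree[d] == 0 {
				ready = append(ready, d)
			}
		}
	}
	if len(order) != len(indegree) {
		return nil, fmt.Errorf("cycle detected")
	}
	return order, nil
}

func main() {
	// The debate topology from the diagram: judge waits on pro and con.
	order, _ := topoSort(map[string][]string{
		"pro": nil, "con": nil, "judge": {"pro", "con"},
	})
	fmt.Println(order)
}
```

In a real executor, every step in `ready` can run concurrently, gated by a buffered-channel semaphore sized to `maxConcurrency`, which is exactly the `sem ← chan struct{}` entry above.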
output

Result · transcript · OTel

On completion the executor returns a WorkflowResult with per-step status, token counts, and content. A TranscriptStore persists every LLM turn for resume-from-checkpoint. The optional OTel adapter exports each step as a span tree so latency, errors, and token usage drop straight into your existing observability stack.

result
WorkflowResult
resume
orch.ResumeFlow(ctx, runID, wf)
trace
zenflow.flow · step · coord