multi-agent · orchestration · workflow

Agent Orchestration Architecture

How zenflow turns a YAML agent workflow into a running plan. An LLM coordinator routes events between agents through hub-and-spoke mailboxes; the executor schedules a DAG of steps; goai handles the tool loop. Every box, arrow, and label below is a real component you can grep for in the source.

Legend. Edges: function call · tool loop / llm call · message delivery (mailbox) · data / setup. Nodes: input · coordinator · executor · agent · mailbox · provider · llm

runtime topology

Figure: zenflow runtime topology. User input flows into the orchestrator, which splits into the coordinator and executor; the executor fans out to three agents, each calling the provider for its tool loop. Function calls are solid green; messages through the mailbox are dashed cyan (animated); the LLM tool loop is solid amber.

INPUT -- YAML · Goal · Agent
ORCHESTRATOR -- zenflow.New(opts...); holds: model · sinks · storage · coord runner; entry points: RunFlow · RunGoal · RunAgent; loads + validates, hosts the coordinator, spawns steps via RunFlow(ctx)
COORDINATOR -- LLM hub; tools: forward_to_agent · narrate · finalize
EXECUTOR -- DAG scheduler; topo sort · loops · includes · forEach · maxConcurrency · retries · cancellation
AGENT · STEP -- pro (arguer, mailbox unread 0/10000) · con (arguer, mailbox unread 1/10000) · judge (summarizer, mailbox unread 0/10000); tools: send_message
PROVIDER · GOAI SDK -- GenerateText · StreamText; google · bedrock · azure · openai · anthropic · vertex

Hub-and-spoke. Solid green = function calls inside the process. Dashed cyan = messages routed through the per-agent mailbox (animated to convey flow direction). Solid amber = the LLM tool loop running through the goai SDK. Peer agents never address each other directly -- every cross-agent message goes through the coordinator's forward_to_agent tool.

Three roles you keep meeting

Orchestrator · Coordinator · Executor

the owner

Orchestrator

The public Go type. One instance per process. Holds your model, progress sinks, storage, the coordinator runner, and the orchestrator-wide options. It doesn't run agents itself -- it constructs an Executor on each Run* call and exposes the entry points.

lifetime
long-lived (per process)
api
zenflow.New(opts...)
entry
RunFlow · RunGoal · RunAgent
source
zenflow.go
the hub

Coordinator

An LLM-backed AgentRunner the orchestrator hosts. Receives every cross-agent message in its own mailbox, decides where each goes (forward_to_agent), narrates progress (narrate), and may finalize the run (finalize). The hub of hub-and-spoke.

lifetime
shared across runs of one Orchestrator
api
WithCoordinator(runner)
factory
NewDefaultCoordRunner(...)
source
coord_factory.go · internal/coord/coord_tools.go · agent_runner.go
the scheduler

Executor

A per-run internal struct. Walks the workflow's DAG topologically, spawns one AgentRunner per step, owns the MessageRouter / Mailbox / delivery-engine lifecycle, and assembles the WorkflowResult. Lives only for the duration of one Run* call.

lifetime
per Run* call (transient)
api
internal -- not user-constructed
owns
MessageRouter · Mailbox · deliveryEngine
source
executor.go

Three modes, one engine

How a workflow starts

flow

Run a YAML DAG

Run a fully-declared .yaml workflow end to end. The agents, dependencies, and conditions are all explicit before the first LLM call -- diffable, reviewable in a PR.

cli
zenflow flow workflow.yaml
library
orch.RunFlow(ctx, wf)
good for
known plans, repeated runs
goal

Decompose a goal

Hand a single sentence to the coordinator and let it generate a workflow on the fly. The decomposition is auditable -- you can --plan it before approving.

cli
zenflow goal "ship the launch"
library
orch.RunGoal(ctx, goal)
good for
exploratory, ad-hoc work
agent

Single agent chat

One agent, one task, one tool loop. Zero coordinator. Useful for embedded chat surfaces, quick one-shots, and the foundation that the other modes are built on.

cli
zenflow agent "review the diff"
library
orch.RunAgent(ctx, cfg)
good for
chat, scripts, embedders

Hub-side primitives

Coordinator message tools

The coordinator wires three tools: forward_to_agent for routing, narrate for progress events, and finalize for terminal synthesis. Its agent-side counterpart, send_message, is documented in the next section.

tool

forward_to_agent

{ target_step_id, text, kind? }

Routes a message to a step's mailbox via MessageRouter.Send. Wrapper steps (loop · include) are rejected at the tool boundary so the coordinator can't address a container that has no agent.
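The wrapper-rejection guard can be sketched as a small check at the tool boundary. This is an illustrative stand-in, not zenflow's actual code: the step-kind names and function are assumptions.

```go
package main

import "fmt"

// stepKind models the distinction the card above draws: agent steps own a
// mailbox, while wrapper steps (loop, include) only contain other steps.
// These names are illustrative, not zenflow's.
type stepKind int

const (
	kindAgent   stepKind = iota
	kindLoop             // wrapper: contains an inner DAG
	kindInclude          // wrapper: imports a sub-workflow
)

// validateForwardTarget rejects wrappers before any mailbox send happens,
// so the coordinator can never address a container that has no agent.
func validateForwardTarget(kind stepKind) error {
	switch kind {
	case kindLoop, kindInclude:
		return fmt.Errorf("forward_to_agent: target is a wrapper step with no agent")
	default:
		return nil
	}
}

func main() {
	fmt.Println(validateForwardTarget(kindAgent)) // <nil>
	fmt.Println(validateForwardTarget(kindLoop) != nil)
}
```

Doing the check at the tool boundary keeps the error in the coordinator's own tool-result, where the LLM can see it and pick a different target.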

tool

narrate

{ text }

Pushes an EventCoordinatorNarration onto the progress sink. The stdout sink renders it inline; the JSON sink emits a structured event for downstream observability.

tool

finalize

{ summary? }

Signals "the workflow is done, here's my synthesis." CLI treats it as advisory -- DAG completion is authoritative. Embedders may select on runner.Done() to honour it.
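For embedders, honouring an advisory finalize looks like a select on a done channel. A minimal sketch, assuming the runner exposes `Done() <-chan struct{}` as the card suggests; the `runner` type here is a stand-in, not zenflow's:

```go
package main

import (
	"fmt"
	"time"
)

// runner is a stand-in for an AgentRunner whose Done channel closes when
// the coordinator calls finalize.
type runner struct{ done chan struct{} }

func (r *runner) Done() <-chan struct{} { return r.done }

func main() {
	r := &runner{done: make(chan struct{})}
	go func() { close(r.done) }() // coordinator called finalize

	// Treat finalize as advisory: stop early if it fires, but fall back to
	// waiting for DAG completion (modelled here as a timeout) otherwise.
	select {
	case <-r.Done():
		fmt.Println("coordinator finalized; stop early")
	case <-time.After(2 * time.Second):
		fmt.Println("DAG completion remains authoritative")
	}
}
```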

Auto-injected on the spoke

Agent-side tools

tool

send_message

{ text }

Auto-injected on every non-coordinator step runner that has a MessageRouter wired. Routes the message back to the coordinator's mailbox -- the spoke-to-hub leg of hub-and-spoke.

tool

submit_result

{ ...schema fields }

Auto-injected when the step's cfg.ResultSchema is non-nil. Returns the structured output that ends up in StepResult.Output; goai validates the payload against the schema before the runner accepts it.
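The validate-before-accept gate can be sketched with a required-fields check. Real validation goes through goai's JSON-schema support; this function is an illustrative stand-in, and the field names are invented for the example:

```go
package main

import "fmt"

// validate is a stand-in for schema validation: submit_result's payload is
// checked before the runner accepts it, so malformed output is bounced back
// into the tool loop instead of landing in StepResult.Output.
func validate(required []string, payload map[string]any) error {
	for _, f := range required {
		if _, ok := payload[f]; !ok {
			return fmt.Errorf("submit_result: missing required field %q", f)
		}
	}
	return nil
}

func main() {
	schema := []string{"verdict", "confidence"}
	// Missing "confidence" -> rejected; the agent sees the error and retries.
	fmt.Println(validate(schema, map[string]any{"verdict": "pro wins"}))
	fmt.Println(validate(schema, map[string]any{"verdict": "pro wins", "confidence": 0.8}))
}
```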

tool

agent

{ name, prompt, ... }

Auto-injected when a child spawner is wired (AgentToolDef). Spawns a child AgentRunner for hierarchical decomposition -- the parent step waits on the child's result before continuing its own loop.

Delivery contract

Race-safe Mailbox + Wake

guarantee

Per-agent mailbox · per-step lock

Each step has its own InMemoryMailboxStore and an RWMutex the executor takes when transitioning lifecycle (start, idle, terminal). The delivery engine ticks every 500 ms and signals an idle agent's Wake channel only when its inbox has grown, so we never enter goai.GenerateText (or goai.StreamText when streaming is enabled) on a still-empty queue.

store
MailboxStore (interface)
delivery
deliveryEngine.tick(500ms)
wake
chan struct{} (cap 1)
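The capacity-1 wake channel is a standard Go idiom worth seeing in isolation: any number of "inbox grew" signals coalesce into at most one pending wake. A self-contained sketch of the pattern (the `mailbox` type is illustrative, not zenflow's):

```go
package main

import "fmt"

// mailbox pairs a message queue with a cap-1 wake channel, mirroring the
// contract above: deliveries never block, and redundant wakes coalesce.
type mailbox struct {
	msgs []string
	wake chan struct{} // cap 1, as in the card above
}

func newMailbox() *mailbox {
	return &mailbox{wake: make(chan struct{}, 1)}
}

// deliver appends a message and signals the wake channel without blocking;
// if a wake is already pending, the extra signal is dropped (coalesced).
func (m *mailbox) deliver(msg string) {
	m.msgs = append(m.msgs, msg)
	select {
	case m.wake <- struct{}{}:
	default: // a wake is already queued; coalesce
	}
}

func main() {
	m := newMailbox()
	m.deliver("a")
	m.deliver("b") // second signal coalesces into the pending one
	<-m.wake       // one wake; the agent then drains both messages
	fmt.Println(len(m.msgs), len(m.wake)) // 2 0
}
```

Because the agent drains the whole inbox per wake, losing the coalesced signals is safe; what matters is that an idle agent is never woken onto an empty queue.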
contract

Zero silent drops · typed reasons

Every dropped message produces exactly one EventMessageDropped with a typed DropReason. The reasons cover every observable outcome: workflow-cancelled, target-terminal, unknown-step, mailbox-closed-by-finalize, max-wake-cycles, hold-timeout, mailbox-full, no-transcript, transcript-too-large, resume-shutdown, resolver-error. No message disappears without a verdict.

event
EventMessageDropped
reasons
11 typed drop categories
error
*DropError (errors.As)
limits

Hard caps

The engine enforces a small set of safety ceilings so a runaway DAG, an infinitely chatty coordinator, or a flooded mailbox can't blow past process bounds. Each cap is overridable via the matching With* option where the use case justifies it.

steps / workflow
MaxStepsPerWorkflow = 100
include depth
MaxIncludeDepth = 5
nesting depth
MaxNestingDepth = 20
mailbox size
DefaultMaxMailboxSize = 10000
coord wake cycles
coordDefaultMaxWakeCycles = 100
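The defaults-plus-With* shape is the functional-options pattern. A sketch of how such caps are typically wired; the constants mirror the table above, but the option and config names are assumptions, not zenflow's:

```go
package main

import "fmt"

// Defaults mirroring the documented caps.
const (
	MaxStepsPerWorkflow   = 100
	DefaultMaxMailboxSize = 10000
)

// config holds the effective ceilings for one orchestrator.
type config struct {
	maxSteps   int
	maxMailbox int
}

// Option mutates a config; each With* function returns one.
type Option func(*config)

func WithMaxMailboxSize(n int) Option {
	return func(c *config) { c.maxMailbox = n }
}

// newConfig applies defaults first, then any overrides, so unset caps keep
// their safe values.
func newConfig(opts ...Option) config {
	c := config{maxSteps: MaxStepsPerWorkflow, maxMailbox: DefaultMaxMailboxSize}
	for _, o := range opts {
		o(&c)
	}
	return c
}

func main() {
	c := newConfig(WithMaxMailboxSize(50000))
	fmt.Println(c.maxSteps, c.maxMailbox) // 100 50000
}
```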

Lifecycle & recovery

From plan to result

executor

DAG scheduler · loops · includes

Kahn-style topological scheduling over the workflow's steps, respecting dependsOn, condition (CEL), and maxConcurrency. Loops resolve their inner DAGs lazily; includes import sub-workflows; forEach spawns one inner DAG per item, capped at the loop's maxConcurrency.

schedule
topoSort(steps)
parallel
sem ← chan struct{}
loop
repeatUntil · forEach
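Kahn-style scheduling over `dependsOn` edges can be sketched in a few lines: repeatedly release steps whose dependencies have all finished, and treat leftovers as a cycle. This is the textbook algorithm, not zenflow's internal code:

```go
package main

import "fmt"

// topoSort takes step -> dependsOn edges and returns a valid execution
// order. A leftover step after the queue drains means a cycle (invalid DAG).
func topoSort(dependsOn map[string][]string) ([]string, error) {
	indegree := map[string]int{}
	dependents := map[string][]string{}
	for step, deps := range dependsOn {
		indegree[step] = len(deps)
		for _, d := range deps {
			dependents[d] = append(dependents[d], step)
			if _, ok := indegree[d]; !ok {
				indegree[d] = 0
			}
		}
	}
	var ready, order []string
	for s, n := range indegree {
		if n == 0 {
			ready = append(ready, s) // no unmet dependencies: runnable now
		}
	}
	for len(ready) > 0 {
		s := ready[0]
		ready = ready[1:]
		order = append(order, s)
		for _, d := range dependents[s] { // s finished: release its dependents
			if indegree[d]--; indegree[d] == 0 {
				ready = append(ready, d)
			}
		}
	}
	if len(order) != len(indegree) {
		return nil, fmt.Errorf("cycle detected")
	}
	return order, nil
}

func main() {
	// The debate topology from the diagram: judge waits on pro and con.
	order, _ := topoSort(map[string][]string{
		"pro": nil, "con": nil, "judge": {"pro", "con"},
	})
	fmt.Println(order)
}
```

In a real executor, every step in `ready` can run concurrently, gated by a buffered-channel semaphore sized to `maxConcurrency`, which is exactly the `sem ← chan struct{}` entry above.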
output

Result · transcript · OTel

On completion the executor returns a WorkflowResult with per-step status, token counts, and content. A TranscriptStore persists every LLM turn for resume-from-checkpoint. The optional OTel adapter exports each step as a span tree so latency, errors, and token usage drop straight into your existing observability stack.

result
WorkflowResult
resume
orch.ResumeFlow(ctx, runID, wf)
trace
zenflow.flow · step · coord