Coordinator
The coordinator is an LLM that watches a workflow run, narrates progress, routes messages between steps, and produces a final synthesis. It is the difference between a DAG executor and an agentic system.
The coordinator is optional. Without one, zenflow runs the DAG quietly: steps execute in dependency order, message-routing tools that target the coordinator drop, and the workflow ends with whatever step outputs accumulated. Most non-trivial flows benefit from a coordinator.
The cast above is the coordinator at work on the full-featured workflow (6 top-level steps + a deploy_staging sub-workflow). Watch the ≋ [coordinator] ... lines: every step start, every step completion, every sub-workflow boundary fires narrate, and the final summary lands once the deployer verifies the build.
What the coordinator does
A coordinator runner has a mailbox. The executor pushes lifecycle events into that mailbox as RouterMessage envelopes:
EventStepStart- a step's agent began.EventStepEnd- a step terminated (any status).EventStepSkipped- a step was skipped (failed dep or condition).EventError- a step or executor error fired.EventCoordinatorInboxMessage- a resumed step's reply (or other router-delivered message) landed in the coordinator inbox.- Messages from
send_messagecalls in step agents (forwarded to the coordinator inbox).
After each push, the executor signals the coordinator's Wake channel. The runner's tool loop drains the new mailbox messages, asks the LLM what to do, and executes any tool calls the LLM emitted.
The coordinator's three default tools are:
forward_to_agent(target_step_id, text, kind?)- inject a message into a running step's mailbox. The step agent sees the message in its conversation context on its next turn.narrate(text)- emit a user-facing narration. Surfaces in the CLI output and JSON sink as anEventCoordinatorNarration.finalize(summary?)- signal coordination is complete. The workflow'sWorkflowResult.Summaryis set to the summary string.finalizeflips the runner'sfinalizedflag and closes theDonechannel; the CLI's coord loop continues until the workflow's DAG completion cancels its context (the CLI does NOT consultrunner.Finalized()to exit). Embedders building custom coord loops can choose to honorrunner.Finalized()as an exit signal - see coord-tools.md for the canonical loop shape.
A coordinator that never calls finalize is fine: when the executor finishes the DAG, it cancels the runner's context and the loop exits naturally. finalize is a hint for "I have nothing more to say"; it is not required.
Wake cycles
The coordinator runner's Run method is a wake-driven loop:
- Drain the mailbox into the conversation history.
- Ask the LLM for a response.
- Execute any tool calls (
forward_to_agent,narrate,finalize, plus any extras you wired in). - If the loop is in mailbox-mode and
Wakehas not been signalled, exit. Wait for the next signal. - If
Wakewas signalled, loop back to step 1.
The cap on wake re-entries per Run invocation is 100 wake cycles by default; raise via WithCoordMaxWakeCycles(n). When hit, remaining mailbox messages are drained as drops with DropReasonMaxWakeCycles.
This loop is what lets the coordinator be both reactive (it wakes per event) and stateless across waits (no goroutine pinned to it when the mailbox is empty).
Cold start vs continuation
The coordinator's system prompt is the same on every invocation. What changes is the mailbox content and the conversation history.
- Cold start: the first wake of a fresh run. The mailbox starts empty (or holds an initial
workflow_startevent whenWithFlowContextwas supplied). The coordinator narrates whatever the first event tells it about, or stays silent. - Continuation: any subsequent wake. The conversation history carries every prior turn. The coordinator's job is to answer the new event without repeating itself.
The DefaultCoordSystemPrompt is tuned for the continuation case: "narrate ONCE per significant event", "do NOT repeat narration you already emitted", "exit naturally when nothing new is happening". The cold-start case usually lands one short narration like "<agent> started <step-id>" and exits.
Installing a coordinator
Two paths.
CLI default
The CLI binary installs a default coordinator automatically for zenflow flow and zenflow goal. You see narration in the output, a final summary after the run, and (in --json mode) coordinator_narration and coordinator_synthesis events.
--quiet suppresses narration but keeps the runner installed. --summary-only switches the runner to SynthesizeOnly() mode (no narrate tool, just finalize with a summary). See CLI flags.
Library: NewDefaultCoordRunner
import (
"context"
"github.com/zendev-sh/zenflow"
)
coord := zenflow.NewDefaultCoordRunner(llm)
orch := zenflow.New(
zenflow.WithModel(llm),
zenflow.WithCoordinator(coord),
zenflow.WithProgress(progressSink),
)
// Caller owns the runner's lifecycle.
ctx, cancel := context.WithCancel(parentCtx)
go func() {
_, _ = coord.Run(ctx, zenflow.AgentConfig{}, "", "", coord.Tools)
}()
defer cancel()
result, err := orch.RunFlow(ctx, wf)NewDefaultCoordRunner returns a pre-configured *AgentRunner:
StepID = "coordinator"(matches the executor's reverse-reply inbox key).- A fresh
InMemoryMailboxStore. - A buffered
Wakechannel for executor-driven re-entry. - The default coord system prompt (
DefaultCoordSystemPrompt). - The three default tools wired to the runner instance.
MaxWakeCycles = 100.
The factory does not start the runner's Run loop. Lifecycle is the caller's job: CLI consumers start the loop on a goroutine before RunFlow and let context cancellation tear it down; embedded consumers reuse their existing primary AgentRunner and pass it directly to WithCoordinator(primary). The orchestrator wires the runner's MessageRouter and Progress synchronously when New() returns, so the consumer can spawn the coord goroutine immediately afterward without racing the wiring.
Coord options
coord := zenflow.NewDefaultCoordRunner(
llm,
zenflow.SynthesizeOnly(), // drops `narrate`, only finalize fires
zenflow.WithCoordTools(myCustomTool), // append extra tools (cumulative across calls)
zenflow.WithCoordMaxWakeCycles(250), // raise the wake cap
zenflow.WithCoordSystemPrompt(myCustomPrompt), // replace the prompt entirely
zenflow.WithCoordSystemPromptSuffix("\nExtra:..."), // append to the default
zenflow.WithCoordContextProvider(func() string { // ambient context refreshed every wake
return ambientSnapshotForCoord()
}),
)WithCoordSystemPromptSuffix is the right tool for "keep the tested baseline, add a few project-specific lines". Replace the prompt entirely only if you have a strong reason - the default contains addressing rules that prevent unknown-step drops.
WithCoordContextProvider is the per-wake context hook. The callback fires once before the first LLM call and once on every wake-driven re-entry; its returned string is appended as a fresh user-role message wrapped in <dynamic-context>...</dynamic-context> so the coord LLM can distinguish ambient state from in-band conversation. Empty / whitespace-only returns are skipped. Use it for chat-driven UX that needs ambient context refreshed each wake (currently-open files, repo metadata, session topic) without re-engineering the system prompt.
Worked example: messaging-demo
name: messaging-rounds
agents:
asker:
description: "Curious user. Sends questions via send_message; reads answers from inbox."
expert:
description: "Knowledgeable expert. Reads questions; sends answers via send_message."
summarizer:
description: "Summarizes the Q/A history."
steps:
- id: asker-1
agent: asker
instructions: |
Round 1. Call send_message with "QUESTION_1: What is the capital of France?"
Then reply with EXACTLY: "SENT_1".
- id: expert-1
agent: expert
dependsOn: [asker-1]
instructions: |
Read the forwarded question from your inbox.
Call send_message with "ANSWER_1: <answer>"
Then reply with EXACTLY: "REPLIED_1".
# ... rounds 2 and 3 follow the same pattern, each depending on the previous expert.
- id: summary
agent: summarizer
dependsOn: [expert-3]
instructions: |
Read the full conversation. Write a 3-sentence summary.What the coordinator does, round by round:
asker-1starts. Coordinator wakes onEventStepStart. It narrates"asker started round 1"(or stays silent - either is fine).asker-1callssend_message. The send goes to the coordinator's inbox. The coordinator wakes, seesfrom=asker-1: QUESTION_1: ..., decides this needs to reachexpert-1. It callsforward_to_agent(target_step_id="expert-1", text="QUESTION_1: ..."). The MessageRouter pushes the message intoexpert-1's mailbox.asker-1finishes with content"SENT_1". Coordinator wakes onEventStepEnd, may narrate one sentence.expert-1starts. Its agent's first turn drains the inbox, sees the question, callssend_messagewith the answer.- Coordinator forwards the answer to
asker-2(the next round's step) once it starts. And so on. summaryfinishes, coordinator narrates a final line and may callfinalize(summary="3 rounds of Q&A on Paris, population, landmarks"). The workflow ends.
Two key facts in this workflow:
askerandexpertnever see each other directly. They only see the coordinator, who curates what reaches whose mailbox.- The
dependsOnchain (asker-1 -> expert-1 -> asker-2 -> ...) gives the coordinator time to forward each message before the next pair starts.
This is hub-and-spoke routing, enforced by the system. See Messaging for the full model.
Addressing rules
The coordinator's system prompt enforces step-ID-based addressing:
target_step_idis the step'sid:from the YAML, not the agent name.- Inner-DAG steps (loops, forEach, includes) use namespaced IDs:
- repeat-until iter N:
parentLoopID.N.innerStepID - forEach item N:
parentLoopID[N].innerStepID - include sub-workflow:
includeStepID.subStepID
- repeat-until iter N:
- Only address step IDs that have appeared in mailbox events. Inferring future step IDs from domain semantics is a frequent source of unknown-step drops.
The coordinator sees the right ID in every event payload (from= / step= fields). Mirror what you see.
Failure recovery
When forward_to_agent returns "dropped: ...", the tool result lists currently available step IDs. The coord's prompt instructs it to take one recovery action in the same turn:
- Retry
forward_to_agentwith a correct target ID. - Call
narrate(...)with the same content to surface it.
The system also preserves dropped content as fallback narration automatically, so user-facing output is not lost. Acting in the same turn produces cleaner output.
Termination
finalize is one-way. Once called, the coordinator runner stops processing events for the rest of the run. The default prompt cautions against premature finalize: the rule is "all declared steps have emitted EventStepEnd and no step is still pending in the mailbox".
Without finalize, the runner exits when the executor cancels its context (workflow done). The summary in WorkflowResult.Summary will be empty in that case; if you need a synthesis, install a coordinator and rely on finalize.
Cross-links
- Messaging - the routing model the coordinator participates in
- Observability - sinks that surface narration and synthesis events
- API: Options -
WithCoordinator,NewDefaultCoordRunner,CoordOption - Failure handling - how the coord sees step failures