Messaging

Step agents in zenflow can talk to each other, but never directly. Every message goes through the coordinator. This is hub-and-spoke routing: agents are spokes, the coordinator is the hub.

This page describes the routing model. For the coordinator's role, see Coordinator.

Why hub-and-spoke

Three reasons:

  1. Auditability. Every cross-agent message touches the coordinator, who can narrate, log, or veto it. There is one place to look for "what did agents tell each other".
  2. Schema-free coordination. Agents do not need to know each other's interfaces. They send free-form text to the coordinator, who routes it to whoever needs it.
  3. No N-by-N permission matrix. With direct peer-to-peer, every pair of agents would need a permission rule. Hub-and-spoke needs one rule: "can this step send to the coordinator? (yes)" plus "can the coordinator forward to this step? (yes if registered)".

There is no peer-to-peer messaging in zenflow. Step A cannot directly call send_message(to=stepB, ...). The send_message tool sends to the coordinator, full stop. The coordinator decides whether to forward to step B via forward_to_agent.

The two tools

| Tool | Caller | Effect |
| --- | --- | --- |
| send_message(text) | Step agents | Pushes a RouterMessage into the coordinator's inbox. The coordinator wakes and processes it. |
| forward_to_agent(target_step_id, text, kind?) | Coordinator | Pushes a RouterMessage into a step's inbox. The step's next agent turn drains the inbox into its conversation context. |

send_message is auto-injected into every step runner that has a MessageRouter and is not the coordinator itself. The coordinator is detected by the presence of forward_to_agent in the runner's tool list. Step runners that already define a send_message tool keep their own; it is not overwritten. forward_to_agent is one of the three default coordinator tools (alongside narrate and finalize). MessageRouter is the public alias for the internal router.Router; use MessageRouter in user code.

There is no direct way for the coordinator to address a step that has not been registered with the MessageRouter. The MessageRouter rejects sends to unknown step IDs with DropReasonUnknownStep and emits an EventMessageDropped event.

Message flow

A typical round trip:

  1. Step A's agent calls send_message("for B").
  2. Enqueue: MessageRouter.Send → coordinator inbox.
  3. The coordinator mailbox appends RouterMessage{From: stepA, To: coord}.
  4. Wake: the wake signal fires and the runner loop drains the inbox.
  5. The coordinator LLM decides: this is for B.
  6. The coordinator calls forward_to_agent("stepB", "for B").
  7. MessageRouter.Send → stepB inbox.
  8. The stepB mailbox appends RouterMessage{From: coord, To: stepB}.
  9. Drain: stepB's next turn drains the inbox and injects the message into its LLM context.

Reverse path (B answers): step B calls send_message("answer") -> coordinator inbox -> coordinator LLM decides to forward to A -> forward_to_agent("stepA", "answer").
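
The round trip can be modelled with a toy in-memory hub. This is a minimal sketch and not zenflow's implementation: the Router, Message, and coordinatorID names, and all method signatures, are hypothetical stand-ins chosen for illustration. What it shows is the shape of the rule: the step-side call has no recipient parameter, and only the coordinator-side call names a target.

```go
package main

import (
	"errors"
	"fmt"
)

// coordinatorID is a hypothetical hub address; zenflow's real identifier may differ.
const coordinatorID = "coordinator"

// Message is a simplified stand-in for RouterMessage.
type Message struct {
	From, To, Content string
}

// Router is a toy hub with one inbox per registered participant.
type Router struct {
	inboxes map[string][]Message
}

var errUnknownStep = errors.New("unknown step")

// NewRouter registers the coordinator plus the given step IDs.
func NewRouter(stepIDs ...string) *Router {
	r := &Router{inboxes: map[string][]Message{coordinatorID: nil}}
	for _, id := range stepIDs {
		r.inboxes[id] = nil
	}
	return r
}

// send appends to the recipient's inbox and fails for unregistered recipients.
func (r *Router) send(m Message) error {
	if _, ok := r.inboxes[m.To]; !ok {
		return errUnknownStep
	}
	r.inboxes[m.To] = append(r.inboxes[m.To], m)
	return nil
}

// SendMessage models the step-side tool: it always targets the coordinator.
func (r *Router) SendMessage(from, text string) error {
	return r.send(Message{From: from, To: coordinatorID, Content: text})
}

// ForwardToAgent models the coordinator-side tool, which names a target step.
func (r *Router) ForwardToAgent(target, text string) error {
	return r.send(Message{From: coordinatorID, To: target, Content: text})
}

// Drain empties and returns the inbox of a registered participant.
func (r *Router) Drain(id string) []Message {
	msgs, ok := r.inboxes[id]
	if !ok {
		return nil
	}
	r.inboxes[id] = nil
	return msgs
}

func main() {
	r := NewRouter("stepA", "stepB")
	_ = r.SendMessage("stepA", "for B")
	for _, m := range r.Drain(coordinatorID) {
		// Stand-in for the coordinator LLM deciding the message is for B.
		_ = r.ForwardToAgent("stepB", m.Content)
	}
	for _, m := range r.Drain("stepB") {
		fmt.Printf("%s -> %s: %s\n", m.From, m.To, m.Content) // coordinator -> stepB: for B
	}
}
```

Step B never sees stepA as the sender in this model; the message arrives from the coordinator, which is what makes the hub the single audit point.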

Addressing rules

The target_step_id argument to forward_to_agent is the step's id: from the YAML, not the agent name. A step with id: list_services and agent: discovery is addressed as forward_to_agent("list_services", ...), not forward_to_agent("discovery", ...).

Namespaced IDs in loops, forEach, and includes

When a step lives inside a loop, forEach, or included sub-workflow, its runtime ID is namespaced. The MessageRouter accepts either the bare step name (worker) or the namespaced runtime ID (loop-stages.0.worker); root router delegation routes both to the correct mailbox. Use whichever form the coordinator's prompt naturally produces.

| Container | Namespace pattern | Example |
| --- | --- | --- |
| repeat-until iteration N | parentLoopID.N.innerStepID | loop-stages.0.worker |
| forEach iteration N | parentLoopID[N].innerStepID | deploy[0].deploy_step |
| include sub-workflow | includeStepID.subStepID | deploy-staging.run-tests |

The events the coordinator receives carry these namespaced IDs in their from= and step= fields. The coordinator's system prompt instructs it to mirror whatever it sees. Don't construct namespaced IDs by hand. Use the step IDs that arrive in events; the MessageRouter resolves both bare and namespaced forms.
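
Observers that group events per step sometimes want the bare step name back out of a namespaced runtime ID. The sketch below only parses IDs; it does not construct them, and it is not the MessageRouter's own resolution logic. The helper name bareStepID is hypothetical, and the approach assumes that step IDs themselves never contain a dot.

```go
package main

import (
	"fmt"
	"strings"
)

// bareStepID returns the final dot-separated segment of a runtime ID.
// Assumption: step IDs contain no dots, so the last segment is the inner step.
func bareStepID(runtimeID string) string {
	if i := strings.LastIndex(runtimeID, "."); i >= 0 {
		return runtimeID[i+1:]
	}
	return runtimeID
}

func main() {
	ids := []string{
		"worker",                   // bare ID, returned unchanged
		"loop-stages.0.worker",     // repeat-until iteration
		"deploy[0].deploy_step",    // forEach iteration
		"deploy-staging.run-tests", // include sub-workflow
	}
	for _, id := range ids {
		fmt.Printf("%s -> %s\n", id, bareStepID(id))
	}
}
```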

Why this is enforced

If the coordinator addresses a step that does not exist (forward_to_agent("future-step")), the MessageRouter cannot deliver. The drop is surfaced as EventMessageDropped{reason: "unknown-step"} and the tool returns "dropped: unknown step. Available: [list of registered IDs]". The default coordinator prompt instructs the coordinator to recover in the same turn: retry with a correct ID, or call narrate(...) with the same content.
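
To make the tool reply concrete, here is a small sketch of how such a string could be produced. The forwardReply function, the "queued" success string, and the comma-separated list format are assumptions for illustration; only the "dropped: unknown step. Available: [...]" wording comes from this page.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// forwardReply mimics the text a coordinator might get back from forward_to_agent.
// The success string and the list separator are hypothetical.
func forwardReply(registered map[string]bool, target string) string {
	if registered[target] {
		return "queued"
	}
	ids := make([]string, 0, len(registered))
	for id := range registered {
		ids = append(ids, id)
	}
	sort.Strings(ids) // map iteration order is random; sort for a stable reply
	return fmt.Sprintf("dropped: unknown step. Available: [%s]", strings.Join(ids, ", "))
}

func main() {
	registered := map[string]bool{"list_services": true, "deploy": true}
	fmt.Println(forwardReply(registered, "list_services"))
	fmt.Println(forwardReply(registered, "future-step"))
}
```

Listing the registered IDs in the reply is what lets the coordinator correct a typo or namespace mismatch within the same turn instead of guessing.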

Event types

The progress sink sees the routing as a sequence of events:

| Event | Fires on |
| --- | --- |
| EventStepStart | A step's agent began execution. The runner is now registered with the MessageRouter. |
| EventStepEnd | A step terminated successfully (status=completed). Its mailbox closes after this. Failed steps fire EventError; skipped steps fire EventStepSkipped. |
| EventMessageSent | A send_message or forward_to_agent call succeeded (queued). |
| EventAgentInboxDrain | A step agent drained one RouterMessage into its LLM context. |
| EventCoordinatorInboxMessage | The coordinator drained one RouterMessage from its mailbox. |
| EventCoordinatorNarration | The coordinator called narrate. |
| EventCoordinatorMessage | The coordinator pushed a targeted message via forward_to_agent. |
| EventCoordinatorSynthesis | The coordinator called finalize. |
| EventMessageDropped | A send was rejected. Data["reason"] is one of the DropReason strings. |

EventMessageSent is the outbound side; EventAgentInboxDrain and EventCoordinatorInboxMessage are the inbound side. They are emitted independently because the gap between send and drain can be large (e.g. a step is busy with an LLM turn when a forward arrives - the forward sits in the mailbox until the next turn).
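
Because send and drain are separate events, a progress sink can track which messages are still sitting in a mailbox. The following is a sketch under a loud assumption: that each event exposes the message's ID so the two sides can be correlated. The Event struct and the pending function are hypothetical, and the event names are used here as plain strings rather than zenflow's real constants.

```go
package main

import "fmt"

// Event is a hypothetical, reduced shape of a progress event.
// Assumption: send and drain events both carry the RouterMessage's ID.
type Event struct {
	Type      string
	MessageID string
}

// pending returns, in send order, the IDs of messages sent but not yet drained.
func pending(events []Event) []string {
	inFlight := map[string]bool{}
	var order []string
	for _, e := range events {
		switch e.Type {
		case "EventMessageSent":
			inFlight[e.MessageID] = true
			order = append(order, e.MessageID)
		case "EventAgentInboxDrain", "EventCoordinatorInboxMessage":
			delete(inFlight, e.MessageID)
		}
	}
	var out []string
	for _, id := range order {
		if inFlight[id] {
			out = append(out, id)
			delete(inFlight, id) // emit each ID at most once
		}
	}
	return out
}

func main() {
	events := []Event{
		{Type: "EventMessageSent", MessageID: "m1"},
		{Type: "EventMessageSent", MessageID: "m2"},
		{Type: "EventCoordinatorInboxMessage", MessageID: "m1"},
	}
	fmt.Println(pending(events)) // [m2]
}
```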

RouterMessage shape

```go
type RouterMessage struct {
    MessageID string
    From      string
    To        string
    Type      RouterMessageType
    Content   string
    // ... timestamps and other metadata
}
```

From is the sender's step ID (or "coordinator"). To is the recipient, identified the same way. Type is one of:

  • RouterMessageInfo - general informational message (the default). Used by the coordinator's forward_to_agent to deliver informational text to a target step's mailbox.
  • RouterMessageCancel - requests the receiving agent to stop.
  • RouterMessageContextUpdate - injects new context into the agent's conversation. Used by the coordinator's forward_to_agent to push context updates to a running step.
  • RouterMessageResumeReply - the reverse-routed reply produced after a resumed step finishes. Tagged distinctly so observers can distinguish resume responses from regular coordinator pushes; the drain logic treats it the same as RouterMessageInfo (appended as a user turn).

Drop reasons

Every drop emits exactly one EventMessageDropped. The DropReason enum names every possible reason; see Failure handling for the full table.

The two most common drops in messaging:

  • unknown-step - the target step ID was never registered. Usually a coordinator addressing mistake (typo, namespace mismatch, future step).
  • target-terminal - the target step's mailbox is closed because the step finished. Forwarded messages to a step that already terminated drop here. The resume mechanism can rescue some of these - see Resume.

Inbox draining

Step agents drain their inbox at the start of every LLM turn. The drain prepends the queued messages to the conversation as user-role content with a header naming the sender. The agent's next response can refer to the messages as if they were normal user input. The orchestrator caps each mailbox at DefaultMaxMailboxSize (10000) messages by default; pass WithMaxMailboxSize(0) to opt out of bounding.
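
A bounded mailbox with a header-prefixed drain can be sketched as follows. This is illustrative only: the Mailbox type, the Push return value, and the "[message from ...]" header format are assumptions, not zenflow's internals. The one behaviour taken from this page is that a size of 0 disables the bound.

```go
package main

import (
	"fmt"
	"strings"
)

// Message is a simplified stand-in for RouterMessage.
type Message struct {
	From, Content string
}

// Mailbox is a toy bounded queue. max == 0 means unbounded,
// mirroring the documented meaning of WithMaxMailboxSize(0).
type Mailbox struct {
	max  int
	msgs []Message
}

// Push reports false when the mailbox is full. How zenflow itself
// reports overflow is not covered by this sketch.
func (m *Mailbox) Push(msg Message) bool {
	if m.max > 0 && len(m.msgs) >= m.max {
		return false
	}
	m.msgs = append(m.msgs, msg)
	return true
}

// DrainAsUserContent empties the mailbox and renders each message as
// user-role text under a header naming the sender (header format invented).
func (m *Mailbox) DrainAsUserContent() string {
	var b strings.Builder
	for _, msg := range m.msgs {
		fmt.Fprintf(&b, "[message from %s]\n%s\n", msg.From, msg.Content)
	}
	m.msgs = nil
	return b.String()
}

func main() {
	mb := &Mailbox{max: 2}
	mb.Push(Message{From: "coordinator", Content: "use the staging cluster"})
	mb.Push(Message{From: "coordinator", Content: "skip service X"})
	fmt.Println(mb.Push(Message{From: "coordinator", Content: "overflow"})) // false
	fmt.Print(mb.DrainAsUserContent())
}
```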

The coordinator drains its inbox under wake-driven control: every push triggers a wake signal, the runner loop drains all pending messages and asks the LLM for a response. See Coordinator for the wake cycle details.
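
One common way to build a wake-driven drain in Go is a capacity-1 channel, so that several pushes between turns coalesce into a single wake. The sketch below uses that technique as an illustration; whether zenflow implements its wake signal this way is not stated here, and the Coordinator type with its Push and Step methods is hypothetical. It is single-goroutine for clarity; a concurrent version would need a mutex around the inbox.

```go
package main

import "fmt"

// Coordinator is a toy model of the wake-driven inbox.
type Coordinator struct {
	inbox []string
	wake  chan struct{} // capacity 1: repeated pushes coalesce into one pending wake
}

func NewCoordinator() *Coordinator {
	return &Coordinator{wake: make(chan struct{}, 1)}
}

// Push enqueues a message and signals a wake without ever blocking.
func (c *Coordinator) Push(msg string) {
	c.inbox = append(c.inbox, msg)
	select {
	case c.wake <- struct{}{}:
	default: // a wake is already pending; nothing more to do
	}
}

// Step runs one iteration of the runner loop: if a wake is pending,
// it drains every queued message as one batch; otherwise it returns nil.
func (c *Coordinator) Step() []string {
	select {
	case <-c.wake:
		batch := c.inbox
		c.inbox = nil
		return batch
	default:
		return nil
	}
}

func main() {
	c := NewCoordinator()
	c.Push("from stepA")
	c.Push("from stepB")
	fmt.Println(len(c.Step())) // 2: both messages arrive in one batch
	fmt.Println(c.Step() == nil) // true: no wake pending
}
```

Draining everything per wake means the LLM sees all pending messages in one turn rather than paying for one turn per message.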

Reverse-routing for resumed steps

When a step terminates and a later message arrives addressed to it, the MessageRouter can ask the executor to resume the step (via the resume mechanism). The resumed step's response routes back to the original sender via the coordinator's inbox, where it surfaces as an EventCoordinatorInboxMessage event. The coordinator can surface the reply via narrate, forward it elsewhere, or ignore it.

The mechanism is described as API surface only - see Resume.

What messaging is not for

  • Bulk data passing. Use step outputs (content, result) and dependsOn. The output injection path is automatic and respects truncation caps. Sending a 50-page document via forward_to_agent works but is wasteful.
  • Inter-process communication. Messaging is in-process only. The default InMemoryMailboxStore does not survive a process restart. For multi-process flows, plug in a persistent MailboxStore via WithMailboxStore, but you still need an external coordinator to bridge processes.
  • Authentication or capabilities. Anyone running in the workflow can send to the coordinator and the coordinator can forward to any registered step. Permission gates live at the tool level (WithPermissions), not the messaging layer.

Released under the Apache 2.0 License.