Skip to main content
Forge AI Engineering Zylver Engineering (internal)

Coordinating 327 specialized agents in one production environment

How Zylver Forge orchestrates a hierarchical multi-agent system in our daily development environment, with full observability, cost caps, and policy hooks per agent role.

327+

Specialized agents in production

100%

Operations instrumented

11

Distinct agent topologies live

<200ms

p95 router decision latency

Problem

A single agent cannot do real work. Real work requires coordination: a planner that decomposes the request, a router that picks the right specialist, several specialists that produce candidate output, a reviewer that catches errors, and a distiller that compresses successful patterns into reusable functions.

Naive coordination breaks down past five or six agents. Latency compounds, error budgets shrink, observability gets impossible, and cost spirals because every layer of the hierarchy multiplies the call count.

The Zylver development environment needed all of those agents, plus more, working together every day. We needed coordination that scaled.

Approach

Zylver Forge enforces three invariants on every multi-agent topology it runs:

  1. Typed contracts between agents. Each agent declares input and output schemas. The router refuses to dispatch if the contract does not match. This caught dozens of subtle integration bugs that would have surfaced as nondeterministic failures in production.
  2. Cost caps per role. Every agent role has a per-request and per-day cost cap. The platform refuses calls that would breach the cap and routes to cheaper alternatives or fails fast with a structured error.
  3. Policy hooks at every boundary. Authorization, redaction, retention, and audit policies fire automatically at agent boundaries. The agent author does not write the policy code; they declare which policies apply.

With those invariants in place, the topology can grow without the coordination cost growing linearly with it. New agents drop into existing flows by declaring their contract.

Result

The current Zylver development environment runs 327+ specialized agents across 11 distinct topologies (hierarchical for planning, mesh for parallel research, pipeline for code review, star for triage). Every operation is observable in real time. Router p95 latency stays under 200ms even as the agent population grows.

The platform refused to ship any agent we wrote that did not pass the contract gate. That refusal was not friction; it was the system protecting itself.

What we would do differently

The first version of the contract gate was a soft warning. Developers ignored it. We hard-failed the next version and the integration bugs disappeared in two weeks. The lesson generalized: in multi-agent systems, the invariants that are not enforced are not invariants.

Want this for your team?

Forge is in staged early access. Cohort members get paired implementation and pricing locked for the first billing year.