10 min read

Most multi-agent systems are sequential pipelines wearing a costume

Most 'multi-agent' systems are sequential pipelines with role-play prompts. Three diagnostic questions to tell the difference.

By Ramiro Enriquez

Five named agent boxes connected by sequential arrows, with one box collapsed and the entire chain failing downstream

A product team at a mid-size financial services firm evaluated two AI frameworks at the end of last year. They picked the one whose demo showed five agents collaborating on a research task: one fetched data, one cleaned it, one analyzed it, one cross-referenced sources, one wrote the summary. The framework called each step an agent. The internal pitch deck called the system “multi-agent.” Budget was approved on that basis.

They shipped in February. The common path worked. By April, three patterns of failure had become unignorable.

Any failure in the data-fetching step produced a 100% failure rate for the workflow, because no downstream step had a fallback. A new input type that required a different analytical approach produced confident, wrong summaries, because nothing in the system could recognize the mismatch and re-plan. Two of the five stages had no dependency on each other’s outputs but ran sequentially anyway, because the architecture had no concept of concurrent execution.

The signal that should have been visible from week one: any failure in step two produced 100% failure downstream. A real multi-agent system has at least one of (a) parallel paths, (b) re-planning, (c) graceful degradation. That system had none. It was a chain.

This is the default state of “multi-agent” in production right now. The vocabulary has gotten ahead of the architecture.

Why “multi-agent” stopped meaning anything

The term “multi-agent” used to describe a class of system with specific properties: independent agents with their own state and goals, communicating through a shared coordination layer, capable of dynamic role assignment under uncertainty. In 2026, the term routinely describes a sequential function-calling pipeline where each function has been given a persona prompt and a memorable name.

This post is the adversarial follow-up to our architecture patterns primer, which lays out four legitimate patterns: hierarchical orchestration, mesh coordination, pipeline, and star. Pipeline is one of the four. It has a real and useful place.

The problem is that the majority of systems labeled “multi-agent” in production are pattern three, the pipeline, wearing a different label. The mismatch between description and reality produces an ugly second-order failure: when the team eventually needs the capabilities they implied they had, refactoring is harder than building from scratch.

The fix starts with diagnosis.

Three diagnostic questions

Apply these to the last “multi-agent” system your team built or evaluated. The answers resolve the label question.

Question 1: Do any two agents ever run at the same time?

Pull the execution trace for a typical request. At any point in the timeline, are two or more agents producing output concurrently? If yes, you have at least the structural prerequisite for a multi-agent system. If every step waits for the previous one to complete before it begins, you have a pipeline.

The number of steps does not change this. A ten-step sequential pipeline is not more agentic than a three-step one. Calling each step an “agent,” giving it a persona, or putting it in a framework that uses agent vocabulary in its API does not change the execution model. The trace tells the truth.

Concurrent execution is necessary but not sufficient. A pipeline that parallelizes two of its stages across multiple inputs is still a pipeline. Real concurrency in a multi-agent system means agents are operating on independent sub-problems with their own intermediate state, not the same stage running in parallel across a batch.
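The trace check above can be made mechanical. Here is a minimal sketch, assuming a trace is available as (agent, start, end) spans; the span format and agent names are illustrative, not any particular framework's schema:

```python
# Hypothetical sketch: given an execution trace as (agent, start, end) spans,
# decide whether any two *different* agents ever overlap in time.
from itertools import combinations

def has_concurrent_agents(spans):
    """Return True if any two spans from different agents overlap."""
    for (a1, s1, e1), (a2, s2, e2) in combinations(spans, 2):
        if a1 != a2 and s1 < e2 and s2 < e1:
            return True
    return False

# A strictly sequential trace: every step waits for the previous one.
pipeline_trace = [("fetch", 0, 5), ("clean", 5, 9), ("analyze", 9, 20)]

# A trace where two agents produce output at the same time.
concurrent_trace = [("planner", 0, 2), ("analyst", 2, 10), ("checker", 4, 8)]

print(has_concurrent_agents(pipeline_trace))    # False -> pipeline
print(has_concurrent_agents(concurrent_trace))  # True  -> prerequisite met
```

If this returns False for every production trace you pull, question 1 is answered.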

Question 2: Is the agent graph fixed at deploy time?

In a real multi-agent system, the set of agents invoked for a given task is determined at runtime, based on the task’s characteristics and intermediate results. If you can draw the full agent graph before you run a single request, and every input flows through the same stages in the same order, you have a workflow.

This matters because it determines adaptability. A hardcoded graph cannot reroute around a failed node. It cannot invoke a specialist agent that was not anticipated at design time. It cannot decompose a novel input type differently from the template the system was built for.

Dynamic role assignment requires a capability registry and a planner. The capability registry tells the system what each agent can do. The planner decides which agents to invoke and in what order. Most systems labeled multi-agent have neither. What looks like dynamic routing turns out, on inspection, to be a hardcoded conditional with agent vocabulary on top.
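To make the registry-plus-planner split concrete, here is a deliberately minimal sketch. The agent names, task-step vocabulary, and matching rule are all invented for illustration; a production registry would also carry confidence scores and load state:

```python
# Illustrative capability registry: what each agent can do.
REGISTRY = {
    "fetcher":         {"handles": {"fetch"}},
    "cleaner":         {"handles": {"clean"}},
    "tabular_analyst": {"handles": {"analyze_tabular"}},
    "text_analyst":    {"handles": {"analyze_text"}},
    "writer":          {"handles": {"summarize"}},
}

def plan(task_steps):
    """Planner: choose agents at runtime based on this task's steps."""
    chosen = []
    for step in task_steps:
        candidates = [name for name, caps in REGISTRY.items()
                      if step in caps["handles"]]
        if not candidates:
            raise LookupError(f"no agent registered for step {step!r}")
        chosen.append(candidates[0])
    return chosen

# Two inputs with different characteristics invoke different agent sets:
print(plan(["fetch", "clean", "analyze_tabular", "summarize"]))
print(plan(["fetch", "analyze_text", "summarize"]))
```

The contrast with a hardcoded graph is the point: here the agent set is a function of the task, not a constant fixed at deploy time.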

Question 3: Do the agents have independent state and the ability to refuse?

An agent is not a function that accepts a prompt and returns a string. An agent has state that persists across a task. It has a goal that is its own. It has the ability to escalate, decline a request outside its scope, or ask for clarification before acting.

If your “agents” are stateless prompt templates that always accept the input and always produce an output, they are functions. They may be excellent functions. But they are not agents in any architecturally meaningful sense.
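The difference can be shown in a few lines. A sketch, with the scope check and return shape invented for illustration: an agent that keeps state across a task and has an explicit decline path, which a stateless prompt template does not:

```python
from dataclasses import dataclass, field

@dataclass
class ScopedAgent:
    name: str
    scope: set
    memory: list = field(default_factory=list)  # state persists across calls

    def handle(self, request_kind, payload):
        if request_kind not in self.scope:
            # Explicit decline path: the agent refuses out-of-scope work
            # instead of always producing an output.
            return {"status": "declined",
                    "reason": f"{request_kind} is outside scope"}
        self.memory.append(payload)  # independent, accumulating state
        return {"status": "ok", "seen_so_far": len(self.memory)}

analyst = ScopedAgent("analyst", scope={"analyze"})
print(analyst.handle("analyze", "q1 figures"))    # accepted, state grows
print(analyst.handle("summarize", "q1 figures"))  # declined, not answered badly
```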

A system that fails all three questions is a sequential pipeline with role-play prompts.

The three questions compound. Most production systems pass one and fail two. A small number pass all three. Those are the real multi-agent systems.

What costumed pipelines fail at

Pipelines fail in predictable ways under conditions they were not tuned for. Four failure modes matter.

Branch reactivity. A pipeline cannot route around a failed step. If stage three in a five-stage pipeline produces unusable output, the system has two options: fail the whole request, or pass bad data to stage four. There is no third path. A genuine multi-agent system can recognize the failure, invoke a different specialist, and continue with a different decomposition.
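The "third path" looks roughly like this. A hedged sketch, with the stage functions and fallback table entirely hypothetical: on failure, the system invokes a different specialist instead of failing the request or passing bad data downstream:

```python
# Hypothetical stages: a strict specialist that fails on unfamiliar input,
# and a lenient one registered as its fallback.
def strict_parser(doc):
    raise ValueError("unsupported layout")

def lenient_parser(doc):
    return {"rows": doc.split(";")}

FALLBACKS = {strict_parser: [lenient_parser]}

def run_stage(stage, doc):
    for candidate in [stage, *FALLBACKS.get(stage, [])]:
        try:
            return candidate(doc)
        except ValueError:
            continue  # a pipeline stops here; we try another specialist
    raise RuntimeError("no specialist could handle this input")

print(run_stage(strict_parser, "a;b;c"))  # lenient fallback recovers
```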

Long-tail input handling. Pipelines are tuned for the input distribution they were built against. The common path works. The long tail of inputs that need a different decomposition fails silently: every stage does its best with what it receives, even when the upstream output was already corrupted, so the system produces confidently wrong answers. In our deployments, this tail is typically 8-12% of traffic. The exact number is less interesting than the fact that the pipeline cannot tell you which requests fell into it.

Cost asymmetry. In a sequential pipeline, every input pays for every stage, and cheap stages cannot be overlapped with expensive ones to reduce wall-clock latency. A genuine multi-agent system can route easy cases to cheaper specialists. The architecture-patterns primer notes that 60-70% of agent tasks in typical systems can run on smaller models without quality loss. Pipelines capture almost none of that savings.
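The arithmetic is worth seeing once. A sketch with assumed numbers (the per-call prices and the 70% easy-case share are illustrative inputs, not measurements):

```python
# Assumed prices and traffic mix, for illustration only.
COST_LARGE, COST_SMALL = 0.010, 0.001   # $ per call
N_REQUESTS, EASY_SHARE = 10_000, 0.7

# Pipeline: every input pays the expensive model at every gated stage.
pipeline_cost = N_REQUESTS * COST_LARGE

# Routed: easy cases go to a cheaper specialist, hard cases to the big model.
routed_cost = (N_REQUESTS * EASY_SHARE * COST_SMALL
               + N_REQUESTS * (1 - EASY_SHARE) * COST_LARGE)

print(f"pipeline: ${pipeline_cost:.2f}")  # $100.00
print(f"routed:   ${routed_cost:.2f}")    # $37.00
```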

Serialized parallel work. Some tasks have independent sub-problems that should run concurrently. A costumed pipeline serializes them. A five-minute sequential workflow may have three minutes of work that could run in parallel. At scale, that latency compounds into user-visible degradation and avoidable infrastructure cost.

Why teams ship costumed pipelines anyway

This is not a story about bad engineering. Shipping a pipeline when the task calls for a pipeline is the right move. The failure is the labeling, not the architecture. Four pressures explain why teams reach for the wrong vocabulary.

Pipelines are deterministic and easy to test. Unit tests per stage, integration tests for the chain, behavioral reasoning without a probabilistic coordination layer. Multi-agent coordination introduces non-determinism that requires a different testing discipline. That discipline exists, but it is genuinely harder to build and harder to onboard a new engineer onto.

Frameworks make pipelines look like agents. Most popular agentic frameworks use agent vocabulary throughout their documentation and APIs. A developer who follows the getting-started guide builds a sequential workflow and reasonably concludes they built a multi-agent system, because the framework said so. The vocabulary problem starts before the team writes a line of code.

Demos and benchmarks reward the appearance of agency. A diagram with five named roles outperforms “here is our five-step sequential workflow” in every funding meeting and every conference talk. The incentive is to reach for the vocabulary, not the architecture. The vocabulary is downstream-cheap and upstream-expensive: cheap to adopt, expensive when reality forces a reckoning.

Multi-agent coordination is hard to debug without proper instrumentation. This is an honest engineering constraint. Without the four-layer telemetry stack per agent (token, quality, behavior, outcome), a real multi-agent system is harder to operate than a pipeline. Many teams correctly conclude they cannot operate one yet. Some draw the wrong conclusion from that and reach for the agent vocabulary anyway.

When pipeline is the right call (and call it that)

Three conditions make a pipeline the correct architecture: the task decomposition is stable and unlikely to change, the input distribution is narrow and well-characterized, and sequential execution latency is acceptable for the use case. In those conditions, a pipeline beats a multi-agent system on every axis that matters: simpler to build, simpler to test, simpler to operate, simpler to onboard, simpler to explain.

A well-documented pipeline is a professional deliverable. A mislabeled pipeline is a future refactor with a political problem attached.

The mistake is not building pipelines. The mistake is describing them as multi-agent systems. To internal stakeholders who will later fund a migration they did not know was necessary. To customers who will hold you to capabilities you do not have. To your own team, who will design downstream systems against an architecture description that is not accurate.

What a real multi-agent system actually requires

If diagnosis points the other way, four concrete capabilities separate the real architecture from the costumed version. These map directly to the implementation details in the primer.

A shared coordination substrate. Agents need to communicate intermediate state without being wired sequentially to each other. A message bus, a shared blackboard, or a structured event log. Without it, every agent-to-agent interaction is a direct sequential call, which is the definition of a pipeline.
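A blackboard is the simplest of the three substrates to sketch. This structure is illustrative, not a real library: agents post intermediate state to a shared store and read others' contributions without being wired to each other sequentially:

```python
from collections import defaultdict

class Blackboard:
    """Minimal shared substrate: topic -> list of (agent, payload) posts."""
    def __init__(self):
        self._entries = defaultdict(list)

    def post(self, topic, agent, payload):
        self._entries[topic].append((agent, payload))

    def read(self, topic):
        return list(self._entries[topic])

bb = Blackboard()
bb.post("findings", "fetcher", {"rows": 120})
bb.post("findings", "analyst", {"trend": "up"})
# The writer consults the board rather than being called after the analyst:
print(bb.read("findings"))
```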

A capability registry. For dynamic role assignment, agents must be able to discover what other agents can do. A hardcoded routing table is a start. A registry with confidence scores, load state, and query-time selection is the production version. The cost of skipping this is that your “dynamic” routing turns out to be a switch statement.

Convergence control. Depth limits. Request deduplication. Global timeouts. Escape valves. The mesh coordination section of the primer covers this in depth. Without all four, concurrent agents enter loops, duplicate work, or produce conflicting outputs that the system cannot reconcile.
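All four controls fit in one small loop. A sketch in which the coordination loop itself is a stand-in, not a real framework, but each guard corresponds to one of the four controls named above:

```python
import time

MAX_DEPTH, GLOBAL_TIMEOUT_S = 5, 2.0

def coordinate(request, handler, depth=0, seen=None, started=None):
    seen = seen if seen is not None else set()
    started = started if started is not None else time.monotonic()

    if depth >= MAX_DEPTH:                              # depth limit
        return "escalate_to_human"                      # escape valve
    if time.monotonic() - started > GLOBAL_TIMEOUT_S:   # global timeout
        return "escalate_to_human"
    if request in seen:                                 # deduplication
        return "duplicate_suppressed"
    seen.add(request)

    follow_up = handler(request)                        # agent does work
    if follow_up is None:
        return "done"
    return coordinate(follow_up, handler, depth + 1, seen, started)

# An agent that keeps re-asking the same question gets cut off:
print(coordinate("q", lambda r: "q"))    # duplicate_suppressed
print(coordinate("q0", lambda r: None))  # done
```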

Independent observability per agent. The four-layer stack applies at the individual agent level, not just at the workflow level. Token and cost per agent. Quality scores per agent. Behavior signals per agent. Outcome attribution that connects user-visible signals back to the specific agents responsible. Without this, the system is a black box that you cannot debug, and you will eventually fall back to running it as a pipeline by hand because the pipeline behavior is the only one you can reason about.
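Per-agent attribution can start as simply as this. The field names and the four-bucket shape are invented for illustration; the point is that every metric attaches to an agent, not only to the workflow:

```python
from collections import defaultdict

# One record per agent, one bucket per telemetry layer.
telemetry = defaultdict(
    lambda: {"tokens": 0, "quality": [], "behavior": [], "outcomes": []})

def record(agent, *, tokens=0, quality=None, behavior=None, outcome=None):
    t = telemetry[agent]
    t["tokens"] += tokens                # token/cost layer
    if quality is not None:
        t["quality"].append(quality)     # quality-score layer
    if behavior is not None:
        t["behavior"].append(behavior)   # behavior signals, e.g. "retried"
    if outcome is not None:
        t["outcomes"].append(outcome)    # user-visible outcome attribution

record("analyst", tokens=850, quality=0.92)
record("analyst", tokens=300, behavior="retried")
record("writer", tokens=400, outcome="user_accepted")

print(telemetry["analyst"]["tokens"])  # 1150 -- cost attributable per agent
```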

This is the operating model behind Zylver Agents, where the constraint that any agent must be independently observable, independently swappable, and independently constrained shaped the entire architecture. The principles transfer regardless of how you build it.

Where to start tomorrow

If you are running something you have been calling multi-agent, work through this checklist this week.

  1. Pull one production trace. Find any two agents that ran at the same time. If you cannot, you have your answer to question 1.
  2. Try to draw your agent graph for a hypothetical novel input. If the graph is the same as for any other input, you have your answer to question 2.
  3. Check whether any agent in your system can refuse. Search the code for an explicit decline path. If every agent always returns an output, you have your answer to question 3.
  4. Pick a name that matches. If the answers are no, no, no, change the internal description from “multi-agent system” to “agentic pipeline” or “AI workflow.” Document it. Tell stakeholders.
  5. Decide whether the architecture should change. Sometimes the answer is yes, and the upgrade path is real. Sometimes the answer is no, and the pipeline is the right deliverable.

The takeaway

The trace does not lie. If your “multi-agent” system serializes everything, it is a pipeline. Calling it the right name is the first refactor.

Zylver ships AI products: Forge, Signal, Agents, Flows, and Meter. See the product suite.
