The hidden cost of context switching in AI workflows

The promise of multi-step AI workflows is that you can chain specialized capabilities together: one model extracts, another classifies, a third synthesizes. Each step does what it does best. The pipeline produces better results than any single step could.

This is true in the same way that a long relay race is faster than one runner completing the distance alone: it depends entirely on the quality of the handoffs. In relay racing, a bad handoff loses fractions of a second. In AI workflows, a bad handoff between steps loses information, introduces latency, and compounds error. In complex pipelines, the cumulative cost of the handoffs often exceeds the cost of the individual steps.

Most teams that build multi-step AI workflows measure the performance of each step individually. They do not measure the cost of the boundaries between steps. This is where the hidden cost lives.

What gets lost at boundaries

When one step in an AI workflow passes its output to the next step, something is always lost. The question is how much.

The first loss is representational compression. Step one produces a result that is richer than what gets passed to step two. A document retrieval step returns ranked passages with confidence scores, provenance information, and surrounding context. The synthesis step receives a concatenated string. The ranking information is gone. The provenance is gone. The surrounding context that did not make the selection threshold is gone. The synthesis step makes decisions based on a compressed representation of what step one actually knew.

The second loss is semantic fidelity. Language is not a lossless encoding format. When step one produces a result and it gets serialized into text to pass to step two, some meaning is lost in the serialization. This is especially acute when the output of one step is structured (a classification, a set of extracted entities, a score) and that structure has to be re-expressed in natural language for the next step to use. “The sentiment is negative with high confidence” loses information compared to passing {"sentiment": "negative", "confidence": 0.94}: the prose version no longer says how high the confidence is.
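A minimal sketch of that loss, assuming a hypothetical `to_prose` serializer that buckets the score into a phrase. The function names and the 0.8 and 0.5 thresholds are illustrative and not taken from any particular system.

```python
import json

def to_prose(result: dict) -> str:
    """Hypothetical serializer: collapses a numeric confidence into a phrase."""
    c = result["confidence"]
    level = "high" if c >= 0.8 else "moderate" if c >= 0.5 else "low"
    return f"The sentiment is {result['sentiment']} with {level} confidence"

def to_json(result: dict) -> str:
    """Structured handoff: every field is passed through unchanged."""
    return json.dumps(result)

a = {"sentiment": "negative", "confidence": 0.94}
b = {"sentiment": "negative", "confidence": 0.81}

# Two different results become indistinguishable once phrased as prose.
prose_collides = to_prose(a) == to_prose(b)

# The JSON round trip recovers the original result exactly.
json_round_trips = json.loads(to_json(a)) == a
```

The downstream step that receives the prose cannot tell a 0.94 from a 0.81, while the step that receives the JSON can.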

The third loss is error amplification. A mistake in step one becomes an assumption in step two. Step two does not know that step one made a mistake; it treats the output as ground truth. If step two also makes a mistake building on that incorrect foundation, step three is now compounding two errors it cannot see. In a five-step pipeline, a mistake in step one can propagate through every subsequent step without any of them having enough information to detect it.
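A rough way to get a feel for the compounding is sketched below. It assumes each step is correct independently with a fixed probability and that no downstream step recovers from an upstream mistake. Both are simplifying assumptions, and the 0.95 figure is illustrative rather than measured.

```python
def end_to_end_accuracy(step_accuracies: list[float]) -> float:
    """Probability that every step is correct, assuming independent errors
    and no downstream recovery from an upstream mistake."""
    result = 1.0
    for accuracy in step_accuracies:
        result *= accuracy
    return result

# Five steps that are each right 95% of the time.
five_step = end_to_end_accuracy([0.95] * 5)
print(f"{five_step:.3f}")  # 0.774
```

Under these assumptions, five steps that each look strong in isolation yield a pipeline that is wrong on more than one request in five.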

Where the latency hides

Multi-step AI workflows have a latency profile that is easy to underestimate. Each step adds its own latency, and the steps are often sequential because each step depends on the previous one’s output. A five-step pipeline where each step takes two seconds does not take two seconds. It takes at least ten seconds, plus the overhead of serialization, passing, and deserialization at each boundary.

In practice, the overhead at boundaries is not negligible. If each step passes its output through an API call (to another model, to a retrieval service, to a validation layer), the network roundtrip time at each boundary adds to the cumulative latency. A workflow with five steps and four boundaries can easily add two to four seconds of pure overhead that has nothing to do with the compute time of any individual step.
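The arithmetic can be sketched as follows, assuming strictly sequential steps and a uniform per-boundary overhead. The `pipeline_latency` helper and the 0.5 to 1.0 second per-boundary range are illustrative; the range is simply the two to four seconds above spread over four boundaries.

```python
def pipeline_latency(step_seconds: list[float], boundary_overhead_seconds: float) -> float:
    """Total latency of a strictly sequential pipeline.

    A pipeline with n steps has n - 1 boundaries between them.
    """
    boundaries = max(len(step_seconds) - 1, 0)
    return sum(step_seconds) + boundaries * boundary_overhead_seconds

steps = [2.0] * 5                    # five steps, two seconds each
low = pipeline_latency(steps, 0.5)   # 10 s of compute + 4 * 0.5 s of overhead
high = pipeline_latency(steps, 1.0)  # 10 s of compute + 4 * 1.0 s of overhead
print(low, high)  # 12.0 14.0
```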

This matters for user-facing applications, where latency is part of perceived quality. A twelve-second workflow feels worse to the user than an eight-second single-step process, even if the multi-step workflow produces better output. The quality gain has to be significant enough to justify the latency cost, and most teams do not measure whether it is.

The cost accounting problem

In a single-step AI system, the cost is straightforward: you pay for input tokens plus output tokens per call. In a multi-step workflow, the cost calculation is more complex because the output of one step becomes part of the input to the next step, and that output can be large.

Consider a workflow where step one extracts key information from a long document (producing 500 tokens of output), step two classifies and ranks the extractions (receiving 500 tokens as input, producing 200 tokens), and step three synthesizes a response (receiving 700 tokens as input, the extractions plus the rankings, and producing 300 tokens). The total token cost is the sum of every step’s input and output tokens, not just the original document and the final response. The intermediate outputs are real tokens that cost real money, even though they are invisible to the end user.
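The tally for this example can be sketched as below. The 4,000-token document length is an assumption, since the example does not give one, and the `StepUsage` record is a hypothetical structure for illustration. Input and output tokens are kept separate because they are commonly priced differently.

```python
from dataclasses import dataclass

@dataclass
class StepUsage:
    name: str
    input_tokens: int
    output_tokens: int

DOCUMENT_TOKENS = 4000  # assumption: the example does not state the document length

usage = [
    StepUsage("extract", DOCUMENT_TOKENS, 500),
    StepUsage("classify_and_rank", 500, 200),
    StepUsage("synthesize", 700, 300),
]

total_input = sum(step.input_tokens for step in usage)
total_output = sum(step.output_tokens for step in usage)

# Input tokens that exist only because of the handoffs (steps two and three).
# Any instructions added to those prompts would increase this further.
handoff_input = sum(step.input_tokens for step in usage[1:])

# Output tokens the end user never sees.
visible_output = usage[-1].output_tokens
intermediate_output = total_output - visible_output

print(total_input, total_output, handoff_input, intermediate_output)
# 5200 1000 1200 700
```

Logging a record like this per request, tagged with the workflow name, is one way to make the intermediate tokens attributable instead of leaving them buried in aggregate billing.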

Teams that build multi-step workflows without tracking intermediate token consumption routinely underestimate their per-request cost by a factor of two to three. The cost shows up in the aggregate billing, but it is not attributed to the workflow that generated it, so the team cannot see that the pipeline is operating at lower margin than a simpler approach would.

Three design decisions that reduce boundary cost

Keep intermediate representations structured. When passing information between steps, prefer structured formats over natural language summaries. A JSON object with typed fields preserves more information than a prose description and is less susceptible to misinterpretation by the next step. Where the next step requires natural language input, generate the natural language from the structured representation at the boundary, not before it. This preserves the richness of the intermediate representation until the last moment.
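One way this could look for a retrieval-to-synthesis handoff is sketched below. The `Passage` type, the `render_for_prompt` helper, and the sample passages are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Passage:
    text: str
    score: float  # retrieval confidence
    source: str   # provenance

def render_for_prompt(passages: list[Passage]) -> str:
    """Turn structured passages into prompt text at the boundary,
    keeping rank, score, and source visible to the next step."""
    ranked = sorted(passages, key=lambda p: p.score, reverse=True)
    lines = [
        f"[{rank}] (score={p.score:.2f}, source={p.source}) {p.text}"
        for rank, p in enumerate(ranked, start=1)
    ]
    return "\n".join(lines)

# Made-up sample data.
passages = [
    Passage("Refunds are processed within 5 days.", 0.71, "faq.md"),
    Passage("Refunds require a receipt.", 0.93, "policy.pdf"),
]
prompt_context = render_for_prompt(passages)
print(prompt_context)
```

The passages stay as typed objects for as long as possible, and the text rendering happens only when the synthesis prompt is assembled, so ranking and provenance survive into the prompt instead of being dropped by a plain concatenation.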

Make boundaries explicit and testable. Treat each boundary between workflow steps as an interface, not an implementation detail. Define what the upstream step should produce and what the downstream step expects. Test that the upstream output satisfies the downstream expectation on representative inputs. This is the multi-step workflow equivalent of type-checking: it catches mismatches at development time instead of discovering them as degraded outputs in production.
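A boundary contract can be as small as a validation function that both sides agree on. The sketch below assumes a hypothetical classifier step whose output has a `sentiment` label and a `confidence` score; the field names and allowed labels are illustrative.

```python
ALLOWED_SENTIMENTS = ("positive", "neutral", "negative")

def validate_classifier_output(payload: dict) -> list[str]:
    """Return a list of contract violations.

    An empty list means the payload is acceptable to the downstream step.
    """
    errors = []
    if payload.get("sentiment") not in ALLOWED_SENTIMENTS:
        errors.append("sentiment must be one of positive/neutral/negative")
    confidence = payload.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        errors.append("confidence must be a number")
    elif not 0.0 <= confidence <= 1.0:
        errors.append("confidence must be between 0 and 1")
    return errors

good = validate_classifier_output({"sentiment": "negative", "confidence": 0.94})
bad = validate_classifier_output({"sentiment": "angry", "confidence": "high"})
print(good)      # []
print(len(bad))  # 2
```

Running a check like this over representative upstream outputs in the test suite surfaces mismatches before they reach production, and the same function can guard the boundary at runtime.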

Measure end-to-end quality, not per-step quality. The goal of a multi-step workflow is end-to-end output quality. Optimizing each step individually does not guarantee that the overall output is good, because the inter-step losses are not captured in per-step metrics. Run evaluations on the full pipeline output and trace quality regressions back through the steps to identify where the information loss occurred. A step that performs well in isolation may be the source of a significant quality problem when it operates as part of a chain.
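Tracing a regression back through the steps might look like the sketch below. The three lambdas are toy stand-ins for model calls, and the substring check is a crude proxy for a real quality metric; both are assumptions made to keep the example runnable.

```python
from typing import Callable, Optional

Step = tuple[str, Callable[[str], str]]

def run_with_trace(steps: list[Step], text: str) -> list[tuple[str, str]]:
    """Run the steps in order and record each step's output."""
    trace = []
    for name, fn in steps:
        text = fn(text)
        trace.append((name, text))
    return trace

def first_step_missing(trace: list[tuple[str, str]], required: str) -> Optional[str]:
    """Name of the first step whose output no longer contains `required`."""
    for name, output in trace:
        if required not in output:
            return name
    return None

# Toy stand-ins for model calls (hypothetical, for illustration only).
steps = [
    ("extract", lambda t: t.strip()),
    ("summarize", lambda t: t.split(".")[0] + "."),  # keeps only the first sentence
    ("synthesize", lambda t: "Answer: " + t),
]

trace = run_with_trace(steps, " Order 1182 shipped late. Customer requests a refund. ")
culprit = first_step_missing(trace, "refund")
print(culprit)  # summarize
```

The final answer is what gets scored, but the trace shows that the refund request disappeared at the summarize step, even though a summary that keeps the first sentence might look acceptable when that step is evaluated alone.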

When the single-step approach is actually better

The case for multi-step AI workflows is genuine. Specialized steps can perform better than a single generalist step on complex tasks. Pipeline architectures are easier to debug and improve incrementally. But the case has limits.

When the task does not require specialization, the overhead of the pipeline exceeds its benefit. A simple question-answer use case that is routed through retrieval, reranking, and synthesis steps only to stay consistent with the rest of the architecture pays the latency and cost of three steps without benefiting from the separation. A single step would produce equivalent quality faster and cheaper.

When the inputs are short and the context can fit in a single model call, passing the full context directly is usually better than extracting and summarizing it across steps. The compression loss at the boundary costs more than the gain from the specialized processing.

The right question before adding a step to a workflow is not “could a specialized step do this better?” It is “does the improvement from specialization exceed the cost of the additional boundary?” That cost includes latency, token overhead, the compression loss at serialization, and the error amplification from treating the upstream output as ground truth. Most teams answer the first question and skip the second.

The teams that build effective multi-step AI workflows are the ones that treat the boundaries as the primary design problem, not an implementation detail. The steps are the parts you can see. The boundaries are where the system actually lives.
