Problem
By early 2026 the Zylver development environment was running 327+ specialized agents in production. Every developer commit triggered model calls for code review, classification, summarization, and routing. The bill grew faster than the headcount, and the cause was structural: the same prompts were being answered by frontier models thousands of times per day with effectively identical outputs.
Frontier inference is not free. At our run rate, repetitive calls were costing more than the actual novel work the team was shipping.
Approach
We applied the early-access build of Zylver Meter to our own infrastructure as a forcing function. Three steps:
- Instrument every call. Meter captured prompt, response, model, latency, and cost for every inference request, indexed by call site.
- Cluster the repeats. A clustering pass over the captured prompts identified 327 distinct call patterns with high similarity scores. Each pattern represented a frequently-repeated AI operation: classifying a commit message, summarizing a stack trace, routing a customer ticket.
- Distill the deterministic ones. For each pattern, Meter generated a deterministic function from the captured input/output examples. The function ran locally for a fraction of a cent, with the original model call as a fallback if confidence was low.
Result
After six weeks of staged rollout, 73 percent of the team’s AI operations were running through distilled functions instead of frontier models. Developer experience was unchanged: the same code paths, the same APIs, the same expected behavior. The bill simply went down.
The remaining 27 percent of operations (genuinely novel work, edge cases, low-confidence inputs) continued to hit frontier models. That is the work that should hit them. The savings came from eliminating waste, not capability.
What we would do differently
We waited too long to instrument. Three months earlier we already had the patterns, but no observability layer to surface them. Meter shipped specifically to remove that gap for customers, so they do not have to discover the problem the hard way.