Skip to main content
Meter AI Engineering Zylver Engineering (internal)

Cutting our own AI bill 73% with intelligent distillation

How Zylver Meter applied to our internal AI usage identified 327 high-frequency prompts and converted them into deterministic functions, eliminating 73% of repeated inference spend without changing developer ergonomics.

73%

Reduction in monthly AI spend

327

Prompts distilled to zero-cost

0

Developer workflow changes

6 weeks

From baseline to savings

Problem

By early 2026 the Zylver development environment was running 327+ specialized agents in production. Every developer commit triggered model calls for code review, classification, summarization, and routing. The bill grew faster than the headcount, and the cause was structural: the same prompts were being answered by frontier models thousands of times per day with effectively identical outputs.

Frontier inference is not free. At our run rate, repetitive calls were costing more than the actual novel work the team was shipping.

Approach

We applied the early-access build of Zylver Meter to our own infrastructure as a forcing function. Three steps:

  1. Instrument every call. Meter captured prompt, response, model, latency, and cost for every inference request, indexed by call site.
  2. Cluster the repeats. A clustering pass over the captured prompts identified 327 distinct call patterns with high similarity scores. Each pattern represented a frequently-repeated AI operation: classifying a commit message, summarizing a stack trace, routing a customer ticket.
  3. Distill the deterministic ones. For each pattern, Meter generated a deterministic function from the captured input/output examples. The function ran locally for a fraction of a cent, with the original model call as a fallback if confidence was low.

Result

After six weeks of staged rollout, 73 percent of the team’s AI operations were running through distilled functions instead of frontier models. Developer experience was unchanged: the same code paths, the same APIs, the same expected behavior. The bill simply went down.

The remaining 27 percent of operations (genuinely novel work, edge cases, low-confidence inputs) continued to hit frontier models. That is the work that should hit them. The savings came from eliminating waste, not capability.

What we would do differently

We waited too long to instrument. Three months earlier we already had the patterns, but no observability layer to surface them. Meter shipped specifically to remove that gap for customers, so they do not have to discover the problem the hard way.

Want this for your team?

Meter is in staged early access. Cohort members get paired implementation and pricing locked for the first billing year.