Per-tenant AI cost attribution: why aggregate dashboards are not enough
Aggregate AI spend tells you what you're paying. Per-tenant attribution tells you who's driving it, what you can charge for it, and which tenants are profitable. Building it is harder than it looks.
By Ramiro Enriquez
Most AI cost dashboards show you a number. It goes up when usage goes up and down when usage drops. If your product serves one customer, that is enough information. If your product serves fifty, or five hundred, the aggregate number tells you almost nothing useful.
The question that matters is not “how much did we spend on AI this month?” It is “which tenants drove that spend, is that proportional to what we charge them, and are we profitable at the per-customer level?” Aggregate dashboards cannot answer those questions. Per-tenant attribution can.
Why aggregate spend is the wrong unit
When an AI feature launches and costs spike, you need to know where the spike came from. Aggregate dashboards tell you there was a spike. They do not tell you whether it was one tenant who ran a 10,000-token query loop at 2 AM, a distributed increase across all tenants, or a bug in one customer’s integration. Each of those scenarios has a different response: direct outreach to the tenant, platform pricing recalibration, or an emergency fix.
Without per-tenant attribution, you are debugging a production system without stack traces. You can see the symptom. You cannot see the cause.
This gets more acute as the platform scales. The variance in AI consumption across tenants in a typical SaaS product is much higher than the variance in traditional usage metrics. A tenant who uses the AI summarization feature 50 times a day costs your platform orders of magnitude more than one who rarely touches it. If both are on the same pricing tier, the light users are structurally subsidizing the heavy ones. You cannot fix that without knowing who is who.
What attribution actually requires
Per-tenant AI cost attribution sounds like an instrumentation problem. In practice it is an architecture problem with an instrumentation layer on top.
The core requirement is that every AI call must carry a tenant identifier through to the billing ledger. This sounds obvious, but in most AI system architectures it is not the default. The typical pattern is that an application service calls an AI provider, the provider charges the API key, and the charge shows up in the provider’s dashboard aggregated by model and time window. The tenant context that existed in the application layer is lost by the time it reaches the billing layer.
Recovering it requires a callsite convention: every AI call must tag the request with a tenant identifier, and something downstream must read that tag and record it against the tenant’s ledger. The tag can be a custom header, a field in the request metadata, or a middleware convention. What matters is that it exists, it is consistent, and it is enforced at the framework level rather than left to individual developers to remember.
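One way to enforce the convention at the framework level is to route every AI call through a single wrapper that refuses to proceed without a tenant identifier. The sketch below is a minimal illustration, not a prescribed design: call_model, TaggedRequest, MissingTenantError, and the send callback (standing in for whatever provider client you use) are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable


class MissingTenantError(ValueError):
    """Raised when an AI call is attempted without a tenant identifier."""


@dataclass(frozen=True)
class TaggedRequest:
    tenant_id: str
    model: str
    prompt: str


def call_model(tenant_id: str, model: str, prompt: str,
               send: Callable[[TaggedRequest], str]) -> str:
    """Single entry point for AI calls; rejects untagged requests.

    `send` is a placeholder for the real provider call. It receives the
    tagged request so downstream code can record cost against the tenant.
    """
    if not tenant_id or not tenant_id.strip():
        raise MissingTenantError("AI call attempted without a tenant_id")
    request = TaggedRequest(tenant_id=tenant_id.strip(), model=model, prompt=prompt)
    return send(request)
```

Failing loudly on a missing tag is a deliberate choice in this sketch: an exception in development is cheaper than a silently growing unattributed bucket in production.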
The common failure mode is partial implementation: the tagging convention exists, but three of seventeen callsites do not follow it. Untagged calls go to an “unknown” bucket. As the platform grows, the unknown bucket grows with it, and the per-tenant numbers become untrustworthy. Teams that encounter this tend to conclude that attribution is too hard, when the actual problem is that the enforcement layer is missing.
The ledger design decision
Once you have tagged requests, you need to decide where and how to record the cost data. The two main approaches have different tradeoffs.
Provider-side cost forwarding uses provider APIs (where they exist) to pull cost data and match it to tenant identifiers. The advantage is that the cost data is authoritative: it comes from the same place as the invoice. The disadvantages are latency (some providers report costs with 24-48 hour delays), coverage (not every provider offers per-request cost APIs), and coupling between your cost model and the provider’s pricing structure. When the provider updates pricing, your cost model updates with it, which may or may not be what you want.
Application-side cost estimation records token counts at the callsite, applies a stored rate table, and writes the result to a ledger in real time. The advantage is that the data is available immediately, you control the rate model, and the system works across any number of providers without waiting for their API to support cost export. The disadvantage is that your rate table can fall out of sync with actual provider pricing, so the estimated costs diverge from invoice costs. This is manageable with a regular reconciliation step, but it adds operational overhead.
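Application-side estimation boils down to multiplying token counts by a stored rate. The following is a minimal sketch under stated assumptions: the model names and per-million-token rates in RATE_TABLE are invented for illustration and do not reflect any provider's actual pricing.

```python
from decimal import Decimal

# Hypothetical rates in USD per 1,000,000 tokens: (input rate, output rate).
# Replace with the rates from your own provider contracts.
RATE_TABLE = {
    "example-small": (Decimal("0.50"), Decimal("1.50")),
    "example-large": (Decimal("5.00"), Decimal("15.00")),
}
TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one call from its token counts.

    Raises KeyError for a model missing from the rate table, so an
    unpriced model is noticed instead of being recorded as free.
    """
    input_rate, output_rate = RATE_TABLE[model]
    return (input_rate * input_tokens + output_rate * output_tokens) / TOKENS_PER_UNIT
```

Decimal is used here instead of float so that many small per-call amounts sum without rounding drift, which matters once the ledger feeds invoicing.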
Most multi-tenant platforms that do this well use application-side estimation for real-time operational dashboards and provider-side actuals for end-of-period billing reconciliation. The real-time data drives decisions (alerting on a tenant who is running hot, surfacing usage in-product to the tenant, identifying optimization opportunities). The period-end reconciliation drives invoicing.
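The period-end reconciliation can be sketched as a comparison between the sum of estimated costs and the invoiced total. This example assumes the provider only reports an aggregate figure and that scaling each tenant's estimate proportionally is acceptable; both are assumptions, and reconcile and its 2% default tolerance are hypothetical.

```python
from decimal import Decimal


def reconcile(estimated_by_tenant: dict, invoiced_total: Decimal,
              tolerance: Decimal = Decimal("0.02")):
    """Compare estimated spend to the invoice and rescale the estimates.

    Returns (drift, needs_review, adjusted_by_tenant) where drift is the
    relative gap between invoice and estimate, needs_review is True when
    the gap exceeds the tolerance, and adjusted_by_tenant spreads the
    invoiced total across tenants in proportion to their estimates.
    """
    estimated_total = sum(estimated_by_tenant.values(), Decimal(0))
    if estimated_total == 0:
        raise ValueError("no estimated spend to reconcile")
    drift = (invoiced_total - estimated_total) / estimated_total
    scale = invoiced_total / estimated_total
    adjusted = {tenant: cost * scale for tenant, cost in estimated_by_tenant.items()}
    return drift, abs(drift) > tolerance, adjusted
```

A drift that keeps exceeding the tolerance is usually a sign that the rate table has fallen behind provider pricing, or that some calls never reached the ledger.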
What you can do with the data
Attribution is not a cost-center project. It is a revenue and product project with a cost-center input.
Pricing calibration. When you know the cost per tenant, you can calculate per-tenant margin. Most multi-tenant AI platforms discover significant dispersion: some tenants are highly profitable, others are subsidized by the rest. The profitable tenants are candidates for expansion offers. The subsidized ones need either a pricing conversation or a usage cap. You cannot have either conversation without attribution data.
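The margin calculation itself is simple arithmetic once cost per tenant is known. The sketch below assumes you have revenue and attributed AI cost per tenant for the same period; tenant_margins is a hypothetical helper, and it covers AI cost only, not other costs of serving the tenant.

```python
def tenant_margins(revenue: dict, ai_cost: dict) -> dict:
    """Compute per-tenant margin after AI cost for one billing period.

    Tenants with no recorded AI cost are treated as costing zero.
    margin_pct is None when revenue is zero, since the ratio is undefined.
    """
    margins = {}
    for tenant, tenant_revenue in revenue.items():
        cost = ai_cost.get(tenant, 0.0)
        margin = tenant_revenue - cost
        margins[tenant] = {
            "margin": margin,
            "margin_pct": margin / tenant_revenue if tenant_revenue else None,
        }
    return margins
```

A negative margin_pct marks a tenant the rest of the customer base is subsidizing, which is the starting point for the pricing or usage-cap conversation.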
In-product usage transparency. Tenants who see their AI usage often moderate it. Not because they have to, but because the data makes the consumption visible. Showing a tenant a real-time view of their token usage, broken down by feature, changes the conversation from “why is my bill high?” to “we used more last month because we ran the batch export three times.” Transparency shifts the narrative from a dispute to a usage discussion.
Optimization targeting. When you see that one tenant’s summarization pipeline costs 3x the platform average, you investigate. Sometimes the cause is legitimate heavy usage. Sometimes it is a prompt that is 40% longer than necessary, or a retrieval step that returns 200 documents when 10 would do. Without attribution, you look at average costs across all tenants and the outlier is invisible. With attribution, the outlier surfaces immediately.
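Surfacing that kind of outlier can be as simple as comparing each tenant's cost per call to the platform-wide figure. This is a rough sketch: cost_outliers is a hypothetical helper, and the default factor of 3 just mirrors the example above rather than being a recommended threshold.

```python
def cost_outliers(cost_by_tenant: dict, calls_by_tenant: dict,
                  factor: float = 3.0) -> dict:
    """Return tenants whose cost per call is at least `factor` times the
    platform average, mapped to their ratio against that average.

    The platform average is total cost divided by total calls, so a very
    large outlier pulls it upward; a median baseline is a reasonable
    alternative when that matters.
    """
    total_calls = sum(calls_by_tenant.values())
    if total_calls == 0:
        return {}
    platform_avg = sum(cost_by_tenant.values()) / total_calls
    if platform_avg == 0:
        return {}
    outliers = {}
    for tenant, cost in cost_by_tenant.items():
        calls = calls_by_tenant.get(tenant, 0)
        if calls == 0:
            continue  # no calls recorded, so cost per call is undefined
        ratio = (cost / calls) / platform_avg
        if ratio >= factor:
            outliers[tenant] = ratio
    return outliers
```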
Budget enforcement. Some tenant contracts include AI usage limits or soft caps. Attribution data makes those enforceable in real time rather than as an end-of-month reconciliation. When a tenant approaches their limit, the platform can alert them or throttle gracefully rather than discovering the overage on the invoice.
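A real-time check against a tenant's limit might look like the sketch below. The three-state outcome and the 80% warning threshold are illustrative assumptions, and check_budget and BudgetAction are hypothetical names; what "throttle" means in practice (queueing, a cheaper model, a hard stop) is a product decision this code does not make.

```python
from enum import Enum


class BudgetAction(Enum):
    ALLOW = "allow"
    WARN = "warn"
    THROTTLE = "throttle"


def check_budget(spent: float, limit: float, warn_at: float = 0.8) -> BudgetAction:
    """Decide what to do before serving a tenant's next AI call.

    `spent` is the tenant's attributed cost so far in the period and
    `limit` is the contractual cap, in the same currency.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    usage = spent / limit
    if usage >= 1.0:
        return BudgetAction.THROTTLE
    if usage >= warn_at:
        return BudgetAction.WARN
    return BudgetAction.ALLOW
```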
The implementation sequence that works
The mistake most teams make is trying to build the full attribution system before launching the AI feature. The full system takes time, and the cost of not launching is real. The minimal attribution system takes a day and captures enough data to make the rest of the investment worthwhile.
Start with consistent tagging. Add a tenant identifier to every AI callsite. If you have a middleware layer, enforce it there. If you do not, add the tag manually and write a linter rule that flags callsites without it. This step alone can shrink the unknown bucket from “most of your spend” to close to zero.
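A linter rule of this kind can be approximated with Python's standard ast module. The sketch assumes AI calls go through a function named call_model and that the tag is passed as a tenant_id keyword argument; both names are hypothetical, so substitute your own convention.

```python
import ast


def find_untagged_calls(source: str, func_name: str = "call_model") -> list:
    """Return line numbers of calls to `func_name` with no tenant_id keyword.

    Matches both bare calls (call_model(...)) and attribute calls
    (client.call_model(...)). Calls that pass the tag positionally or via
    **kwargs are also flagged, because the check only accepts an explicit
    tenant_id=... keyword.
    """
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        target = node.func
        if isinstance(target, ast.Name):
            name = target.id
        else:
            name = getattr(target, "attr", None)
        if name != func_name:
            continue
        if not any(kw.arg == "tenant_id" for kw in node.keywords):
            flagged.append(node.lineno)
    return sorted(flagged)
```

Run over each source file in CI, a non-empty result can fail the build, which is what turns the convention from a guideline into something enforced.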
Add a write to a cost ledger table, one row per AI call: tenant ID, model, input tokens, output tokens, timestamp, estimated cost. That is a single database write per call, which is operationally negligible at moderate volume. The schema is small and stable.
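As a rough illustration of how small that schema is, here is a sketch using SQLite from the Python standard library. The table name, column names, and record_call helper are assumptions; a production system would more likely use its existing database and might store cost as an integer number of micro-dollars rather than a float.

```python
import sqlite3
from datetime import datetime, timezone

# One row per AI call, with the fields listed above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS ai_cost_ledger (
    id                 INTEGER PRIMARY KEY,
    tenant_id          TEXT    NOT NULL,
    model              TEXT    NOT NULL,
    input_tokens       INTEGER NOT NULL,
    output_tokens      INTEGER NOT NULL,
    called_at          TEXT    NOT NULL,
    estimated_cost_usd REAL    NOT NULL
)
"""


def record_call(conn: sqlite3.Connection, tenant_id: str, model: str,
                input_tokens: int, output_tokens: int,
                estimated_cost_usd: float) -> None:
    """Append one ledger row, timestamped in UTC at write time."""
    conn.execute(
        "INSERT INTO ai_cost_ledger "
        "(tenant_id, model, input_tokens, output_tokens, called_at, estimated_cost_usd) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (tenant_id, model, input_tokens, output_tokens,
         datetime.now(timezone.utc).isoformat(), estimated_cost_usd),
    )
    conn.commit()
```

The NOT NULL constraint on tenant_id gives a second line of defense: an untagged call fails at write time instead of landing in the ledger without an owner.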
Build the read layer later. Dashboards, alerts, and in-product reporting can come after the data exists. They cannot come before. Teams that build dashboards before data collection have beautiful dashboards with nothing to show.
The retrospective analysis is what pays back the investment. Once the ledger exists, you can run queries that would have been impossible without it. Which tenants drove 80% of AI spend last quarter? Which features have the highest cost-per-use? Which model upgrades increased costs without proportionally improving the metric you care about? These questions have answers in the ledger. They are invisible in the aggregate.
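The first of those questions reduces to a cumulative sum over tenants sorted by spend. A minimal sketch, assuming per-tenant totals have already been aggregated from the ledger (top_spenders is a hypothetical helper):

```python
def top_spenders(cost_by_tenant: dict, share: float = 0.8) -> list:
    """Return the smallest set of top-spending tenants whose combined
    cost reaches `share` of total spend, largest spender first."""
    total = sum(cost_by_tenant.values())
    if total <= 0:
        return []
    selected = []
    running = 0.0
    ranked = sorted(cost_by_tenant.items(), key=lambda item: item[1], reverse=True)
    for tenant, cost in ranked:
        selected.append(tenant)
        running += cost
        if running >= share * total:
            break
    return selected
```

If the returned list is a handful of names out of hundreds of tenants, that is the dispersion the aggregate number was hiding.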
Attribution infrastructure is boring to build and powerful to have. The teams that skip it spend years making pricing decisions based on averages that hide the underlying distribution. The ones that build it early spend those years optimizing for actual margin at the customer level. The difference compounds.