The 47th agent is when finance shows up. Below 30 agents in production, the Anthropic invoice is one tolerable line item somewhere south of $25,000 a month, and nobody asks who is spending what. Past 30, the line item crosses $25k. By 47, the median fleet I see at ZopDev customers is at $80,000 a month, and finance walks into the next budget review with a single question: which agent.
Nobody can answer it. The cost dashboard buckets spend by AWS account, by team, by service, by tag. None of those buckets contain agents, because agents do not consume resources tagged with their identity. They consume tokens, charged to a single Anthropic or OpenAI account, with per-call detail invisible to the cloud cost system. Resource tags catch infrastructure. They miss the LLM bill entirely.
The fix is a cost ledger: a five-field row per LLM call that the agent runtime stamps on the way out, paired with the existing MCP tool-call audit log. With the ledger in place, any allocation question (per-agent, per-team, per-feature, per-customer) becomes a single SQL query. Without it, the agent fleet is one line item the finance director cannot disaggregate.
This post is the schema, the two-axis attribution model, and the closed-loop pattern the ledger composes with. It builds on agentic AI FinOps and the LLM FinOps per-feature token budget work without re-architecting either.
## The 47th agent and the panic finance review
Most agent fleets follow the same growth curve. The first ten agents are pilot projects sitting on individual engineers’ API keys; nobody tracks them centrally. Agents eleven through thirty get a shared budget and some loose central oversight. The sum is $5k to $25k a month and the CFO does not bother.
| Fleet size | Typical monthly Anthropic spend | Finance attention |
|---|---|---|
| 1-10 agents | $200 to $4,000 | None; engineering credit cards |
| 11-30 agents | $4,000 to $25,000 | Quarterly review, line-item only |
| 31-50 agents | $25,000 to $80,000 | Monthly review; allocation demanded |
| 51+ agents | $80,000+ | Per-agent budgets, hard caps, weekly review |
Crossing 30 agents is the inflection. The line item is now a top-15 cloud cost driver, finance asks for the breakdown, and engineering does not have one. The retrofit takes six weeks of archaeological work across Anthropic invoices, OpenTelemetry spans, and Slack threads. Teams that instrumented per-agent attribution before crossing 30 skip the panic month entirely.
## Why tags do not work for agents
Cloud cost allocation is built around the AWS Cost and Usage Report. Resources have tags. Tags get pulled into CUR. CUR feeds the dashboard. The dashboard buckets by tag. This works because cloud resources have a one-to-one mapping between identity and consumption.
Agents break that mapping in three ways.
Cardinality is wrong. A single AWS account hosting 47 agents has hundreds of cloud resources, each tagged once. The agents share infrastructure (the same Lambda, the same Postgres, the same MCP servers), so resource tags cannot identify which agent triggered a given query. The infrastructure tag says “shared-llm-runtime”; the cost system knows nothing else.
Source of truth is wrong. Token spend lives at Anthropic, not AWS. The Anthropic invoice arrives once a month, aggregated to the account level. CUR has no view of it. Even if every agent used a separate API key (a configuration nightmare that no real fleet runs), the keys would not flow into the cloud cost dashboard without a custom integration.
Aggregation key is wrong. Tags are static; agents are dynamic. A single agent today might be the fraud-classifier, tomorrow it might be re-purposed as the chargeback-reviewer. The agent_id is the right grain, and the tag schema cannot hold dynamic identities without sprawl.
The fix is not better tags. The fix is a parallel ledger that sits next to CUR, owned by the agent runtime, with its own schema and aggregation rules. The dashboard joins them at query time.
## The 5-field cost ledger schema
The minimal viable schema is five fields per LLM call:
| Field | Type | Source | Why |
|---|---|---|---|
| agent_id | string | agent runtime | The aggregation key; the whole point |
| request_id | string | `request-id` response header | Correlation with audit log + Anthropic dashboard |
| prompt_tokens | integer | response body `usage.input_tokens` | Cost driver; multiply by input price |
| completion_tokens | integer | response body `usage.output_tokens` | Cost driver; multiply by output price |
| model | string | response body `model` | Different models carry different prices |
The agent runtime writes one row per LLM call. The schema is small enough to live in Postgres without partitioning until well past 100 million rows. With these five fields, every allocation becomes a SQL query, not a custom report.
Cost is computed at query time, not at insert time. Anthropic prices change; the ledger stores tokens, the query joins to the price table. This keeps the row small and lets you backfill cost recalculation on price changes without rewriting history.
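A minimal sketch of the schema and the query-time price join, using SQLite as a stand-in for Postgres. The table names, the `claude-sonnet` model label, and the per-token rates are illustrative placeholders, not Anthropic's actual pricing:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- The 5-field ledger: one row per LLM call, stamped by the agent runtime.
CREATE TABLE ledger (
    agent_id          TEXT NOT NULL,
    request_id        TEXT NOT NULL,
    prompt_tokens     INTEGER NOT NULL,
    completion_tokens INTEGER NOT NULL,
    model             TEXT NOT NULL
);
CREATE INDEX idx_ledger_agent ON ledger (agent_id);

-- Prices live in their own table, so a price change is one UPDATE
-- and historical rows never need rewriting (rates are placeholders).
CREATE TABLE price (
    model        TEXT PRIMARY KEY,
    input_price  REAL NOT NULL,
    output_price REAL NOT NULL
);
""")
db.execute("INSERT INTO price VALUES ('claude-sonnet', 3e-6, 15e-6)")

# The runtime writes one row on the way out of each call.
db.execute("INSERT INTO ledger VALUES (?, ?, ?, ?, ?)",
           ("fraud-classifier", "req_abc123", 1200, 300, "claude-sonnet"))

# Cost is derived at query time by joining tokens to current prices.
row = db.execute("""
    SELECT l.agent_id,
           SUM(l.prompt_tokens * p.input_price
             + l.completion_tokens * p.output_price) AS usd
    FROM ledger l JOIN price p ON l.model = p.model
    GROUP BY l.agent_id
""").fetchone()
print(row)  # agent_id and its ledgered USD spend
```

Storing tokens rather than dollars is what makes a price change a one-row `UPDATE` to the price table instead of a backfill across millions of ledger rows.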
## Two axes: LLM cost vs MCP side-effect cost
An agent invocation is not just an LLM call. It is one LLM completion plus N MCP tool calls. The LLM cost is in tokens. The MCP side-effect cost is wherever the tool touched: an S3 GET against the cloud bill, an RDS query against the database bill, a Bedrock embedding call against another LLM provider’s bill.
The ledger needs both axes.
| Axis | Source | Aggregation question |
|---|---|---|
| LLM token cost | Anthropic API response | "How much did agent X spend on Claude this month?" |
| MCP tool-call side-effect | MCP server audit log + AWS CUR (joined by request_id) | “How much did agent X spend on cloud services it called via MCP?” |
The two axes do not share a row. The LLM call writes to the cost ledger; the MCP tool call writes to the existing audit log; the join key is the request_id that the agent runtime stamps on every outgoing call. A single agent invocation produces one ledger row plus 1-8 audit log rows.
The total per-invocation cost is the sum of the LLM row plus the matched audit-log rows joined to CUR. This is the query that turns “agent X cost $1.20 in tokens” into “agent X cost $1.85 all-in” once you account for the cloud resources its MCP tools touched.
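The $1.20-in-tokens versus $1.85-all-in arithmetic can be sketched as the join below. The `audit_log` table shape and its pre-resolved `side_effect_usd` column are assumptions; in practice that column comes from matching audit rows to CUR line items, and the per-token rates are placeholders:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE ledger (
    agent_id TEXT, request_id TEXT,
    prompt_tokens INTEGER, completion_tokens INTEGER, model TEXT
);
CREATE TABLE audit_log (
    request_id TEXT, tool TEXT, side_effect_usd REAL
);
""")
# One invocation: one ledger row plus three MCP tool-call audit rows.
db.execute("INSERT INTO ledger VALUES "
           "('fraud-classifier', 'req_1', 200000, 40000, 'claude-sonnet')")
db.executemany("INSERT INTO audit_log VALUES (?, ?, ?)", [
    ("req_1", "s3_get",        0.15),
    ("req_1", "rds_query",     0.30),
    ("req_1", "bedrock_embed", 0.20),
])

# Join on request_id: token cost on one axis, side-effect cost on the other.
row = db.execute("""
    SELECT l.request_id,
           l.prompt_tokens * 3e-6 + l.completion_tokens * 15e-6 AS llm_usd,
           COALESCE(SUM(a.side_effect_usd), 0)                  AS mcp_usd
    FROM ledger l LEFT JOIN audit_log a ON a.request_id = l.request_id
    GROUP BY l.request_id
""").fetchone()
print(row)  # llm_usd is the token axis, mcp_usd the side-effect axis
```

The `LEFT JOIN` matters: an invocation whose tools made no billable side effects still shows up with its token cost intact.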
## Aggregations the ledger unlocks
With the ledger plus the audit-log join, every allocation is one query. A few common ones, in plain SQL:
Per-agent: `SELECT agent_id, SUM(prompt_tokens * input_price + completion_tokens * output_price) FROM ledger JOIN price ON ledger.model = price.model GROUP BY agent_id`. One join, sub-second on a 50-million-row ledger with an index on agent_id.
Per-feature: same query, but agents are pre-classified into features in a small lookup table (agent_id → feature). Finance reviews by feature now instead of by agent name.
Per-customer is harder. Customers are session-scoped, so the agent runtime has to stamp customer_id alongside agent_id on every call. This doubles the cardinality of the ledger and roughly triples storage cost, but it is the only path to per-customer LLM allocation that does not lose data on cross-customer agents (recommender systems, support bots).
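A sketch of the per-customer variant, with `customer_id` stamped as a sixth field; the column name, customer IDs, and agent names are illustrative:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# The 5-field ledger extended with customer_id, stamped per call
# by the agent runtime from the session context.
db.execute("""CREATE TABLE ledger (
    agent_id TEXT, customer_id TEXT, request_id TEXT,
    prompt_tokens INTEGER, completion_tokens INTEGER, model TEXT)""")
db.executemany("INSERT INTO ledger VALUES (?, ?, ?, ?, ?, ?)", [
    ("support-bot", "cust_a", "req_1",  900, 150, "claude-sonnet"),
    ("support-bot", "cust_b", "req_2", 1100, 200, "claude-sonnet"),
    ("recommender", "cust_a", "req_3", 5000, 400, "claude-sonnet"),
])

# A cross-customer agent (the recommender) now splits cleanly by customer
# instead of disappearing into a shared bucket.
rows = db.execute("""
    SELECT customer_id, SUM(prompt_tokens + completion_tokens) AS tokens
    FROM ledger GROUP BY customer_id ORDER BY customer_id
""").fetchall()
print(rows)  # [('cust_a', 6450), ('cust_b', 1300)]
```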
## How the ledger composes with closed-loop FinOps
The whole pattern is another instance of closed-loop FinOps detect-decide-act-verify, at the agent timescale instead of the cloud-resource timescale.
Detect: agent X exceeded its monthly budget at day 14. The ledger query runs on a 5-minute schedule; the threshold is crossed; the alert fires. Decide: the policy-aware governance MCP looks up the agent’s owning team, its budget tier, and any active exceptions. The policy says throttle, not page. Act: the agent runtime degrades agent X to the smaller model (Haiku instead of Sonnet) and notifies the owner team in Slack. Verify: the next 5-minute ledger sample shows the spend curve flattening; the loop closes.
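The detect and act steps above can be sketched as a small policy function. Everything here is an assumption for illustration: the budget table, the model-downgrade mapping, the model names, and the `notify` callback standing in for the Slack integration:

```python
# Hypothetical per-agent budgets and the Sonnet-to-Haiku downgrade path.
MONTHLY_BUDGET_USD = {"fraud-classifier": 500.0}
DOWNGRADE = {"claude-sonnet": "claude-haiku"}

def check_and_throttle(spend_by_agent, model_by_agent, notify):
    """Detect over-budget agents; act by degrading them to the cheaper model.

    spend_by_agent would come from the per-agent ledger query running
    on the 5-minute schedule.
    """
    actions = {}
    for agent, spend in spend_by_agent.items():
        budget = MONTHLY_BUDGET_USD.get(agent)
        if budget is not None and spend > budget:
            current = model_by_agent[agent]
            actions[agent] = DOWNGRADE.get(current, current)
            notify(f"{agent} over budget (${spend:.2f} > ${budget:.2f}); "
                   f"degraded to {actions[agent]}")
    return actions

# Day-14 sample: the fraud classifier has blown through its monthly cap.
acts = check_and_throttle({"fraud-classifier": 612.40},
                          {"fraud-classifier": "claude-sonnet"},
                          notify=print)
```

The verify step is just the next run of the same ledger query: if the downgrade took, the 5-minute spend delta flattens and no further action fires.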
Without the per-agent ledger, the detect step has nothing to fire on. The team-level Anthropic invoice is too coarse, too late, and too aggregated to trigger a 5-minute response. The 5-field schema is what makes the loop possible at the agent layer the same way that CloudWatch billing metrics make it possible at the cloud-resource layer.
The bigger point is the ledger is not an after-the-fact reporting tool. It is the substrate the closed-loop runs on. Teams that instrument it before crossing 30 agents skip the panic finance review and ship per-agent throttling on day one. Teams that do not, do the six-week archaeology.