AI Observability, Developer Productivity, LLM FinOps

Your AI spend is not an investment. It's a cost centre. Here's how to fix that.

Martin Eley

Be able to answer the board question — "what are we getting for this?" — with data instead of anecdote.

Every CFO right now has a version of the same problem.

They can see the AI tool subscriptions on the company card. Claude Max. GitHub Copilot. Cursor. Maybe a few API keys their engineers set up. The line item is real and it's growing. What they can't see is what any of it is producing.

Not because the data doesn't exist. It does. But nobody built the bridge between the token counter and the project tracker.

So the AI spend sits in a bucket labelled "engineering tools" and gets defended in QBRs with vibes and anecdotes. That's not good enough anymore. AI spend is now material enough to warrant the same rigour you'd apply to cloud infrastructure or headcount. Most teams are nowhere near that standard.

The metric everyone's tracking is the wrong one

Token usage is not a business metric.

It tells you how much compute your team consumed. It doesn't tell you whether that consumption was pointed at anything that mattered. A developer spending three hours having Claude refactor a module nobody ships generates token usage. So does a developer using Claude Code to close four high-priority tickets before the sprint ends. The token counter sees both as identical.

This is the core problem. AI observability tooling, including most of the enterprise-grade stuff, optimises for operational visibility: things like latency, error rates and token counts. Those metrics matter for platform reliability. They don't answer the question a CTO or engineering manager actually cares about: what did we build with this, and what did it cost to build it?

Token spend without task attribution is vanity spend. It's the AI equivalent of measuring developer productivity by lines of code.

What traceability actually looks like

Say you're running two active development projects. One is a client-facing product under active delivery. One is internal tooling. Both teams are using Claude Code.

With no attribution, you see: £800/month on AI API spend.

With proper traceability, you see:

Project A (client product): £580 spent, 23 PRs merged, 67 tasks completed. Cost per merged PR: ~£25.

Project B (internal tooling): £220 spent, 4 PRs merged, 12 tasks completed. Cost per merged PR: ~£55.

Now you have something to work with. Is Project B's higher cost-per-PR a problem? Maybe. Maybe the tasks were harder. Maybe the tooling choices are wrong. Maybe the team needs a different workflow. You don't know yet — but you know to ask the question, and you have the data to start answering it.

That's the shift. From "we spent £800 on AI" to knowing exactly where it went and what it produced.
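The per-project figures above are plain division, and a minimal sketch makes the arithmetic explicit. The numbers come from the example; the cost-per-task column is not quoted above but is derived from the same figures, and the function and variable names are illustrative.

```python
from typing import Optional

# Figures from the two-project example above (GBP, one month).
projects = {
    "Project A (client product)": {"spend": 580, "prs_merged": 23, "tasks_done": 67},
    "Project B (internal tooling)": {"spend": 220, "prs_merged": 4, "tasks_done": 12},
}

def cost_per_output(spend: float, outputs: int) -> Optional[float]:
    """Return spend divided by output count, or None when nothing shipped."""
    return round(spend / outputs, 2) if outputs else None

report = {
    name: {
        "cost_per_pr": cost_per_output(p["spend"], p["prs_merged"]),
        "cost_per_task": cost_per_output(p["spend"], p["tasks_done"]),
    }
    for name, p in projects.items()
}

for name, row in report.items():
    print(f"{name}: £{row['cost_per_pr']} per merged PR, £{row['cost_per_task']} per task")
```

Returning `None` for a project with zero outputs is a deliberate choice here: spend with nothing shipped is exactly the case worth surfacing rather than hiding behind a division error.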

The data model that makes this possible

The reason most teams don't have this visibility isn't technical. It's that nobody has connected the right data sources.

Three things need to be joined together:

Token spend, attributed to a project. Not raw API cost, but spend tagged to a named project at the time of the request. This is straightforward with a gateway like LiteLLM: create a virtual API key per project, and every request logged against that key carries the attribution. Once the proxy is connected to a Postgres database, the spend data lands there automatically.
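As a rough sketch of what creating a per-project key might look like, the snippet below builds (but does not send) a request to the LiteLLM proxy's key-generation endpoint. The `/key/generate` path and the `key_alias` and `metadata` fields reflect the LiteLLM proxy API as I understand it, so check them against the version you run. The proxy URL, master key and project name are placeholders.

```python
import json
import urllib.request

def build_key_request(proxy_url: str, master_key: str, project: str) -> urllib.request.Request:
    """Build (but do not send) a request asking the LiteLLM proxy for a per-project key."""
    payload = {
        "key_alias": project,              # alias that spend is attributed to
        "metadata": {"project": project},  # free-form tag; a hypothetical convention
    }
    return urllib.request.Request(
        url=f"{proxy_url.rstrip('/')}/key/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Placeholder values: substitute your own proxy address and master key.
req = build_key_request("http://localhost:4000", "sk-master-placeholder", "project-a")
print(req.get_method(), req.full_url)

# To create the key against a running proxy (assumes the response has a "key" field):
#   with urllib.request.urlopen(req) as resp:
#       virtual_key = json.load(resp)["key"]
```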

Output data from your project tracker. Tasks completed, PRs merged, commits pushed. This is an n8n sync job: pull from the Asana or GitHub API on a schedule and write to the same Postgres database. Twenty minutes of workflow setup.
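The sync itself lives in n8n, but the transformation it performs is small enough to sketch in Python. The example below assumes the payload shape of GitHub's "list pull requests" endpoint, where `merged_at` is null for PRs closed without merging. The repo name and sample payload are made up, and the row layout is a hypothetical choice for the Postgres table.

```python
from datetime import datetime

def merged_pr_rows(repo: str, pulls: list) -> list:
    """Keep only merged PRs and shape them as (repo, pr_number, merged_date) rows."""
    rows = []
    for pr in pulls:
        merged_at = pr.get("merged_at")
        if not merged_at:  # closed without merging, so not an output
            continue
        merged = datetime.strptime(merged_at, "%Y-%m-%dT%H:%M:%SZ")
        rows.append((repo, pr["number"], merged.date().isoformat()))
    return rows

# Made-up sample of what the API might return for closed PRs.
sample_pulls = [
    {"number": 101, "merged_at": "2024-05-02T09:30:00Z"},
    {"number": 102, "merged_at": None},
    {"number": 103, "merged_at": "2024-05-03T16:05:12Z"},
]

pr_rows = merged_pr_rows("acme/client-product", sample_pulls)
for row in pr_rows:
    print(row)
```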

A join that connects them. A simple lookup table: project name, LiteLLM key alias, Asana project ID, GitHub repo. With that in place, a Metabase query can tell you — for Project A this month — total spend, tasks closed, PRs merged, cost per output.
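Here is a minimal sketch of that join, using an in-memory SQLite database as a stand-in for Postgres so it runs anywhere. The table and column names are hypothetical, not LiteLLM's actual schema, and the rows are seeded to match the two-project example from earlier.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # SQLite stands in for Postgres in this sketch
db.executescript("""
CREATE TABLE project_map (project TEXT, key_alias TEXT, asana_project_id TEXT, github_repo TEXT);
CREATE TABLE spend (key_alias TEXT, spend_gbp REAL);
CREATE TABLE merged_prs (github_repo TEXT, pr_number INTEGER);
CREATE TABLE completed_tasks (asana_project_id TEXT, task_id TEXT);
""")
db.executemany("INSERT INTO project_map VALUES (?, ?, ?, ?)", [
    ("Project A", "project-a", "asana-001", "acme/client-product"),
    ("Project B", "project-b", "asana-002", "acme/internal-tooling"),
])
db.executemany("INSERT INTO spend VALUES (?, ?)",
               [("project-a", 300.0), ("project-a", 280.0), ("project-b", 220.0)])
db.executemany("INSERT INTO merged_prs VALUES (?, ?)",
               [("acme/client-product", n) for n in range(1, 24)]
               + [("acme/internal-tooling", n) for n in range(1, 5)])
db.executemany("INSERT INTO completed_tasks VALUES (?, ?)",
               [("asana-001", f"a-{n}") for n in range(67)]
               + [("asana-002", f"b-{n}") for n in range(12)])

# Aggregate each source separately, then attach it to the lookup row.
# Joining all three tables directly would multiply rows and inflate the totals.
QUERY = """
SELECT m.project,
       (SELECT COALESCE(SUM(s.spend_gbp), 0) FROM spend s
         WHERE s.key_alias = m.key_alias)                AS spend_gbp,
       (SELECT COUNT(*) FROM merged_prs p
         WHERE p.github_repo = m.github_repo)            AS prs_merged,
       (SELECT COUNT(*) FROM completed_tasks t
         WHERE t.asana_project_id = m.asana_project_id)  AS tasks_done
FROM project_map m
ORDER BY m.project
"""
rows = db.execute(QUERY).fetchall()
for project, spend_gbp, prs, tasks in rows:
    per_pr = round(spend_gbp / prs, 2) if prs else None
    print(project, spend_gbp, prs, tasks, per_pr)
```

The correlated subqueries are the design point worth copying: spend, PRs and tasks each have a different grain, so each is rolled up on its own before being placed next to the others.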

The stack is self-hostable and runs locally, and most of it is open source (n8n is source-available under a fair-code licence). Postgres, LiteLLM, n8n, Metabase, Grafana. One Docker Compose file. Operational data lives in Grafana. Business intelligence lives in Metabase.

Neither is complicated to set up. What's complicated is deciding to set it up at all.

The discipline problem

The tech is the easy part.

The hard part is tagging discipline. The attribution model only works if developers are using the right virtual key for the right project. If someone fires up Claude Code using the wrong key, that spend lands in the wrong bucket. The data gets dirty. The dashboard lies.

That's not a technology problem. It's a process problem. And it's solvable with a simple convention: one key per project, stored in a .env file at the project root, set before you start a session. You enforce it the same way you enforce any other engineering hygiene: through team agreement and occasional audit, not technical controls.
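The occasional audit can be partly scripted. The sketch below scans a set of project roots, reads each `.env`, and reports projects with no key and keys that appear in more than one project. The variable name `PROJECT_API_KEY` is hypothetical, so use whatever your tooling actually reads. It cannot tell whether a developer pointed a valid key at the wrong project; that part still depends on team agreement.

```python
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Optional

KEY_VAR = "PROJECT_API_KEY"  # hypothetical variable name

def read_env_key(project_root: Path, var: str = KEY_VAR) -> Optional[str]:
    """Return the value of `var` from project_root/.env, or None if absent."""
    env_file = project_root / ".env"
    if not env_file.is_file():
        return None
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == var:
            return value.strip().strip('"').strip("'") or None
    return None

def audit(project_roots):
    """Return (projects missing a key, keys shared by more than one project)."""
    by_key = defaultdict(list)
    missing = []
    for root in project_roots:
        key = read_env_key(root)
        if key is None:
            missing.append(root.name)
        else:
            by_key[key].append(root.name)
    shared = {k: sorted(v) for k, v in by_key.items() if len(v) > 1}
    return missing, shared

# Demo against throwaway directories: two projects share a key, one has no .env.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    demo = {"project-a": "PROJECT_API_KEY=sk-aaa\n",
            "project-b": "PROJECT_API_KEY=sk-aaa\n",
            "project-c": None}
    for name, content in demo.items():
        root = base / name
        root.mkdir()
        if content is not None:
            (root / ".env").write_text(content)
    missing, shared = audit(sorted(base.iterdir()))

print("missing:", missing)
print("shared:", shared)
```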

The same applies to your task tracker. The traceability model assumes Asana reflects reality. If your board is three sprints behind the actual work, your cost-per-task numbers are meaningless. Fixing your project tracker is a prerequisite for meaningful AI spend attribution, and frankly, it's a prerequisite for meaningful project management regardless.

Why this matters now

AI spend was a curiosity two years ago. It's a budget line item today. In twelve months it will be a material cost for most engineering teams and a capital allocation question for the businesses funding them.

The teams that build traceability now will know which workflows are productive and catch the ones that aren't. They'll be able to answer the board question — "what are we getting for this?" — with data instead of anecdote.

The teams that don't will keep defending a growing line item with vibes. That works, until it doesn't.

The data to fix this already exists in your stack. The question is whether anyone's going to join it up.