# LLM Cost Triage Checklist

For SaaS teams whose OpenAI, Claude, Gemini, Bedrock, gateway, coding-agent, or RAG bill is starting to threaten margin.

If your AI bill is a meaningful share of MRR, do not start by asking "which model is cheapest." Start by finding which workflow is wasting expensive tokens.

## 1. Split Spend By Outcome

Create a 48-hour export with one row per LLM call:

- `timestamp`
- `customer_id` or account tier
- `feature` or workflow name
- `model`
- input tokens
- output tokens
- cache read/write tokens, if available
- retry count
- tool-call count
- latency
- success/failure label

The key metric is not total tokens. It is cost per successful workflow.

## 2. Rank The First Five Leaks

Sort by estimated dollars, not by annoyance:

1. Customers or accounts with the highest AI cost relative to revenue.
2. Features with the highest cost per successful result.
3. Prompts with repeated static instructions or repeated RAG chunks.
4. Retries, validation failures, fallback loops, and agent/tool loops.
5. Simple classification, extraction, routing, or rewrite tasks running on premium models.

## 3. Check For Margin Killers

These are common in production AI products:

- unbounded chat history sent on every turn
- large system prompts repeated for every request
- RAG top-k set too high without quality evidence
- frontier models used for deterministic or low-risk steps
- failed JSON/tool calls retried with the same expensive context
- eval/debug traffic mixed into production spend
- background jobs running synchronously instead of batch
- coding-agent loops re-reading the same repo/tool output

## 4. Route Before Rewriting

Model routing usually pays back faster than prompt rewrites:

- Cheap model: classify, extract, rewrite, score, summarize short context.
- Mid model: normal customer-facing answers, structured support flows, low-risk RAG.
- Frontier model: ambiguous, high-value, high-risk, or failed-cheaper-route cases.

Keep a holdout sample so the cheaper route can be judged against quality, not hope.

## 5. Add Guardrails Before Scale

At minimum:

- per-customer monthly AI cost
- per-feature cost per successful workflow
- retry and failure spend
- alert when one customer or feature spikes
- hard budget for eval/debug traffic
- pricing or usage limits for customers whose AI cost exceeds gross margin

## 6. Decide Whether An Audit Is Worth It

A paid audit usually makes sense when at least one is true:

- AI spend is above $2K/month and growing.
- AI spend is above 20% of MRR.
- One feature or customer tier is margin-negative.
- Your team cannot explain a recent bill spike in one hour.
- You are about to add agents, voice, document extraction, or long-context workflows.

## What To Share For A Bill Roast

Send sanitized numbers only:

- monthly provider spend by model/provider
- top workflows by request count
- rough retry/failure/eval traffic
- whether traffic is support, voice, document extraction, RAG, coding, or agents

Do not send API keys, credentials, raw prompts, raw outputs, customer names, account IDs, source code, or unrelated personal data.

## Paid Fast Lane

Intelligence Per Watt offers a $299 24-hour AI bill roast and a fixed-scope 72-hour audit.

Start here:

https://www.intelligenceperwatt.com/roast-my-llm-bill
