# Sample Intelligence Per Watt Audit Report

This sample shows the report a paying customer receives. All numbers are illustrative and do not represent a real customer.

## Executive Summary

ExampleCo spends an estimated $18,420 per month on production AI workloads across support automation, document extraction, and nightly enrichment jobs.

The audit found $6,240 per month in likely savings without changing the core product experience. The fastest payback items are model routing for simple support tasks, prompt compression on repeated RAG context, and batch routing for non-interactive enrichment jobs.

Estimated audit payback: 0.4 months.

## Spend Baseline

| Workflow | Monthly calls | Current monthly spend | Primary issue |
| --- | ---: | ---: | --- |
| Support agent tier 1 | 410,000 | $7,880 | Flagship model overuse |
| Contract/document extraction | 82,000 | $4,260 | Long repeated schema prompts |
| Nightly enrichment | 1,900,000 | $3,140 | Synchronous API path for batch work |
| Internal knowledge assistant | 65,000 | $2,210 | Repeated RAG context |
| Evals and regression tests | 18,000 | $930 | No cached fixture path |

Total monthly spend reviewed: $18,420.

## Cost Per Successful Task

| Workflow | Current unit cost | Target unit cost | Expected monthly savings |
| --- | ---: | ---: | ---: |
| Support resolved ticket | $0.041 | $0.027 | $2,870 |
| Extracted document | $0.052 | $0.037 | $1,230 |
| Enriched account row | $0.0017 | $0.0011 | $1,050 |
| Internal answer | $0.034 | $0.025 | $590 |
| Eval run | $0.052 | $0.024 | $500 |

Expected monthly savings: $6,240.
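The arithmetic behind each row can be sketched as follows. This is a minimal illustration, assuming every monthly call counts as one successful task; a real audit would divide by the count of tasks that met the workflow's success signal instead. The function names are hypothetical.

```python
def unit_cost(monthly_spend: float, successful_tasks: int) -> float:
    """Cost per successful task: spend divided by tasks that succeeded."""
    if successful_tasks <= 0:
        raise ValueError("successful_tasks must be positive")
    return monthly_spend / successful_tasks


def monthly_savings(current_unit: float, target_unit: float, successful_tasks: int) -> float:
    """Savings from moving every successful task to the target unit cost."""
    return (current_unit - target_unit) * successful_tasks


# Document extraction row, ASSUMING all 82,000 monthly calls succeed.
print(round(unit_cost(4260, 82_000), 3))              # 0.052
print(round(monthly_savings(0.052, 0.037, 82_000)))   # 1230
```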

## Waste Map

### 1. Model Routing

Observation: 48% of support requests are deterministic or low-complexity: order status, policy lookup, basic classification, and routing.

Recommendation: Route low-risk requests to a cheaper model behind existing confidence checks. Keep high-risk billing, refund, and escalation paths on the current model until evals pass.

Estimated savings: $2,870 per month.

Confidence: High.

Risk: Medium. Requires a quality gate per ticket class.
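One way the routing rule could look is sketched below. The ticket class names, model names, and the 0.9 confidence threshold are all assumptions for illustration, not values from ExampleCo's system. The router defaults to the current model whenever a class is high-risk, unknown, or has not yet passed its evals.

```python
from dataclasses import dataclass

# Hypothetical ticket classes and model names, chosen for illustration only.
LOW_RISK_CLASSES = {"order_status", "policy_lookup", "basic_classification", "routing"}
HIGH_RISK_CLASSES = {"billing", "refund", "escalation"}

CHEAP_MODEL = "small-model"
FLAGSHIP_MODEL = "flagship-model"


@dataclass(frozen=True)
class Ticket:
    ticket_class: str
    classifier_confidence: float  # output of the existing confidence check, 0.0 to 1.0


def choose_model(ticket: Ticket, passed_evals: set[str], min_confidence: float = 0.9) -> str:
    """Return the cheap model only for low-risk classes that passed their quality gate."""
    if ticket.ticket_class in HIGH_RISK_CLASSES:
        return FLAGSHIP_MODEL
    if (
        ticket.ticket_class in LOW_RISK_CLASSES
        and ticket.ticket_class in passed_evals
        and ticket.classifier_confidence >= min_confidence
    ):
        return CHEAP_MODEL
    # Unknown classes, low confidence, and ungated classes stay on the current model.
    return FLAGSHIP_MODEL
```

Keeping `passed_evals` as an explicit input means a class can be enabled or rolled back without a code change, which fits a feature-flagged rollout.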

### 2. Prompt Compression

Observation: Document extraction repeats a long schema prompt and static policy context on every request.

Recommendation: Move static schema instructions into a cacheable prefix, remove duplicated examples, and split extraction from validation.

Estimated savings: $1,230 per month.

Confidence: Medium.

Risk: Low. Validate against existing accepted outputs.
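A cacheable prefix generally depends on the static part of the prompt being byte-identical across requests, with per-document content placed last. The sketch below illustrates that split under assumed schema text and hypothetical function names; how the prefix is actually marked for caching depends on the provider and is not shown.

```python
import hashlib

# Hypothetical static instructions; a real prefix would hold the full schema and policy text.
STATIC_PREFIX = (
    "You extract contract fields.\n"
    "Return JSON with keys: party_names, effective_date, termination_clause.\n"
)


def build_prompt(document_text: str) -> tuple[str, str]:
    """Return (prefix, suffix) so the static part never varies between requests."""
    suffix = f"Document:\n{document_text.strip()}\n"
    return STATIC_PREFIX, suffix


def prefix_fingerprint(prefix: str) -> str:
    """Short hash of the prefix, useful for logging and detecting accidental drift."""
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()[:12]


prefix_a, suffix_a = build_prompt("Agreement between A and B ...")
prefix_b, suffix_b = build_prompt("Lease for unit 4 ...")
# Same fingerprint for both requests; only the suffix changes.
print(prefix_fingerprint(prefix_a) == prefix_fingerprint(prefix_b))
```

Logging the fingerprint alongside each request makes it easy to spot a deploy that silently changed the prefix and reset cache hit rates.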

### 3. Batch Routing

Observation: Nightly enrichment jobs run through the same synchronous gateway path as customer-facing requests.

Recommendation: Move non-interactive enrichment to a batch API or provider-native asynchronous processing.

Estimated savings: $1,050 per month.

Confidence: Medium.

Risk: Low. No customer-facing latency dependency.
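Preparing rows for a batch path usually means grouping them into fixed-size chunks and serializing one request per line. The sketch below shows only that preparation step. The field names (`request_id`, `input`, `account_id`, `text`) are assumptions and do not follow any specific provider's batch schema.

```python
import json
from typing import Iterable, Iterator


def chunked(rows: list[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def to_batch_lines(rows: Iterable[dict]) -> list[str]:
    """Serialize rows as JSONL strings, one request per line, keyed by a stable id."""
    return [
        json.dumps({"request_id": f"account-{row['account_id']}", "input": row["text"]})
        for row in rows
    ]


rows = [{"account_id": i, "text": f"Company {i}"} for i in range(5)]
for batch in chunked(rows, size=2):
    print(len(to_batch_lines(batch)))  # 2, 2, 1
```

A stable `request_id` per row lets the job match asynchronous results back to accounts and retry only the rows that failed.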

### 4. Semantic Cache

Observation: Users repeatedly ask the internal knowledge assistant near-identical policy and troubleshooting questions.

Recommendation: Cache grounded responses by normalized intent, source document revision, and access scope.

Estimated savings: $590 per month.

Confidence: Medium.

Risk: Medium. Cached answers must be invalidated when source documents change and must never be served across permission boundaries.
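A cache key built from the three components named in the recommendation could be sketched as follows. Note the simplification: intent normalization here is plain string cleanup standing in for a real semantic match (for example, embedding similarity), and the revision and scope identifiers are hypothetical.

```python
import hashlib
import re


def normalize_intent(question: str) -> str:
    """Crude stand-in for intent matching: lowercase, drop punctuation, collapse spaces."""
    text = re.sub(r"[^a-z0-9\s]", "", question.lower())
    return " ".join(text.split())


def cache_key(question: str, source_revision: str, access_scope: str) -> str:
    """Key on intent, source revision, and access scope so any change forces a miss."""
    raw = "\x1f".join([normalize_intent(question), source_revision, access_scope])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


k1 = cache_key("How do I reset my VPN?", "rev-12", "employees")
k2 = cache_key("  how do I reset my VPN ", "rev-12", "employees")
k3 = cache_key("How do I reset my VPN?", "rev-13", "employees")
print(k1 == k2)  # True
print(k1 == k3)  # False
```

Because the revision and scope are part of the key, an updated source document or a different permission group produces a miss rather than a stale or over-shared answer.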

### 5. Eval Fixture Cache

Observation: Regression tests rerun static examples through paid models on every branch.

Recommendation: Cache fixed fixtures, sample only changed prompt paths, and run full evals nightly.

Estimated savings: $500 per month.

Confidence: High.

Risk: Low.
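A fixture cache can be as simple as keying stored outputs on the model, prompt, and fixture input, so an unchanged combination never triggers a paid call. The sketch below keeps the cache in memory and takes the model call as a parameter; the class name and `call_model` signature are assumptions, and a real setup would persist the store between CI runs.

```python
import hashlib
import json
from typing import Callable


class FixtureCache:
    """In-memory cache of eval outputs keyed by (model, prompt, fixture input)."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}
        self.paid_calls = 0  # counts cache misses that reached the model

    @staticmethod
    def _key(model: str, prompt: str, fixture_input: str) -> str:
        payload = json.dumps([model, prompt, fixture_input])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(
        self,
        model: str,
        prompt: str,
        fixture_input: str,
        call_model: Callable[[str, str, str], str],
    ) -> str:
        key = self._key(model, prompt, fixture_input)
        if key not in self._store:
            self.paid_calls += 1
            self._store[key] = call_model(model, prompt, fixture_input)
        return self._store[key]
```

Changing the prompt text changes the key, so only fixtures on edited prompt paths are rerun, which matches the "sample only changed prompt paths" recommendation.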

## Ranked Patch Plan

| Rank | Patch | Monthly savings | Effort | Quality risk | First owner |
| ---: | --- | ---: | --- | --- | --- |
| 1 | Support model router for simple classes | $2,870 | 3-5 days | Medium | Backend |
| 2 | Cacheable document extraction prefix | $1,230 | 1-2 days | Low | AI platform |
| 3 | Batch route nightly enrichment | $1,050 | 2-4 days | Low | Data engineering |
| 4 | Semantic cache for knowledge assistant | $590 | 3-5 days | Medium | Platform |
| 5 | Eval fixture cache | $500 | 1 day | Low | QA/devex |

## Implementation Sprint Quote

Recommended follow-on sprint: $12,500 fixed fee.

Scope:

- Implement model router behind feature flag
- Add compressed/cacheable extraction prompt path
- Move enrichment job to batch routing
- Add savings dashboard and before/after reconciliation
- Add quality guardrails for support classes

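Projected payback on implementation sprint: 2.0 months after release against the full $6,240 per month, or about 2.4 months against the $5,150 per month from the three patches in scope (ranks 1-3).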
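Payback here is the one-time fee divided by the monthly savings it unlocks. A minimal sketch, using the sprint fee and savings figures from this report (the function name is hypothetical):

```python
def payback_months(one_time_cost: float, monthly_savings: float) -> float:
    """Months of savings needed to recover a one-time cost."""
    if monthly_savings <= 0:
        raise ValueError("monthly_savings must be positive")
    return one_time_cost / monthly_savings


# Against all five patches in the ranked plan.
print(round(payback_months(12_500, 6_240), 1))                   # 2.0
# Against only the router, extraction prefix, and batch routing patches.
print(round(payback_months(12_500, 2_870 + 1_230 + 1_050), 1))   # 2.4
```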

## Data Needed For A Real Audit

- Provider invoices for the last 30-90 days
- Gateway logs, traces, or request exports
- Workflow labels or endpoint names
- Success, escalation, or resolution signals
- Current eval set or accepted outputs
- Quality, latency, and compliance constraints
