Every Claude response carries cache-hit data. Most apps log it nowhere — and pay for it.
---
title: Prompt caching is the cheapest Claude optimization. Nobody measures it.
published: true
canonical_url: https://ferhatatagun.com/blog/prompt-caching-nobody-measures
description: Every Claude response carries cache-hit data. Most apps log it nowhere — and pay for it.
tags: claude, anthropic, llm, observability
cover_image:
---
Pull up the last week of Anthropic API bills from any team shipping a Claude-powered product. Two out of three of them are paying for context they could be reading from cache for one-tenth the price. Most of them don't know it, because the dashboard doesn't tell them and the SDKs don't either — by the time the response lands, the only number anyone looks at is `output_tokens`, and even then mostly when something seems expensive.
The information is in every response. Anthropic puts it in `usage`:
"usage": {
"input_tokens": 312,
"cache_creation_input_tokens": 4180,
"cache_read_input_tokens": 0,
"output_tokens": 187
}Four numbers. The first time a cached prompt runs you pay 1.25× the input price to *write* the cache. Every subsequent call within the TTL pays 0.1× to *read* it. The ratio between those two lines is the difference between a $3,000/month bill and a $300/month one. And almost no one is graphing it.
**TL;DR**
When you send a request with `cache_control: { type: "ephemeral" }` somewhere in your messages, the API checks if it's seen an identical prefix in the last 5 minutes. There are three outcomes:
1. **Cache miss, new content.** The full prompt is processed normally. `input_tokens` reflects the uncached portion; `cache_creation_input_tokens` reflects what got written into cache for next time.
2. **Cache hit.** The cached prefix is read at 10% the price. `cache_read_input_tokens` shows what was read; `input_tokens` is just the new suffix.
3. **TTL expired.** Same shape as a miss — you pay the creation surcharge again.
So a single response tells you exactly which of these three happened. Not "approximately." Exactly. Per request. For free.
The pricing math (Sonnet 4.5, June 2026) shapes up like this for a 5,000-token system prompt that gets queried once and then again four minutes later:
| Scenario | First call | Second call | Total |
|-----------------------|------------------------|----------------------
// artículos relacionados
Twitter/X: 🚀 While you’re sleeping, the market is printing. I’m a futures trader, sharing my personal high…
Twitter/X: anthropic is literally crushing the new model benchmark https://t.co/Mhyy2SZtUQ
Twitter/X: EU regulator evaluating implications of Anthropic Mythos curbs after US directive https://t.co/FE…