Every request to Claude costs money. input_tokens includes everything: the system prompt, conversation history, and the current question. If you make 100 requests with the same 500-token system prompt, you’re paying for 50,000 tokens needlessly.
Prompt caching stores part of the prompt on Anthropic’s side. Repeated requests that hit the cached content cost 10% of the normal price.
How to Enable Caching
Change system from a plain string to a list with cache_control:
import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
system=[
{
"type": "text",
"text": "Ты — Python-тьютор. Объясняешь просто...",
"cache_control": {"type": "ephemeral"}
}
],
messages=[{"role": "user", "content": "Что такое list comprehension?"}]
)
Reading Cache Metrics
usage = response.usage
print(f"Обычные входящие токены: {usage.input_tokens}")
print(f"Создан кеш (1.25x цена): {usage.cache_creation_input_tokens}")
print(f"Прочитан кеш (0.1x цена): {usage.cache_read_input_tokens}")
- First request —
cache_creation_input_tokens > 0(cache is created, costs 1.25x) - Subsequent requests —
cache_read_input_tokens > 0(cache is read, costs 0.1x)
Batch Processing — Where Caching Really Pays Off
texts = ["Текст 1", "Текст 2", "Текст 3", ...]
SYSTEM = [{"type": "text", "text": "Длинный system prompt...", "cache_control": {"type": "ephemeral"}}]
total_saved = 0
for text in texts:
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=512,
system=SYSTEM,
messages=[{"role": "user", "content": text}]
)
# From the second request onward, the cache is already being read
saved = response.usage.cache_read_input_tokens
total_saved += saved
print(f"Сэкономлено токенов через кеш: {total_saved}")
When the Cache Is Created vs Read
| Request | cache_creation | cache_read | Cost |
|---|---|---|---|
| 1st | > 0 | 0 | 1.25x for the cached portion |
| 2nd+ | 0 | > 0 | 0.1x for the cached portion |
What Can Be Cached
- System prompt (the most common case)
- Long documents at the start of
messages - Few-shot examples
Minimum size for caching: 1024 tokens (for claude-sonnet-4-6).
Calculating Real Savings
def calc_cost(usage, model="sonnet"):
rates = {"sonnet": (3.0, 15.0, 3.75, 0.3)} # in, out, cache_create, cache_read per 1M
r = rates[model]
cost = (
usage.input_tokens * r[0] +
usage.output_tokens * r[1] +
(usage.cache_creation_input_tokens or 0) * r[2] +
(usage.cache_read_input_tokens or 0) * r[3]
) / 1_000_000
return cost
💬 Comments (0)
No comments yet
Be the first to share your opinion about this article!