How I debug Claude agents by replaying their trace

Agent traces contain everything you need to debug a weird run, but they're stored as walls of nested JSON. Stop reading them as documents, watch them as timelines.

---
title: How I debug Claude agents by replaying their trace
published: true
canonical_url: https://ferhatatagun.com/blog/debug-claude-agents-by-replaying-traces
description: Agent traces contain everything you need to debug a weird run, but they're stored as walls of nested JSON. Stop reading them as documents, watch them as timelines.
tags: claude, anthropic, agents, debugging
cover_image:
---

Your agent did something weird in production. A user reported it, you found the failed run in your logs, and now you're staring at a JSON file that's 400 messages long. Half of those messages are `tool_result` blocks the size of small databases, and somewhere in there is the moment the agent decided to do the wrong thing.

You can't re-run the agent: the API state has moved on, the tool would behave differently now, the prompt has been updated three times since. You have only the trace.

Most of us read agent traces the same way: open the JSON in an editor, Ctrl+F for the tool name we suspect, scroll through walls of escaped strings, and try to mentally reconstruct the sequence. It takes thirty minutes, and at the end you have one of three answers: "yeah I see what went wrong," "I'm pretty sure I see what went wrong," or "I have no idea what went wrong." About a third of the time it's the third one, and you go ship a band-aid that may or may not fix the actual problem.

The thing nobody talks about is that this isn't a hard problem. The JSON contains all the information. The issue is purely *presentational* — it's nearly impossible to read.

**TL;DR**

- Agent traces are a sequence of decisions, but they're stored as a wall of nested JSON. The signal is there; the format is the problem.
- The right primitive isn't a JSON viewer; it's a timeline. Each thought, tool call, tool result, and final answer becomes its own discrete, color-coded step.
- Once you can scrub through the trace step by step, the failure point becomes visually obvious in seconds instead of minutes.
- This is post-hoc, not interactive. You don't need to re-run the agent or hit the API: replay works on the raw trace alone.
- A browser-only tool can do this in 4 seconds. No backend, no key, just paste the JSON.

## What an agent trace actually contains

When you save a Claude agent run, you usually persist the `messages` array — the full conversation including the model's responses and the tool results you fed back. A six-step agent run looks roughly like:

```jsonc
[
  { "role": "user", "content": "Find me the cheapest flight from IST to LAX next Tuesday" },
  { "role": "assistant", "content": [
    { "type": "text", "text": "I'll search for flights and check prices." },
    { "type": "tool_use", "id": "tu_01", "name": "search_flights", "input": {...} }
  ]},
  { "role": "user", "content": [
    { "type": "tool_result", "tool_use_id": "tu_01", "content": "[<2KB of JSON>]" }
  ]},
  { "role": "assistant", "content": [
    { "type": "text", "text": "Looking at three of those..." },
    { "type": "tool_use", ... }
  ]}
  // ... remaining steps
]
```
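In a trace shaped like this, every `tool_result` points back to its `tool_use` through `tool_use_id`, so pairing them is a single pass over the array. The sketch below is a hedged illustration in Python: the `pair_tool_calls` name and the fields of the summary dict are hypothetical, and it assumes the trace follows the block layout shown above (with the optional `is_error` flag on `tool_result` blocks).

```python
import json


def pair_tool_calls(messages):
    """Match each tool_use block to its tool_result via tool_use_id."""
    calls = {}  # tool_use id -> summary dict, kept in call order
    for position, message in enumerate(messages):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-text message: no tool blocks inside
        for block in content:
            if block.get("type") == "tool_use":
                calls[block["id"]] = {
                    "name": block["name"],
                    "called_at": position,   # index of the message that made the call
                    "answered_at": None,     # index of the message carrying the result
                    "result_chars": None,    # size of the result payload
                    "is_error": False,
                }
            elif block.get("type") == "tool_result":
                call = calls.get(block["tool_use_id"])
                if call is None:
                    continue  # result with no recorded call; ignored in this sketch
                payload = block.get("content", "")
                if not isinstance(payload, str):
                    payload = json.dumps(payload)
                call["answered_at"] = position
                call["result_chars"] = len(payload)
                call["is_error"] = bool(block.get("is_error", False))
    return calls
```

Calls whose `answered_at` is still `None`, calls flagged `is_error`, and results with an unusually large `result_chars` are the first places I would look when scrubbing for the failure point.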
