dev.toJune 25, 2026NEW AFFECTS EXAM
Modelo

I built a multi-agent loop where an adversarial Claude reviewer reads your actual codebase before approving plans

Large language models are surprisingly optimistic reviewers. Ask an LLM to review an implementation...

Large language models are surprisingly optimistic reviewers.

Ask an LLM to review an implementation plan and it will often approve things that are objectively wrong:

  • Non-existent file paths
  • Incorrect function signatures
  • Missing edge cases
  • Broken assumptions about the codebase
  • Incomplete testing strategies

The problem is simple: the model is reasoning from its training data and the conversation context, not from your actual repository.

I wanted something different.

I wanted a reviewer whose default assumption is that the plan is wrong, and whose job is to prove it.

So I built **agent-plan-review-loop**, an open-source multi-agent orchestration system that repeatedly challenges implementation plans until they survive adversarial review.

The Core Idea

Most AI review workflows look like this:

1. Author generates a plan

2. Reviewer checks the plan

3. Reviewer approves the plan

The problem is that both agents often share the same context and reasoning chain.

My approach intentionally breaks that connection.

Every artifact is stored as a markdown file inside the repository:

  • Plan
  • Review
  • Questions
  • Decisions
  • Diffs

Each agent runs as a completely fresh process using Claude Code CLI.

The reviewer has no access to the author's reasoning.

It only sees:

  • The implementation plan
  • The actual repository
  • Its instructions

This forces the reviewer to evaluate the plan on its own merits rather than continuing the author's thought process.

In practice, this catches a surprising number of mistakes.

The Workflow

The system runs an Author → Reviewer loop until approval.

text
Task
 ↓
Classifier
 ↓
Author
 ↓
Reviewer
 ↓
CHANGES_REQUESTED?
 ↓
Yes → Author revises
 ↓
No
 ↓
APPROVED
 ↓
Coder implements

The reviewer is intentionally adversarial.

Its primary instruction is:

text
You are a SKEPTICAL senior REVIEWER.

Find why this plan will FAIL.
Do not praise it.

Default to CHANGES_REQUESTED;
approve only if genuinely sound.

Instead of asking "what's good about this plan?", the reviewer asks:

  • Which assumptions are wrong?
  • Which files don't actually exist?
  • Which edge cases were missed?
  • Which APIs are being used incorrectly?
  • Which tests are missing?

The result is far more useful feedback than generic AI approval.

Complexity-Aware Model Routing

One challenge with agent systems is cost.

Running the most expensive model for every task quickly becomes impractical.

To solve that, I added a lightweight classification step using Haiku.

Each task is categorized before planning begins:

| Tier | Task Type | Author | Reviewer | Max Iterations |

| ---- | ----------------- | ------ | -------- | -------------- |

| T0 | Text, Config, CSS | Sonnet | Opus | 3 |

| T1 | Small Feature | Sonnet | Opus | 3 |

| T2 | Complex Refactor | Opus | Sonnet | 6 |

This allows the system to reserve expensive reasoning for genuinely difficult work.

Most r

Read full article on dev.to