I built a multi-agent loop where an adversarial Claude reviewer reads your actual codebase before approving plans

Large language models are surprisingly optimistic reviewers. Ask an LLM to review an implementation...

Large language models are surprisingly optimistic reviewers.

Ask an LLM to review an implementation plan and it will often approve things that are objectively wrong:

Non-existent file paths
Incorrect function signatures
Missing edge cases
Broken assumptions about the codebase
Incomplete testing strategies

The problem is simple: the model is reasoning from its training data and the conversation context, not from your actual repository.

I wanted something different.

I wanted a reviewer whose default assumption is that the plan is wrong, and whose job is to prove it.

So I built **agent-plan-review-loop**, an open-source multi-agent orchestration system that repeatedly challenges implementation plans until they survive adversarial review.

The Core Idea

Most AI review workflows look like this:

1. Author generates a plan

2. Reviewer checks the plan

3. Reviewer approves the plan

The problem is that both agents often share the same context and reasoning chain.

My approach intentionally breaks that connection.

Every artifact is stored as a markdown file inside the repository:

Plan
Review
Questions
Decisions
Diffs

Each agent runs as a completely fresh process using Claude Code CLI.

The reviewer has no access to the author's reasoning.

It only sees:

The implementation plan
The actual repository
Its instructions

This forces the reviewer to evaluate the plan on its own merits rather than continuing the author's thought process.

In practice, this catches a surprising number of mistakes.

The Workflow

The system runs an Author → Reviewer loop until approval.

text

Task
 ↓
Classifier
 ↓
Author
 ↓
Reviewer
 ↓
CHANGES_REQUESTED?
 ↓
Yes → Author revises
 ↓
No
 ↓
APPROVED
 ↓
Coder implements

The reviewer is intentionally adversarial.

Its primary instruction is:

text

You are a SKEPTICAL senior REVIEWER.

Find why this plan will FAIL.
Do not praise it.

Default to CHANGES_REQUESTED;
approve only if genuinely sound.

Instead of asking "what's good about this plan?", the reviewer asks:

Which assumptions are wrong?
Which files don't actually exist?
Which edge cases were missed?
Which APIs are being used incorrectly?
Which tests are missing?

The result is far more useful feedback than generic AI approval.

Complexity-Aware Model Routing

One challenge with agent systems is cost.

Running the most expensive model for every task quickly becomes impractical.

To solve that, I added a lightweight classification step using Haiku.

Each task is categorized before planning begins:

| ---- | ----------------- | ------ | -------- | -------------- |

This allows the system to reserve expensive reasoning for genuinely difficult work.

Most r

Read full article on dev.to

// related articles

Scrolling through r/ClaudeAI

RedditJun 28

Day 32 of building GTA 6 using claude

RedditJun 28

Why are all the Claude Code skill files I see online completely pointless?

RedditJun 27