Migrating Claude Code to a custom backend in 2 lines (and what to actually watch for)

Claude Code's ANTHROPIC_BASE_URL env var lets you redirect every request anywhere you want — but the 2-line setup hides a real production checklist. What breaks, what to watch for, when it's worth it.

---

title: Migrating Claude Code to a custom backend in 2 lines (and what to actually watch for)

published: True

description: Claude Code's ANTHROPIC_BASE_URL env var lets you redirect every request anywhere you want — but the 2-line setup hides a real production checklist. What breaks, what to watch for, when it's worth it.

tags: ai, claudeai, anthropic, devops

cover_image:

---

**TL;DR:** Claude Code's `ANTHROPIC_BASE_URL` env var lets you redirect every request to any compatible backend in one shell line. Almost no one does it, but it unlocks caching, rate-limit pooling, multi-vendor routing, audit logging, and billing routing without rewriting any code. Here's the 2-line setup, the things that quietly break, and the production checklist for actually running it.

---

The 2-line setup

bash

export ANTHROPIC_BASE_URL=https://your-proxy.example.com
export ANTHROPIC_API_KEY=your-key
claude

That's it. The Anthropic SDK (which Claude Code is built on) doesn't care where it's sending requests, as long as the response comes back in Anthropic Messages format. It'll append `/v1/messages` automatically, forward all your `anthropic-version` and `anthropic-beta` headers, and stream SSE the same way.

Same trick works with Cursor, Cline, Continue, and anything else that builds on the Anthropic SDK.

---

Why would you do this?

I've found five patterns that justify a custom backend:

1. Caching middleware

Anthropic's prompt cache feature requires manual `cache_control` markers that most teams skip. A proxy can inject them server-side on every system message. For a Claude Code session with a 20-30K-token system prompt and 50+ turns, this cuts your bill by 60-80%.

2. Rate-limit pooling

Most teams have 3-5 API keys spread across projects. Without a proxy, each is rate-limited independently — you can hit 429 on one while another sits idle. A proxy can round-robin across keys, exposing one virtual key with the combined rate budget.

3. Multi-vendor abstraction

Want to route `claude-haiku-4-5` calls to one provider and `claude-opus-4-7` calls to another (maybe one offers Haiku throughput discounts)? A proxy inspects the `model` field and routes accordingly. Your Claude Code config doesn't change.

4. Audit logging

Claude Code makes a *lot* of requests. Without instrumentation, you have no idea what each session cost. A proxy can log per-request metadata (model, tokens, latency, session id) to a warehouse.

5. Billing routing

Different team members hitting different cost centers? A proxy can map API keys to billing tags and forward usage to finance.

---

What breaks (and what to watch for)

The 2-line setup works for the happy path. In practice, here's what bites you.

Streaming behavior

Anthropic returns chunked SSE for `stream: true`. If your proxy buffers responses or doesn't flush each chunk immediately, you'll see massive perceived latency — Claude Code will spin while waiting for the full response.

Fix: pass thro

Read full article on dev.to

// related articles

Why is Claude so mean to its subagents

RedditJul 29

Claude tried to prompt inject me

RedditJul 29

Adding a custom MCP server to Claude and ChatGPT

Simon WillisonJul 29