Claudish: We Built It Because We Wanted Choice

An Open Source Tool That Unlocks Claude Code for 100+ AI Models

TypeScript
Bun
Hono
Anthropic API
OpenRouter
Google Gemini
OpenAI
Ollama

The Locked Door

I love Claude Code. Best AI coding assistant I've used. Period.

But it only works with Anthropic's models. You want GPT-5? Gemini 3? DeepSeek? Local models? Too bad. You're locked in.

That's not just annoying. It's expensive. And it limits experimentation.

We built Claudish because we wanted out. Not out of Claude Code—out of the lock-in.

What started as an internal tool became something we use daily. Something we open-sourced. Something that's saving teams 80-90% on AI costs.

This is the story of what we built and why it matters.

Claudish --free command showing selection of free AI models: GPT-5 Nano, Grok Code, GLM-4.7, MiniMax M2.1, Gemini Flash

Run `claudish --free` and pick from free models. Zero cost. Full Claude Code experience.


The Numbers

Metric	Result
Models supported	100+ via OpenRouter + direct integrations
Model search	300+ models with fuzzy matching
Provider integrations	7+ (OpenRouter, Gemini, OpenAI, Ollama, MiniMax, Kimi, GLM)
Context windows	128K to 2M+ tokens supported
Cost savings	80-90% vs Claude 4 (Kimi K2 via OpenRouter)
Protocol compliance	13/13 snapshot tests passing
Distribution	npm, Homebrew, shell installer
License	MIT (fully open source)

TL;DR

Claudish is a CLI tool we built that translates Claude Code's Anthropic API calls into any other AI model's format. Run it once, it starts a local proxy server, and suddenly Claude Code works with 100+ models from OpenRouter, Google, OpenAI, Ollama—anything with an API. Open source. MIT licensed. Active development. We use it daily. You can too.

The Problem: Vendor Lock-In Sucks

Claude Code is brilliant. Anthropic built something special—the agents, the tool calling, the file management. It's how AI coding should work.

But it's handcuffed to their models. Only Claude.

Here's why that's a problem:

Cost: Claude 4 is expensive. DeepSeek, MiniMax, Kimi K2? Roughly 10-13% of the input-token price.
Experimentation: Want to try GPT-5 for reasoning tasks? Gemini 3 for multimodal? You can't.
Vendor risk: What if Anthropic changes pricing? Goes down? Deprecates the API you depend on?
Local models: Sometimes you need offline. Or privacy. Or just want to run Ollama locally.

We wanted choice. Not instead of Claude Code—with Claude Code.

The Architecture: A Translation Layer

The idea: Claude Code talks to Anthropic's API. What if we intercept that traffic, translate it to another provider's format, then translate the response back?

The reality: Easier said than done.

Here's what Claudish actually does:

Starts a local proxy server on your machine (Hono on Bun)
Intercepts Claude Code's API requests (formatted for Anthropic)
Translates the request to your target provider's format
Forwards to the provider (OpenRouter, Google, OpenAI, Ollama)
Translates the response back to Anthropic's format
Returns to Claude Code like nothing happened

Claude Code never knows it's not talking to Anthropic. The provider never knows they're being called by Claude Code. Everyone's happy.
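Here's a minimal sketch of steps 3 and 5, written without dependencies so it runs anywhere. It isn't Claudish's code: the type shapes are trimmed to text-only fields, `toOpenAI`/`toAnthropic` are hypothetical names, and the HTTP side (Hono, forwarding) is left out.

```typescript
// Sketch of steps 3 and 5 only: text messages, no tools, no streaming.
// Shapes are trimmed versions of the Anthropic Messages and OpenAI Chat formats.
interface AnthropicTextBlock { type: "text"; text: string }
interface AnthropicRequest {
  model: string;
  max_tokens: number;
  system?: string;
  messages: { role: "user" | "assistant"; content: string | AnthropicTextBlock[] }[];
}
interface OpenAIMessage { role: "system" | "user" | "assistant"; content: string }
interface OpenAIRequest { model: string; max_tokens: number; messages: OpenAIMessage[] }
interface OpenAIResponse {
  id: string;
  model: string;
  choices: { message: { content: string | null }; finish_reason: string }[];
  usage: { prompt_tokens: number; completion_tokens: number };
}

// Step 3: Anthropic request -> OpenAI-style request for the chosen model.
function toOpenAI(req: AnthropicRequest, targetModel: string): OpenAIRequest {
  const messages: OpenAIMessage[] = [];
  // Anthropic sends the system prompt as a top-level field; OpenAI wants a message.
  if (req.system) messages.push({ role: "system", content: req.system });
  for (const m of req.messages) {
    const text = typeof m.content === "string" ? m.content : m.content.map((b) => b.text).join("");
    messages.push({ role: m.role, content: text });
  }
  // Whatever model name Claude Code sent is swapped for the user's target.
  return { model: targetModel, max_tokens: req.max_tokens, messages };
}

// OpenAI finish reasons mapped onto Anthropic stop reasons.
const STOP_REASONS: Record<string, string> = {
  stop: "end_turn",
  length: "max_tokens",
  tool_calls: "tool_use",
};

// Step 5: OpenAI-style response -> the Anthropic message shape Claude Code expects.
function toAnthropic(res: OpenAIResponse) {
  const choice = res.choices[0];
  return {
    id: res.id,
    type: "message",
    role: "assistant",
    model: res.model,
    content: [{ type: "text", text: choice.message.content ?? "" }],
    stop_reason: STOP_REASONS[choice.finish_reason] ?? "end_turn",
    usage: {
      input_tokens: res.usage.prompt_tokens,
      output_tokens: res.usage.completion_tokens,
    },
  };
}
```

A real proxy also has to handle array-form system prompts, images, tool blocks, and streaming. Those are where the hard parts live.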

6 dev:architect agents running in parallel via Claudish — Claude, MiniMax, GLM, Gemini, GPT-5.2, Grok reviewing the same codebase simultaneously

6 models reviewing the same codebase in parallel. Each via Claudish.

Claudish architecture flowchart: 6-step priority router, 5 handler types, adapters layer, request queue with rate limiting, and response transformation pipeline

The full architecture: 6-step router → 5 handler types → adapters → queue → transform.

Technical Challenges: The Messy Middle

A simple proxy sounds straightforward. It wasn't. Here's what made this hard:

Challenge 1: Tool Calling Translation

Anthropic uses tool_use blocks. OpenAI uses tool_calls. They look different. They behave differently. They're not compatible.

We had to build bidirectional translation. When Claude Code sends tool_use, we convert it to tool_calls for OpenAI. When the provider responds, we convert it back.

Sounds simple. Try debugging streaming responses where chunks arrive out of order. Where tool IDs need to be preserved across the entire conversation. Where nested tools can appear anywhere in the response.

We got it working. 13/13 snapshot tests passing. Full Messages API compliance.
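Here's a hedged sketch of both directions, assuming the standard shapes: Anthropic's `tool_use`/`tool_result` content blocks on one side, OpenAI's `tool_calls` array and `role: "tool"` result messages on the other. The function names are ours. It shows the one mismatch that bites first: Anthropic's `input` is a JSON object, while OpenAI's `arguments` is a JSON-encoded string.

```typescript
interface ToolUseBlock { type: "tool_use"; id: string; name: string; input: Record<string, unknown> }
interface ToolResultBlock { type: "tool_result"; tool_use_id: string; content: string }
interface OpenAIToolCall { id: string; type: "function"; function: { name: string; arguments: string } }

// Claude Code -> provider: an assistant tool_use block becomes a tool_call.
function toolUseToCall(block: ToolUseBlock): OpenAIToolCall {
  return {
    id: block.id, // keep the id so the later tool result still matches
    type: "function",
    function: { name: block.name, arguments: JSON.stringify(block.input) },
  };
}

// Claude Code -> provider: a tool_result block becomes a role:"tool" message.
function toolResultToMessage(block: ToolResultBlock) {
  return { role: "tool" as const, tool_call_id: block.tool_use_id, content: block.content };
}

// Provider -> Claude Code: a tool_call becomes a tool_use block again.
function callToToolUse(call: OpenAIToolCall): ToolUseBlock {
  let input: Record<string, unknown> = {};
  try {
    // OpenAI sends arguments as a JSON string; Anthropic expects an object.
    input = JSON.parse(call.function.arguments || "{}");
  } catch {
    // Malformed arguments: fall back to an empty object rather than crash the turn.
    input = {};
  }
  return { type: "tool_use", id: call.id, name: call.function.name, input };
}
```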

Challenge 2: SSE Streaming Across Providers

Server-Sent Events (SSE) are how streaming responses work. Every provider does them differently.

Anthropic's format: `data: {"type": "content_block_delta", ...}`

OpenAI's format: `data: {"choices": [{"delta": {"content": "..."}}]}`

Gemini's format: `data: {"candidates": [{"content": {"parts": [{"text": "..."}]}}]}`. Totally different again.

We had to parse each provider's streaming format, extract the actual content, reassemble it in the right order, and re-emit it in Anthropic's format. All in real time. Without breaking the stream.
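A simplified sketch of re-emitting one provider's stream in Anthropic's format, covering only OpenAI-style text deltas. It assumes a single text block at index 0 that was already opened with `content_block_start`, and it skips the `message_delta` event (stop reason, usage) that a complete Anthropic stream sends before `message_stop`. `translateChunk` and `sseEvent` are hypothetical names.

```typescript
// Format one Anthropic-style SSE event: an "event:" line, a "data:" line, a blank line.
function sseEvent(type: string, payload: object): string {
  return `event: ${type}\ndata: ${JSON.stringify({ type, ...payload })}\n\n`;
}

// Convert one OpenAI-style SSE line into zero or more Anthropic events.
function translateChunk(line: string): string[] {
  if (!line.startsWith("data:")) return []; // skip comments, keep-alives, blank lines
  const data = line.slice(5).trim();
  if (data === "[DONE]") {
    // Close the open text block, then end the message.
    return [sseEvent("content_block_stop", { index: 0 }), sseEvent("message_stop", {})];
  }
  const chunk = JSON.parse(data) as { choices?: { delta?: { content?: string | null } }[] };
  const text = chunk.choices?.[0]?.delta?.content;
  if (!text) return []; // role-only or empty deltas carry no text
  return [sseEvent("content_block_delta", { index: 0, delta: { type: "text_delta", text } })];
}
```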

One bug I remember: chunks arriving out of order because of network latency. The fix: buffer the chunks and reorder them by sequence index. Twenty lines of code. Two days of debugging.
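That reorder fix can be sketched as a small buffer. The post doesn't say where the sequence indices come from, so this assumes each chunk carries an integer index starting at 0; `ReorderBuffer` is a hypothetical name.

```typescript
// Emits items strictly in sequence order, holding back any that arrive early.
// Assumes every chunk carries an integer sequence index starting at 0.
class ReorderBuffer<T> {
  private nextSeq = 0;
  private readonly pending = new Map<number, T>();
  private readonly emit: (item: T) => void;

  constructor(emit: (item: T) => void) {
    this.emit = emit;
  }

  push(seq: number, item: T): void {
    if (seq < this.nextSeq || this.pending.has(seq)) return; // duplicate: drop it
    this.pending.set(seq, item);
    // Flush every consecutive chunk starting at the next expected index.
    while (this.pending.has(this.nextSeq)) {
      const ready = this.pending.get(this.nextSeq) as T;
      this.pending.delete(this.nextSeq);
      this.nextSeq += 1;
      this.emit(ready);
    }
  }

  // Number of chunks waiting for a gap to fill.
  get waiting(): number {
    return this.pending.size;
  }
}
```

A production version would also cap how long it waits for a missing index, so one lost chunk can't stall the whole stream.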

Challenge 3: Thinking Blocks Crash Claude Code

Some models (like Gemini 3) output "thinking" blocks—internal reasoning before the final response. They're wrapped in special tags.

Send those raw to Claude Code? It crashes. Hard.

The fix: wrap thinking blocks in XML tags that Claude Code recognizes as annotations. Now they display as "thinking" instead of causing a parser error.

Three lines of code. Two days of debugging to find the right format.
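Here's a sketch of just the separation step. It assumes the model wraps reasoning in `<think>...</think>` (a convention some open models use) and takes the output wrapper as a parameter, since the exact annotation format Claude Code accepted isn't spelled out here. It works on complete text; a streaming version would also have to handle tags split across chunks.

```typescript
// One piece of model output: either reasoning or user-facing text.
interface Segment { kind: "thinking" | "text"; text: string }

// Split raw output on an assumed <think>...</think> convention.
function splitThinking(raw: string, open = "<think>", close = "</think>"): Segment[] {
  const segments: Segment[] = [];
  let pos = 0;
  while (pos < raw.length) {
    const start = raw.indexOf(open, pos);
    if (start === -1) {
      segments.push({ kind: "text", text: raw.slice(pos) });
      break;
    }
    if (start > pos) segments.push({ kind: "text", text: raw.slice(pos, start) });
    const end = raw.indexOf(close, start + open.length);
    // Unclosed tag: treat the rest as reasoning instead of leaking raw tags.
    const stop = end === -1 ? raw.length : end;
    segments.push({ kind: "thinking", text: raw.slice(start + open.length, stop) });
    pos = end === -1 ? raw.length : end + close.length;
  }
  return segments.filter((s) => s.text.length > 0);
}

// Re-emit the output with each reasoning span passed through `wrap`, a
// placeholder for whatever format the client accepts.
function rewrapThinking(raw: string, wrap: (thought: string) => string): string {
  return splitThinking(raw)
    .map((s) => (s.kind === "thinking" ? wrap(s.text) : s.text))
    .join("");
}
```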

Challenge 4: Token Scaling for Variable Context Windows

Claude Code assumes a fixed context window. But models have different limits—128K, 200K, 1M, even 2M tokens.

If we don't tell Claude Code about the actual context window, it might send prompts that are too large. The provider rejects or truncates them. Bad things happen.

We built a token scaling system that maps Claude's context assumptions to the actual model's limits. If you're using a 2M token model, we tell Claude Code it has more room to work with.

This matters for big refactors. Large codebases. Complex multi-file operations.
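One way to do that mapping: rescale the token counts reported back to Claude Code so it sees the same fraction of its window that the real model has used. A minimal sketch, assuming Claude Code budgets against a 200K window (`ASSUMED_CLAUDE_WINDOW` is our assumption, not a documented constant) and that `scaleUsage` is applied to the usage numbers in each translated response.

```typescript
// Assumption for this sketch: Claude Code budgets against a 200K-token window.
const ASSUMED_CLAUDE_WINDOW = 200_000;

// Rescale a real token count into "Claude-sized" tokens so the fraction of the
// window in use is the same on both sides.
function scaleUsage(actualTokens: number, modelWindow: number): number {
  if (modelWindow <= 0) throw new Error("modelWindow must be positive");
  return Math.round((actualTokens * ASSUMED_CLAUDE_WINDOW) / modelWindow);
}

// Fraction of the real model's window still free, e.g. for a status bar.
function contextRemaining(actualTokens: number, modelWindow: number): number {
  return Math.max(0, 1 - actualTokens / modelWindow);
}
```

With a 2M-token model, 1M real tokens are reported as 100K: half of the assumed window, matching the half of the real window actually in use. On a 128K model, 64K real tokens also report as 100K, so Claude Code sees the smaller model as half full too.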

Challenge 5: Dynamic Reasoning Mode Detection

Some models (Grok, o1, o3, Gemini 3) have "reasoning modes"—extended thinking before answering. They're triggered differently. They behave differently.

We had to detect when a model supports reasoning, when it's enabled, and how to format requests to trigger it. Then translate the extended thinking output back to Claude Code without breaking anything.

Grok's reasoning mode alone took a week to get right.
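A sketch of the detect-and-map step. Anthropic's Messages API requests extended thinking with `thinking: { type: "enabled", budget_tokens }`; OpenRouter exposes a unified `reasoning` object, and OpenAI's reasoning models take a `reasoning_effort` level. The regex capability list, the budget cutoffs, the model ids in the usage lines, and the function names are all assumptions for illustration; a real table would come from provider metadata.

```typescript
// Illustrative capability list keyed by model-id pattern (an assumption).
const REASONING_PATTERNS: RegExp[] = [/(^|\/)o[13]\b/, /grok/i, /gemini-3/i];

interface ThinkingConfig { type: "enabled"; budget_tokens: number }
type Provider = "openai" | "openrouter";

function supportsReasoning(modelId: string): boolean {
  return REASONING_PATTERNS.some((re) => re.test(modelId));
}

// Translate Anthropic's `thinking` request field into provider-specific params.
function reasoningParams(
  provider: Provider,
  modelId: string,
  thinking?: ThinkingConfig,
): Record<string, unknown> {
  // No thinking requested, or the model can't reason: send a plain request.
  if (!thinking || !supportsReasoning(modelId)) return {};
  if (provider === "openrouter") {
    // OpenRouter's unified `reasoning` object accepts a token budget directly.
    return { reasoning: { max_tokens: thinking.budget_tokens } };
  }
  // OpenAI reasoning models take a coarse level; these cutoffs are arbitrary.
  const b = thinking.budget_tokens;
  const effort = b >= 16_000 ? "high" : b >= 4_000 ? "medium" : "low";
  return { reasoning_effort: effort };
}
```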

Cost Optimization: Real Numbers

Here's why this matters for your budget:

Model	Input (per 1M tokens)	Output (per 1M tokens)	Input price vs Claude 4 Sonnet
Claude 4 Sonnet	$3.00	$15.00	baseline
DeepSeek V3	$0.40	$0.60	87% cheaper
MiniMax M2.1	$0.40	$0.80	87% cheaper
Kimi K2 (via OpenRouter)	$0.30	$1.50	90% cheaper
Grok 4.1 Fast	$1.00	$2.00	67% cheaper

These aren't minor differences. A team spending $10K/month on Claude 4 could spend $1K/month with Kimi K2. For many use cases, the cheaper models are just as good.
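Worth spelling out: the percentage column compares input prices only, and what you actually save depends on your input/output mix. A small calculator using the table's prices (the workload numbers are made up for illustration):

```typescript
// Prices in USD per 1M tokens, copied from the table above.
interface Price { input: number; output: number }
const PRICES: Record<string, Price> = {
  "Claude 4 Sonnet": { input: 3.0, output: 15.0 },
  "DeepSeek V3": { input: 0.4, output: 0.6 },
  "MiniMax M2.1": { input: 0.4, output: 0.8 },
  "Kimi K2": { input: 0.3, output: 1.5 },
  "Grok 4.1 Fast": { input: 1.0, output: 2.0 },
};

// Cost for a workload given in millions of input and output tokens.
function monthlyCost(p: Price, inputM: number, outputM: number): number {
  return p.input * inputM + p.output * outputM;
}

// Fraction saved versus Claude 4 Sonnet for the same workload.
function savingsVsClaude(model: string, inputM: number, outputM: number): number {
  const p = PRICES[model];
  if (!p) throw new Error(`unknown model: ${model}`);
  const base = monthlyCost(PRICES["Claude 4 Sonnet"], inputM, outputM);
  return 1 - monthlyCost(p, inputM, outputM) / base;
}
```

For 500M input and 100M output tokens a month, Claude 4 Sonnet comes to $3,000 and Kimi K2 to $300, a 90% saving. Grok 4.1 Fast lands at $700, about 77% cheaper rather than the 67% its input price suggests, because its output tokens are so much cheaper than Claude's.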

The strategy we recommend:

Claude 4 Sonnet: Complex reasoning, sensitive code, production decisions
DeepSeek V3: General coding, refactors, documentation (87% cheaper)
Kimi K2: Bulk operations, large codebases, experimentation (90% cheaper)
Local models (Ollama): Offline work, privacy-sensitive code, zero API costs

Claudish lets you switch between them instantly. Same workflow. Different model. Better economics.

Claudish running Gemini 3 Flash for free — status bar showing FREE label and 98% context remaining

Gemini Flash through Claude Code. Free. The status bar shows model, cost, and context usage.

Features We Built (Because We Needed Them)

Claudish evolved from a hack into a proper tool. Here's what we added along the way:

Zero-Config Interactive Setup

First run? Claudish asks for your API key and preferred model and saves everything to a profile. Next time? Just run `claudish` and you're done.

No config files to write by hand. No environment variables. No digging through docs.

Fuzzy Search Across 300+ Models

Want to use "that DeepSeek model" but forget the exact name? Type `deep` and get a ranked list. We use fuse.js for fuzzy matching, so minor typos don't matter.

OpenRouter alone has 300+ models. We make it actually usable.
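Claudish uses fuse.js for this. To keep the example dependency-free, here's a swapped-in technique instead: approximate substring matching (Sellers' algorithm), which scores each model id by the fewest edits needed to match the query somewhere inside it. It isn't how fuse.js scores results; the 0.34 error-rate cutoff and the model ids in the test are illustrative.

```typescript
// Fewest edits turning `pattern` into some substring of `text` (Sellers' algorithm).
// Row 0 is all zeros, so a match may start anywhere in the text.
function approxDistance(pattern: string, text: string): number {
  const p = pattern.toLowerCase();
  const t = text.toLowerCase();
  let prev = new Array<number>(t.length + 1).fill(0);
  for (let i = 1; i <= p.length; i++) {
    const curr = new Array<number>(t.length + 1);
    curr[0] = i; // matching i pattern chars against an empty substring costs i
    for (let j = 1; j <= t.length; j++) {
      const cost = p[i - 1] === t[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return Math.min(...prev); // the match may also end anywhere
}

// Rank model ids by error rate (edits per query character); shorter ids win ties.
function fuzzySearch(query: string, models: string[], maxErrorRate = 0.34): string[] {
  return models
    .map((id) => ({ id, score: approxDistance(query, id) / Math.max(query.length, 1) }))
    .filter((r) => r.score <= maxErrorRate)
    .sort((a, b) => a.score - b.score || a.id.length - b.id.length)
    .map((r) => r.id);
}
```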

Real-Time Cost Tracking

OpenRouter provides per-model pricing. Claudish displays the estimated cost after every run, so you know what each session cost.

Turns out, seeing "$0.03" instead of "$0.30" makes you feel good about your choice.
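A minimal cost tracker, assuming OpenRouter-style pricing metadata (prompt and completion prices as decimal strings in USD per token) and usage already translated into Anthropic's `input_tokens`/`output_tokens` fields. `CostTracker` is a hypothetical name, not Claudish's class.

```typescript
// USD per token, as decimal strings (OpenRouter-style pricing metadata).
interface TokenPricing { prompt: string; completion: string }
interface Usage { input_tokens: number; output_tokens: number }

class CostTracker {
  private totalUsd = 0;
  private readonly pricing: TokenPricing;

  constructor(pricing: TokenPricing) {
    this.pricing = pricing;
  }

  // Record one response's usage; returns that response's estimated cost in USD.
  record(usage: Usage): number {
    const cost =
      usage.input_tokens * Number(this.pricing.prompt) +
      usage.output_tokens * Number(this.pricing.completion);
    this.totalUsd += cost;
    return cost;
  }

  // Running session total, formatted like a status-bar label.
  label(): string {
    return `$${this.totalUsd.toFixed(3)}`;
  }

  get total(): number {
    return this.totalUsd;
  }
}
```

With Claude 4 Sonnet's $3/$15 per 1M pricing (`"0.000003"` and `"0.000015"` per token), a request with 10,000 input and 2,000 output tokens records about $0.06.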

Claudish running with Google Gemini 3 Pro via OpenRouter, showing live cost tracking at $0.040

Running Gemini 3 Pro through Claude Code. $0.040 for this session. The status bar tells you everything.

Profile Management

Multiple API keys? Different models for different projects? Save them as profiles: `claudish --profile work` vs `claudish --profile personal`.

Simple. But essential when you're juggling clients.

Monitor Mode

Want to see what Claude Code is actually sending? Enable monitor mode. Claudish logs every request and response in real time.

Great for debugging. Great for understanding the protocol. Great for learning how these tools work under the hood.
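Here's a sketch of what a monitor wrapper can look like: it passes traffic through unchanged and writes one JSON line per request and per response, with API-key headers redacted so the logs are safe to share. The `ProxyRequest`/`ProxyResponse` shapes and `withMonitor` are hypothetical; Claudish's actual log format isn't shown here.

```typescript
interface ProxyRequest { path: string; headers: Record<string, string>; body: unknown }
interface ProxyResponse { status: number; body: unknown }
type Handler = (req: ProxyRequest) => Promise<ProxyResponse>;

// Header names (lowercase) whose values must never reach the log.
const SECRET_HEADERS = new Set(["authorization", "x-api-key"]);

function redact(headers: Record<string, string>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [k, v] of Object.entries(headers)) {
    out[k] = SECRET_HEADERS.has(k.toLowerCase()) ? "[redacted]" : v;
  }
  return out;
}

// Wrap any handler: log the request, await the real handler, log the response.
function withMonitor(handler: Handler, log: (line: string) => void): Handler {
  return async (req) => {
    const started = Date.now();
    log(JSON.stringify({ dir: "request", path: req.path, headers: redact(req.headers), body: req.body }));
    const res = await handler(req);
    log(JSON.stringify({ dir: "response", status: res.status, ms: Date.now() - started, body: res.body }));
    return res;
  };
}
```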

JSON Output Mode

Building tools on top of Claudish? Use `--json` for machine-readable output. Status, model info, costs: all structured.

We use this for internal automation. You might too.

Parallel Run Capability

Need to run two Claude Code sessions with different models? Each Claudish instance is isolated. Different ports. Different models. No conflicts.

Run a reasoning task with o1 on port 8000 while a quick refactor with DeepSeek runs on port 8001.

Multi-model code review consensus table: Claude, Grok, Gemini, and MiniMax independently reviewing the same code and reaching consensus

Four models. One code review. Consensus-based findings catch what any single model misses.
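One way to merge independent reviews, sketched under an assumption: each finding has already been normalized to a key like `file:line:category` (matching free-text findings across models is the hard part and isn't shown). The sketch separates findings enough models agree on from those only one model raised, which are exactly the catches the other models missed. Names are hypothetical.

```typescript
// Model name -> normalized finding keys, e.g. "auth.ts:42:null-check".
type FindingsByModel = Record<string, string[]>;

// Record which models flagged each finding (one vote per model per finding).
function tallyVotes(byModel: FindingsByModel): Map<string, string[]> {
  const votes = new Map<string, string[]>();
  for (const [model, keys] of Object.entries(byModel)) {
    for (const key of Array.from(new Set(keys))) {
      const voters = votes.get(key) ?? [];
      voters.push(model);
      votes.set(key, voters);
    }
  }
  return votes;
}

// Findings with at least `quorum` votes are "agreed"; single-vote findings are
// "solo catches". Anything in between is left out of both lists in this sketch.
function mergeReviews(byModel: FindingsByModel, quorum: number) {
  const agreed: string[] = [];
  const soloCatches: string[] = [];
  tallyVotes(byModel).forEach((voters, key) => {
    if (voters.length >= quorum) agreed.push(key);
    else if (voters.length === 1) soloCatches.push(key);
  });
  return { agreed: agreed.sort(), soloCatches: soloCatches.sort() };
}
```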

6 parallel agents running SEO research across multiple models — 3 SERP analysts and 3 keyword researchers all powered by Claudish

6 agents. 6 models. One task. Parallel multi-model research in action.

The Community: Open Source, MIT Licensed

We could have kept this internal. But why? Vendor lock-in sucks for everyone.

Claudish is fully open source under the MIT license. Do whatever you want with it. Fork it. Modify it. Distribute it. Sell it (if you can figure out a business model—we haven't).

Distribution options:

npm: `npm install -g claudish`
Homebrew: `brew install madappgang/tap/claudish`
Shell script: One-line installer from GitHub

We're actively developing it. Bug fixes. New provider integrations. Features based on what we need and what the community requests.

Check it out: github.com/madappgang/claudish

When This Matters to You

You need Claudish if:

You're watching your AI spend: 80-90% cost savings isn't theoretical. It's real.
You want to experiment: Try new models as they launch. No waiting for official support.
You're risk-averse: Diversify across providers. No single point of failure.
You need local models: Ollama integration. No remote API calls. Full privacy.
You're opinionated about models: Maybe you prefer GPT-5 for reasoning. Gemini 3 for multimodal. Now you can use them.

Talk to Us

We built Claudish because we needed it. We're using it daily. We're improving it constantly.

If you hit issues. If you have ideas. If you want to contribute. The door's open.

What AI model are you dying to use with Claude Code?

Product: Claudish (started as a MadAppGang internal tool, now open source)
Duration: 6 months (ongoing development)
Stack: TypeScript, Bun, Hono, Anthropic API, OpenRouter, Google Gemini, OpenAI, Ollama
License: MIT (fully open source)
Outcome: 80-90% cost savings, 100+ models supported, active community