Claudish: We Built It Because We Wanted Choice
An Open Source Tool That Unlocks Claude Code for 100+ AI Models
TypeScript
Bun
Hono
Anthropic API
OpenRouter
Google Gemini
OpenAI
Ollama
The Locked Door
I love Claude Code. Best AI coding assistant I've used. Period.
But it only works with Anthropic's models. You want GPT-5? Gemini 3? DeepSeek? Local models? Too bad. You're locked in.
That's not just annoying. It's expensive. And it limits experimentation.
We built Claudish because we wanted out. Not out of Claude Code—out of the lock-in.
What started as an internal tool became something we use daily. Something we open-sourced. Something that's saving teams 80-90% on AI costs.
This is the story of what we built and why it matters.
The Numbers
| Metric | Result |
|---|---|
| Models supported | 100+ via OpenRouter + direct integrations |
| Model search | 300+ models with fuzzy matching |
| Provider integrations | 7+ (OpenRouter, Gemini, OpenAI, Ollama, MiniMax, Kimi, GLM) |
| Context windows | 128K to 2M+ tokens supported |
| Cost savings | 80-90% vs Claude 4 (Kimi K2 via OpenRouter) |
| Protocol compliance | 13/13 snapshot tests passing |
| Distribution | npm, Homebrew, shell installer |
| License | MIT (fully open source) |
Claudish is a CLI tool we built that translates Claude Code's Anthropic API calls into any other AI model's format. Run it once, it starts a local proxy server, and suddenly Claude Code works with 100+ models from OpenRouter, Google, OpenAI, Ollama—anything with an API. Open source. MIT licensed. Active development. We use it daily. You can too.
The Problem: Vendor Lock-In Sucks
Claude Code is brilliant. Anthropic built something special—the agents, the tool calling, the file management. It's how AI coding should work.
But it's handcuffed to their models. Only Claude.
Here's why that's a problem:
- Cost: Claude 4 is expensive. DeepSeek, MiniMax, Kimi K2? 10-40% of the price.
- Experimentation: Want to try GPT-5 for reasoning tasks? Gemini 3 for multimodal? You can't.
- Vendor risk: What if Anthropic changes pricing? Goes down? Deprecates the API you depend on?
- Local models: Sometimes you need offline. Or privacy. Or just want to run Ollama locally.
We wanted choice. Not instead of Claude Code—with Claude Code.
The Architecture: A Translation Layer
The idea: Claude Code talks to Anthropic's API. What if we intercept that traffic, translate it to another provider's format, then translate the response back?
The reality: Easier said than done.
Here's what Claudish actually does:
- Starts a local proxy server on your machine (Hono on Bun)
- Intercepts Claude Code's API requests (formatted for Anthropic)
- Translates the request to your target provider's format
- Forwards to the provider (OpenRouter, Google, OpenAI, Ollama)
- Translates the response back to Anthropic's format
- Returns to Claude Code like nothing happened
Claude Code never knows it's not talking to Anthropic. The provider never knows they're being called by Claude Code. Everyone's happy.
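A minimal sketch of the request half of that translation, assuming a text-only conversation and an OpenAI-style chat endpoint on the other side. The type names and the `toOpenAIRequest` function are illustrative and not taken from the Claudish source; the real translation also has to handle tools, images, and streaming.

```typescript
// Simplified subsets of the two request shapes (text only, no tools or images).
type AnthropicTextBlock = { type: "text"; text: string };
type AnthropicMessage = {
  role: "user" | "assistant";
  content: string | AnthropicTextBlock[];
};
type AnthropicRequest = {
  model: string;
  max_tokens: number;
  system?: string;
  messages: AnthropicMessage[];
  stream?: boolean;
};

type OpenAIChatMessage = {
  role: "system" | "user" | "assistant";
  content: string;
};
type OpenAIChatRequest = {
  model: string;
  max_tokens: number;
  messages: OpenAIChatMessage[];
  stream: boolean;
};

// Flatten Anthropic content (a string or a list of text blocks) into one string.
function flattenContent(content: string | AnthropicTextBlock[]): string {
  return typeof content === "string"
    ? content
    : content.map((block) => block.text).join("");
}

// Translate an Anthropic-style request into an OpenAI-style chat request.
// `targetModel` is the model the user picked; it replaces the Claude model ID.
function toOpenAIRequest(
  req: AnthropicRequest,
  targetModel: string,
): OpenAIChatRequest {
  const messages: OpenAIChatMessage[] = [];
  // Anthropic keeps the system prompt at the top level;
  // OpenAI-style APIs expect it as the first message.
  if (req.system) messages.push({ role: "system", content: req.system });
  for (const m of req.messages) {
    messages.push({ role: m.role, content: flattenContent(m.content) });
  }
  return {
    model: targetModel,
    max_tokens: req.max_tokens,
    messages,
    stream: req.stream ?? false,
  };
}
```

The response path runs the same idea in reverse: take the provider's reply and rebuild it in the shape Claude Code expects.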
Technical Challenges: The Messy Middle
A simple proxy sounds straightforward. It wasn't. Here's what made this hard:
Challenge 1: Tool Calling Translation
Anthropic uses tool_use blocks. OpenAI uses tool_calls. They look different. They behave differently. They're not compatible.
We had to build bidirectional translation. When Claude Code sends tool_use, we convert it to tool_calls for OpenAI. When the provider responds, we convert it back.
Sounds simple. Try debugging streaming responses where chunks arrive out of order. Where tool IDs need to be preserved across the entire conversation. Where nested tools can appear anywhere in the response.
We got it working. 13/13 snapshot tests passing. Full Messages API compliance.
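To make the shape mismatch concrete, here is a hedged sketch of the non-streaming case. It converts a single Anthropic tool_use block to an OpenAI tool_calls entry and back, preserving the tool ID. The function names are ours for illustration, and the fallback to an empty input on malformed JSON is a choice made for this sketch, not a description of Claudish's behavior.

```typescript
// Anthropic represents a tool invocation as a content block with parsed input.
type AnthropicToolUse = {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
};

// OpenAI represents it as a tool call whose arguments are a JSON string.
type OpenAIToolCall = {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
};

// Anthropic -> OpenAI: serialize the input object and keep the same ID.
function toOpenAIToolCall(block: AnthropicToolUse): OpenAIToolCall {
  return {
    id: block.id,
    type: "function",
    function: { name: block.name, arguments: JSON.stringify(block.input) },
  };
}

// OpenAI -> Anthropic: parse the arguments string back into an object.
function toAnthropicToolUse(call: OpenAIToolCall): AnthropicToolUse {
  let input: Record<string, unknown> = {};
  try {
    const parsed: unknown = JSON.parse(call.function.arguments || "{}");
    if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
      input = parsed as Record<string, unknown>;
    }
  } catch {
    // Malformed JSON: fall back to an empty input (sketch-only behavior).
  }
  return { type: "tool_use", id: call.id, name: call.function.name, input };
}
```

The streaming case is harder because the arguments string arrives in fragments and only parses once the last fragment is in.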
Challenge 2: SSE Streaming Across Providers
Server-Sent Events (SSE) are how streaming responses work. Every provider does them differently.
Anthropic's format: data: {"type": "content_block_delta", ...}
OpenAI's format: data: {"choices": [{"delta": {"content": "..."}}]}
Gemini's format: Totally different.
We had to parse each provider's streaming format, extract the actual content, reassemble it in the right order, and re-emit it in Anthropic's format. All in real time. Without breaking the stream.
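As a rough illustration of the per-chunk work, the sketch below turns one OpenAI-style SSE line into an Anthropic-style content_block_delta event. It covers text deltas only. A complete Anthropic stream also carries events such as message_start, content_block_start, content_block_stop, and message_stop, which are left out here. `translateSseLine` is a hypothetical name, not a Claudish function.

```typescript
type OpenAIChunk = {
  choices?: { delta?: { content?: string | null } }[];
};

// Convert one OpenAI-style SSE line into an Anthropic-style SSE event string.
// Returns null for lines that carry no text: comments, blank lines,
// role-only deltas, malformed payloads, and the final "[DONE]" marker.
function translateSseLine(line: string, blockIndex = 0): string | null {
  if (!line.startsWith("data:")) return null;
  const payload = line.slice("data:".length).trim();
  if (payload === "" || payload === "[DONE]") return null;

  let chunk: OpenAIChunk | null = null;
  try {
    chunk = JSON.parse(payload) as OpenAIChunk | null;
  } catch {
    return null; // Skip malformed payloads rather than break the stream.
  }

  const text = chunk?.choices?.[0]?.delta?.content;
  if (!text) return null;

  const event = {
    type: "content_block_delta",
    index: blockIndex,
    delta: { type: "text_delta", text },
  };
  return `event: content_block_delta\ndata: ${JSON.stringify(event)}\n\n`;
}
```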
One bug I remember: chunks arriving out of order because of network latency. The fix: buffer and reorder based on sequence indices. Added 20 lines of code, took two days to debug.
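The buffer-and-reorder idea can be sketched generically. This assumes each chunk carries a zero-based sequence index; how Claudish actually derives that index is not covered here, and the `ReorderBuffer` class is illustrative only.

```typescript
// Holds items that arrive early and releases them in sequence order.
class ReorderBuffer<T> {
  private pending = new Map<number, T>();
  private next = 0; // The sequence index we are waiting for.

  // Accept an item tagged with its sequence index and return every item
  // that is now ready to emit, in order. Stale or duplicate indices are dropped.
  push(seq: number, item: T): T[] {
    if (seq < this.next || this.pending.has(seq)) return [];
    this.pending.set(seq, item);

    const ready: T[] = [];
    while (this.pending.has(this.next)) {
      ready.push(this.pending.get(this.next) as T);
      this.pending.delete(this.next);
      this.next += 1;
    }
    return ready;
  }
}
```

Pushing index 1 before index 0 returns nothing. Pushing index 0 afterwards releases both items in the right order.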
Challenge 3: Thinking Blocks Crash Claude Code
Some models (like Gemini 3) output "thinking" blocks—internal reasoning before the final response. They're wrapped in special tags.
Send those raw to Claude Code? It crashes. Hard.
The fix: wrap thinking blocks in XML tags that Claude Code recognizes as annotations. Now they display as "thinking" instead of causing a parser error.
Three lines of code. Two days of debugging to find the right format.
Challenge 4: Token Scaling for Variable Context Windows
Claude Code assumes a fixed context window. But models have different limits—128K, 200K, 1M, even 2M tokens.
If we don't tell Claude Code about the actual context window, it might send prompts that are too large. The provider truncates. Bad things happen.
We built a token scaling system that maps Claude Code's context assumptions to the actual model's limits. If you're using a 2M token model, we tell Claude Code it has more room to work with.
This matters for big refactors. Large codebases. Complex multi-file operations.
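One way such a mapping could work is to rescale the token usage reported back to Claude Code, so that the same fraction of the window appears used on both sides. The sketch below assumes that approach. The 200,000-token assumed window is an illustrative value we picked, not a figure from the Claudish source, and `scaleReportedTokens` is a hypothetical name.

```typescript
// Window size Claude Code is assumed to plan against.
// 200_000 is an illustrative value for this sketch only.
const ASSUMED_WINDOW = 200_000;

// Rescale real token usage so the used fraction of the assumed window
// matches the used fraction of the model's actual window.
function scaleReportedTokens(
  actualTokens: number,
  modelWindow: number,
  assumedWindow: number = ASSUMED_WINDOW,
): number {
  if (modelWindow <= 0) throw new RangeError("modelWindow must be positive");
  return Math.round((actualTokens * assumedWindow) / modelWindow);
}
```

Under these assumptions, 500,000 real tokens on a 1M-token model are reported as 100,000, which is half the window either way.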
Challenge 5: Dynamic Reasoning Mode Detection
Some models (Grok, o1, o3, Gemini 3) have "reasoning modes"—extended thinking before answering. They're triggered differently. They behave differently.
We had to detect when a model supports reasoning, when it's enabled, and how to format requests to trigger it. Then translate the extended thinking output back to Claude Code without breaking anything.
Grok's reasoning mode alone took a week to get right.
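The detection step alone might look something like the sketch below: match the model ID against known families and let a provider-specific adapter decide how to trigger reasoning. The patterns and the example IDs in the usage note are illustrative guesses, not Claudish's actual rules, and the request formatting for each family is deliberately left out.

```typescript
type ReasoningFamily = "grok" | "openai-o-series" | "gemini-3";

// Illustrative patterns for the families named above. Real model IDs vary
// by provider, so treat these as assumptions rather than a complete list.
const REASONING_PATTERNS: [RegExp, ReasoningFamily][] = [
  [/(^|\/)grok-/i, "grok"],
  [/(^|\/)o[13](-|$)/i, "openai-o-series"],
  [/(^|\/)gemini-3/i, "gemini-3"],
];

// Return the reasoning family for a model ID, or null if none matches.
function detectReasoningFamily(modelId: string): ReasoningFamily | null {
  for (const [pattern, family] of REASONING_PATTERNS) {
    if (pattern.test(modelId)) return family;
  }
  return null;
}
```

With these patterns, an ID like openai/o3-mini maps to the o-series family, while an ID with no match returns null and is treated as a plain chat model.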
Cost Optimization: Real Numbers
Here's why this matters for your budget:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | vs Claude 4 (input price) |
|---|---|---|---|
| Claude 4 Sonnet | $3.00 | $15.00 | baseline |
| DeepSeek V3 | $0.40 | $0.60 | 87% cheaper |
| MiniMax M2.1 | $0.40 | $0.80 | 87% cheaper |
| Kimi K2 (via OpenRouter) | $0.30 | $1.50 | 90% cheaper |
| Grok 4.1 Fast | $1.00 | $2.00 | 67% cheaper |
These aren't minor differences. A team spending $10K/month on Claude 4 could spend $1K/month with Kimi K2. For many use cases, the cheaper models are just as good.
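The arithmetic is easy to check yourself. This sketch uses the prices from the table above; the token volumes in the usage note are made-up workload numbers, and real savings depend on your input/output mix.

```typescript
type Pricing = { inputPerM: number; outputPerM: number }; // USD per 1M tokens

// Prices copied from the table above.
const PRICING: Record<string, Pricing> = {
  "Claude 4 Sonnet": { inputPerM: 3.0, outputPerM: 15.0 },
  "DeepSeek V3": { inputPerM: 0.4, outputPerM: 0.6 },
  "MiniMax M2.1": { inputPerM: 0.4, outputPerM: 0.8 },
  "Kimi K2": { inputPerM: 0.3, outputPerM: 1.5 },
  "Grok 4.1 Fast": { inputPerM: 1.0, outputPerM: 2.0 },
};

// Cost in USD for a given number of input and output tokens.
function costUsd(model: string, inputTokens: number, outputTokens: number): number {
  const p = PRICING[model];
  if (!p) throw new Error(`No pricing for ${model}`);
  return (inputTokens * p.inputPerM + outputTokens * p.outputPerM) / 1_000_000;
}

// Percentage saved by using `model` instead of `baseline` for the same workload.
function savingsPercent(
  model: string,
  baseline: string,
  inputTokens: number,
  outputTokens: number,
): number {
  const modelCost = costUsd(model, inputTokens, outputTokens);
  const baselineCost = costUsd(baseline, inputTokens, outputTokens);
  return (1 - modelCost / baselineCost) * 100;
}
```

For a hypothetical workload of 10M input and 1M output tokens, Claude 4 Sonnet comes to $45.00 and Kimi K2 to $4.50, a 90% saving.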
The strategy we recommend:
- Claude 4 Sonnet: Complex reasoning, sensitive code, production decisions
- DeepSeek V3: General coding, refactors, documentation (87% cheaper)
- Kimi K2: Bulk operations, large codebases, experimentation (90% cheaper)
- Local models (Ollama): Offline work, privacy-sensitive code, zero API costs
Claudish lets you switch between them instantly. Same workflow. Different model. Better economics.
Features We Built (Because We Needed Them)
Claudish evolved from a hack into a proper tool. Here's what we added along the way:
Zero-Config Interactive Setup
First run? Claudish asks for your API key and preferred model. Saves everything to a profile. Next time? Just run claudish and you're done.
No config files. No environment variables. No digging through docs.
Fuzzy Search Across 300+ Models
Want to use "that DeepSeek model" but forget the exact name? Type deep and get a ranked list. We're using fuse.js for fuzzy matching, so typos don't matter.
OpenRouter alone has 300+ models. We make it actually usable.
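To show the ranking idea without pulling in a dependency, here is a simplified stand-in. It is not fuse.js and does not use its algorithm: it only does in-order subsequence matching, so unlike fuse.js it will not forgive a mistyped character. The model IDs in the usage note are examples only.

```typescript
// Score how well `query` matches `candidate` as an in-order subsequence.
// Returns null when some query character is missing. Lower scores are better.
function subsequenceScore(query: string, candidate: string): number | null {
  const q = query.toLowerCase();
  const c = candidate.toLowerCase();
  if (q.length === 0) return 0;

  let qi = 0;
  let first = -1;
  let last = -1;
  for (let ci = 0; ci < c.length && qi < q.length; ci++) {
    if (c[ci] === q[qi]) {
      if (first === -1) first = ci;
      last = ci;
      qi++;
    }
  }
  if (qi < q.length) return null;

  // Gaps inside the matched span dominate; an earlier start breaks ties.
  return last - first + 1 - q.length + first * 0.01;
}

// Return the models that match `query`, best match first.
function searchModels(query: string, models: string[]): string[] {
  return models
    .map((m) => ({ m, score: subsequenceScore(query, m) }))
    .filter((r): r is { m: string; score: number } => r.score !== null)
    .sort((a, b) => a.score - b.score)
    .map((r) => r.m);
}
```

Searching for deep across deepseek/deepseek-chat, openai/gpt-4o, and google/gemini-pro returns only the DeepSeek entry, because the other two IDs contain no "d".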
Real-Time Cost Tracking
OpenRouter provides per-model pricing. Claudish displays estimated costs after every run. You know exactly what you spent.
Turns out, seeing "$0.03" instead of "$0.30" makes you feel good about your choice.
Profile Management
Multiple API keys? Different models for different projects? Save them as profiles. claudish --profile work vs claudish --profile personal.
Simple. But essential when you're juggling clients.
Monitor Mode
Want to see what Claude Code is actually sending? Enable monitor mode. Claudish logs every request and response in real time.
Great for debugging. Great for understanding the protocol. Great for learning how these tools work under the hood.
JSON Output Mode
Building tools on top of Claudish? Use --json for machine-readable output. Status, model info, costs—all structured.
We use this for internal automation. You might too.
Parallel Run Capability
Need to run two Claude Code sessions with different models? Each Claudish instance is isolated. Different ports. Different models. No conflicts.
Run a reasoning task with o1 on port 8000 while a quick refactor with DeepSeek runs on port 8001.
The Community: Open Source, MIT Licensed
We could have kept this internal. But why? Vendor lock-in sucks for everyone.
Claudish is fully open source under the MIT license. Do whatever you want with it. Fork it. Modify it. Distribute it. Sell it (if you can figure out a business model—we haven't).
Distribution options:
- npm: npm install -g claudish
- Homebrew: brew install madappgang/tap/claudish
- Shell script: One-line installer from GitHub
We're actively developing it. Bug fixes. New provider integrations. Features based on what we need and what the community requests.
Check it out: github.com/madappgang/claudish
When This Matters to You
You need Claudish if:
- You're watching your AI spend: 80-90% cost savings isn't theoretical. It's real.
- You want to experiment: Try new models as they launch. No waiting for official support.
- You're risk-averse: Diversify across providers. No single point of failure.
- You need local models: Ollama integration. No API calls. Full privacy.
- You're opinionated about models: Maybe you prefer GPT-5 for reasoning. Gemini 3 for multimodal. Now you can use them.
Talk to Us
We built Claudish because we needed it. We're using it daily. We're improving it constantly.
If you hit issues. If you have ideas. If you want to contribute. The door's open.
What AI model are you dying to use with Claude Code?
Product: Claudish (MadAppGang internal tool)
Duration: 6 months (ongoing development)
Stack: TypeScript, Bun, Hono, Anthropic API, OpenRouter, Google Gemini, OpenAI, Ollama
License: MIT (fully open source)
Outcome: 80-90% cost savings, 100+ models supported, active community