get in touch
Claudish Icon

Claudish: We Built It Because We Wanted Choice

An Open Source Tool That Unlocks Claude Code for 100+ AI Models

  • TypeScript

  • Bun

  • Hono

  • Anthropic API

  • OpenRouter

  • Google Gemini

  • OpenAI

  • Ollama

The Locked Door

I love Claude Code. Best AI coding assistant I've used. Period.

But it only works with Anthropic's models. You want GPT-5? Gemini 3? DeepSeek? Local models? Too bad. You're locked in.

That's not just annoying. It's expensive. And it limits experimentation.

We built Claudish because we wanted out. Not out of Claude Code—out of the lock-in.

What started as a internal tool became something we use daily. Something we open-sourced. Something that's saving teams 80-90% on AI costs.

This is the story of what we built and why it matters.

Claudish --free command showing selection of free AI models: GPT-5 Nano, Grok Code, GLM-4.7, MiniMax M2.1, Gemini Flash
Run `claudish --free` and pick from free models. Zero cost. Full Claude Code experience.
Claudish running with free-tier models. Zero cost. Full Claude Code experience.

The Numbers

Metric Result
Models supported 100+ via OpenRouter + direct integrations
Model search 300+ models with fuzzy matching
Provider integrations 7+ (OpenRouter, Gemini, OpenAI, Ollama, MiniMax, Kimi, GLM)
Context windows 128K to 2M+ tokens supported
Cost savings 80-90% vs Claude 4 (Kimi K2 via OpenRouter)
Protocol compliance 13/13 snapshot tests passing
Distribution npm, Homebrew, shell installer
License MIT (fully open source)
TL;DR

Claudish is a CLI tool we built that translates Claude Code's Anthropic API calls into any other AI model's format. Run it once, it starts a local proxy server, and suddenly Claude Code works with 100+ models from OpenRouter, Google, OpenAI, Ollama—anything with an API. Open source. MIT licensed. Active development. We use it daily. You can too.

The Problem: Vendor Lock-In Sucks

Claude Code is brilliant. Anthropic built something special—the agents, the tool calling, the file management. It's how AI coding should work.

But it's handcuffed to their models. Only Claude.

Here's why that's a problem:

We wanted choice. Not instead of Claude Code—with Claude Code.

The Architecture: A Translation Layer

The idea: Claude Code talks to Anthropic's API. What if we intercept that traffic, translate it to another provider's format, then translate the response back?

The reality: Easier said than done.

Here's what Claudish actually does:

Claude Code never knows it's not talking to Anthropic. The provider never knows they're being called by Claude Code. Everyone's happy.

6 dev:architect agents running in parallel via Claudish — Claude, MiniMax, GLM, Gemini, GPT-5.2, Grok reviewing the same codebase simultaneously
6 models reviewing the same codebase in parallel. Each via Claudish.
Claudish architecture flowchart: 6-step priority router, 5 handler types, adapters layer, request queue with rate limiting, and response transformation pipeline
The full architecture: 6-step router → 5 handler types → adapters → queue → transform.

Technical Challenges: The Messy Middle

A simple proxy sounds straightforward. It wasn't. Here's what made this hard:

Challenge 1: Tool Calling Translation

Anthropic uses tool_use blocks. OpenAI uses tool_calls. They look different. They behave differently. They're not compatible.

We had to build bidirectional translation. When Claude Code sends tool_use, we convert it to tool_calls for OpenAI. When the provider responds, we convert it back.

Sounds simple. Try debugging streaming responses where chunks arrive out of order. Where tool IDs need to be preserved across the entire conversation. Where nested tools can appear anywhere in the response.

We got it working. 13/13 snapshot tests passing. Full Messages API compliance.

Challenge 2: SSE Streaming Across Providers

Server-Sent Events (SSE) are how streaming responses work. Every provider does them differently.

Anthropic's format: data: {"type": "content_block_delta", ...}

OpenAI's format: data: {"choices": [{"delta": {"content": "..."}}]}

Gemini's format: Totally different.

We had to parse each provider's streaming format, extract the actual content, reassemble it in the right order, and re-emit it in Anthropic's format. All in real time. Without breaking the stream.

One bug I remember: chunks arriving out of order because of network latency. The fix: buffer and reorder based on sequence indices. Added 20 lines of code, took two days to debug.

Challenge 3: Thinking Blocks Crash Claude Code

Some models (like Gemini 3) output "thinking" blocks—internal reasoning before the final response. They're wrapped in special tags.

Send those raw to Claude Code? It crashes. Hard.

The fix: wrap thinking blocks in XML tags that Claude Code recognizes as annotations. Now they display as "thinking" instead of causing a parser error.

Three lines of code. Two days of debugging to find the right format.

Challenge 4: Token Scaling for Variable Context Windows

Claude Code assumes a fixed context window. But models have different limits—128K, 200K, 1M, even 2M tokens.

If we don't tell Claude Code about the actual context window, it might send prompts that are too large. The provider truncates. Bad things happen.

We built a token scaling system that maps Claude's context assumptions to the actual model's limits. If you're using a 2M token model, we tell Claude Code it has more room to work with.

This matters for big refactors. Large codebases. Complex multi-file operations.

Challenge 5: Dynamic Reasoning Mode Detection

Some models (Grok, o1, o3, Gemini 3) have "reasoning modes"—extended thinking before answering. They're triggered differently. They behave differently.

We had to detect when a model supports reasoning, when it's enabled, and how to format requests to trigger it. Then translate the extended thinking output back to Claude Code without breaking anything.

Grok's reasoning mode alone took a week to get right.

Cost Optimization: Real Numbers

Here's why this matters for your budget:

Model Input (per 1M tokens) Output (per 1M tokens) vs Claude 4
Claude 4 Sonnet $3.00 $15.00 baseline
DeepSeek V3 $0.40 $0.60 87% cheaper
MiniMax M2.1 $0.40 $0.80 87% cheaper
Kimi K2 (via OpenRouter) $0.30 $1.50 90% cheaper
Grok 4.1 Fast $1.00 $2.00 67% cheaper

These aren't minor differences. A team spending $10K/month on Claude 4 could spend $1K/month with Kimi K2. For many use cases, the cheaper models are just as good.

The strategy we recommend:

Claudish lets you switch between them instantly. Same workflow. Different model. Better economics.

Claudish running Gemini 3 Flash for free — status bar showing FREE label and 98% context remaining
Gemini Flash through Claude Code. Free. The status bar shows model, cost, and context usage.

Features We Built (Because We Needed Them)

Claudish evolved from a hack into a proper tool. Here's what we added along the way:

Zero-Config Interactive Setup

First run? Claudish asks for your API key and preferred model. Saves everything to a profile. Next time? Just run claudish and you're done.

No config files. No environment variables. No digging through docs.

Fuzzy Search Across 300+ Models

Want to use "that DeepSeek model" but forget the exact name? Type deep and get a ranked list. We're using fuse.js for fuzzy matching, so typos don't matter.

OpenRouter alone has 300+ models. We make it actually usable.

Real-Time Cost Tracking

OpenRouter provides per-model pricing. Claudish displays estimated costs after every run. You know exactly what you spent.

Turns out, seeing "$0.03" instead of "$0.30" makes you feel good about your choice.

Claudish running with Google Gemini 3 Pro via OpenRouter, showing live cost tracking at $0.040
Running Gemini 3 Pro through Claude Code. $0.040 per session. The status bar tells you everything.

Profile Management

Multiple API keys? Different models for different projects? Save them as profiles. claudish --profile work vs claudish --profile personal.

Simple. But essential when you're juggling clients.

Monitor Mode

Want to see what Claude Code is actually sending? Enable monitor mode. Claudish logs every request and response in real time.

Great for debugging. Great for understanding the protocol. Great for learning how these tools work under the hood.

JSON Output Mode

Building tools on top of Claudish? Use --json for machine-readable output. Status, model info, costs—all structured.

We use this for internal automation. You might too.

Parallel Run Capability

Need to run two Claude Code sessions with different models? Each Claudish instance is isolated. Different ports. Different models. No conflicts.

Run a reasoning task with o1 on port 8000 while a quick refactor with DeepSeek runs on port 8001.

Multi-model code review consensus table: Claude, Grok, Gemini, and MiniMax independently reviewing the same code and reaching consensus
Four models. One code review. Consensus-based findings catch what any single model misses.
6 parallel agents running SEO research across multiple models — 3 SERP analysts and 3 keyword researchers all powered by Claudish
6 agents. 6 models. One task. Parallel multi-model research in action.

The Community: Open Source, MIT Licensed

We could have kept this internal. But why? Vendor lock-in sucks for everyone.

Claudish is fully open source under the MIT license. Do whatever you want with it. Fork it. Modify it. Distribute it. Sell it (if you can figure out a business model—we haven't).

Distribution options:

We're actively developing it. Bug fixes. New provider integrations. Features based on what we need and what the community requests.

Check it out: github.com/madappgang/claudish

When This Matters to You

You need Claudish if:

Talk to Us

We built Claudish because we needed it. We're using it daily. We're improving it constantly.

If you hit issues. If you have ideas. If you want to contribute. The door's open.

What AI model are you dying to use with Claude Code?

Product: Claudish (MadAppGang internal tool)
Duration: 6 months (ongoing development)
Stack: TypeScript, Bun, Hono, Anthropic API, OpenRouter, Google Gemini, OpenAI, Ollama
License: MIT (fully open source)
Outcome: 80-90% cost savings, 100+ models supported, active community