
claudemem: Because grep Wasn't Cutting It

Local Semantic Code Search for Claude Code and AI Agents

  • TypeScript

  • Bun

  • Tree-sitter

  • LanceDB

  • OpenRouter

  • PageRank

  • Hybrid Search

  • MCP

The Problem with grep

I've been writing code for over twenty years. Last month I spent forty-five minutes searching for a function I wrote myself three weeks earlier.

Forty-five minutes. For code I wrote.

The function was called silentTokenRefresh. Of course it was. I'd typed "token" into grep. Got 847 results. None of them were the actual token handler. I'd scrolled right past it twice before giving up and asking a colleague who remembered the name.

I started timing these searches. Three hours hunting for "the thing that validates webhook signatures." Two hours finding where we actually persist user preferences. An entire afternoon tracing why a settings change wasn't propagating—turned out there were four different settings services, and I was looking at the wrong three.

grep doesn't care about my memory. It doesn't know that when I search for "auth" I probably mean the token refresh flow, not the 200 files that happen to contain the word "authentication" in a comment.

I was tired of feeling stupid in codebases I wrote myself.

Get Started in 60 Seconds - Install, Index, Search
Three commands. That's all it takes to get semantic code search.

The Numbers

| Metric | Result |
| --- | --- |
| Search accuracy (NDCG) | 175% vs baseline (voyage-code-3) |
| Embedding models supported | 15+ (cloud + local) |
| LLM summarizers benchmarked | 18 models across 6 evaluation methods |
| Languages supported | 8 (TypeScript, Python, Go, Rust, C/C++, Java) |
| Privacy | 100% local - nothing leaves your machine |
| Index cost | ~$0.01 per 1M tokens (cloud) / $0 (local) |
| Distribution | npm, Homebrew, shell installer |
| License | MIT (fully open source) |

TL;DR

Semantic code search that finds what you mean, not what you type. 20-minute searches now take 3 seconds. 50+ repos. 100% local. My grep usage dropped to near zero.

The Architecture: Not Just Another Search Tool

claudemem builds a semantic graph of your codebase. Not a text index. Not a fuzzy matcher. An actual understanding of what calls what and why.

The indexer parses every file into an AST. Functions, classes, methods, calls—all of it becomes nodes in a graph. Then it runs PageRank. The same algorithm Google used to find important web pages, but pointed at your code.

This matters more than it sounds. When you search, results come back ranked by architectural importance. The core authentication handler ranks higher than the seventeen wrapper functions that call it. You find the heart of your system in seconds, not hours.
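The idea is easy to sketch. Here is a toy power-iteration PageRank over a call graph, purely illustrative (the symbol names are made up and this is not claudemem's actual implementation): symbols accumulate rank from whoever calls them, so a core handler outranks its wrappers.

```typescript
// Toy PageRank over a call graph: edges point from caller to callee,
// so heavily-called symbols accumulate rank. Illustrative sketch only.
type CallGraph = Map<string, string[]>; // symbol -> symbols it calls

function pageRank(graph: CallGraph, damping = 0.85, iters = 50): Map<string, number> {
  const nodes = [...graph.keys()];
  const n = nodes.length;
  let rank = new Map<string, number>(nodes.map((s): [string, number] => [s, 1 / n]));
  for (let i = 0; i < iters; i++) {
    const next = new Map<string, number>(
      nodes.map((s): [string, number] => [s, (1 - damping) / n]),
    );
    for (const [caller, callees] of graph) {
      if (callees.length === 0) continue; // dangling node: rank dropped in this sketch
      const share = (damping * rank.get(caller)!) / callees.length;
      for (const callee of callees) {
        next.set(callee, (next.get(callee) ?? 0) + share);
      }
    }
    rank = next;
  }
  return rank;
}

// Three wrappers all call one core handler; the handler should win.
const calls: CallGraph = new Map([
  ["loginRoute", ["refreshToken"]],
  ["cronJob", ["refreshToken"]],
  ["retryHelper", ["refreshToken"]],
  ["refreshToken", []],
]);
const ranks = pageRank(calls);
```

On this toy graph, refreshToken ends up with most of the rank mass while each wrapper keeps only the baseline share, which is exactly the "core handler ranks above its callers" behavior described above.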

Semantic search sits on top. When I search "refresh tokens silently," it doesn't just pattern-match those words. It understands token operations, refresh patterns, silent execution. It finds silentTokenRefresh even though my query didn't match the exact name.
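At retrieval time, that boils down to comparing vectors. A minimal sketch of the idea, with hand-made 3-d toy vectors standing in for real embeddings (which come from the models benchmarked below and have hundreds of dimensions):

```typescript
// Rank indexed code chunks by cosine similarity to the query embedding.
// Toy 3-d vectors; real embeddings are produced by an embedding model.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const index = new Map<string, number[]>([
  ["silentTokenRefresh", [0.9, 0.8, 0.1]], // "token refresh" semantics
  ["renderFooter",       [0.1, 0.0, 0.9]], // unrelated UI code
]);

function search(query: number[]): string {
  return [...index.entries()]
    .sort((x, y) => cosine(query, y[1]) - cosine(query, x[1]))[0][0];
}

// Embedding of "refresh tokens silently" lands near the token chunk.
const best = search([0.85, 0.75, 0.05]);
```

In the real tool the nearest-neighbour lookup presumably runs inside LanceDB rather than the linear scan shown here; the point is only that similar meanings land near each other regardless of exact wording.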

The caller/callee analysis closed the loop. Before touching any function, I see exactly what depends on it. Last month this caught a disaster—I was about to "clean up" a function that looked unused. claudemem showed ten callers. Nine were dead code we'd been maintaining for two years. The tenth was the payment processor.

We'd been maintaining code nobody called for two years. And we almost deleted code that would have broken payments.

This isn't a research project. It's the architecture I needed to stop wasting hours every week. Everything stays local in .claudemem/ in your project. Nothing goes to a server. Ever.

It changed how we work.

Embedding Model Benchmarks: We Tested Everything

Which embedding model is best for code search? We didn't guess. We measured.

Run claudemem benchmark to test models on your actual codebase. Here's what we found on real code search tasks:

| Model | Speed | NDCG | Cost | Notes |
| --- | --- | --- | --- | --- |
| voyage-code-3 | 4.5s | 175% | $0.007 | Best quality |
| gemini-embedding-001 | 2.9s | 170% | $0.007 | Great free option |
| voyage-3-large | 1.8s | 164% | $0.007 | Fast & accurate |
| voyage-3.5-lite | 1.2s | 163% | $0.001 | Best value (default) |
| voyage-3.5 | 1.2s | 150% | $0.002 | Fastest |
| mistral-embed | 16.6s | 150% | $0.006 | Slow |
| text-embedding-3-small | 3.0s | 141% | $0.001 | Decent |
| text-embedding-3-large | 3.1s | 141% | $0.005 | Not worth it |
| all-minilm-l6-v2 | 2.7s | 128% | $0.0001 | Cheapest (local) |

Summary

Best Quality: voyage-code-3 (175% NDCG)
Best Value: voyage-3.5-lite (163% NDCG, $0.001) - this is the default
Fastest: voyage-3.5 (1.2s)
Free/Local: all-minilm-l6-v2 via Ollama

claudemem LLM benchmark running with multiple generators and judges
Run `claudemem benchmark` to test models on your actual codebase. Multiple generators, multiple judges, real metrics.

LLM Summarizer Benchmarks: Which Model Describes Code Best?

claudemem generates natural language descriptions of code chunks. These descriptions power semantic search. Better descriptions = better search results.

We benchmarked 18 LLM models across 6 different evaluation methods:

Evaluation Methods

| Model | Retrieval | Contrastive | Judge | Overall |
| --- | --- | --- | --- | --- |
| gpt-5.1-codex-max | 23% | 83% | 78% | 57% |
| nova-premier-v1 | 27% | 79% | 51% | 56% |
| qwen3-235b-a22b-2507 | 13% | 92% | 79% | 55% |
| opus | 16% | 80% | 71% | 54% |
| deepseek-v3.2 | 13% | 82% | 74% | 52% |
| haiku | 7% | 82% | 69% | 49% |

Operational Metrics

Fastest: haiku (3.7s avg latency)
Best Quality: gpt-5.1-codex-max (57% overall)
Best Value: deepseek-v3.2 (52% quality, low cost)

Premium LLM benchmark showing opus, haiku, gpt-5.2, gemini-3-pro, deepseek-v3.2 generating code descriptions
Benchmarking premium models: opus, haiku, gpt-5.2, gemini-3-pro, deepseek-v3.2, kimi-k2, and more running in parallel.

Symbol Graph: Beyond Search

Search is table stakes. The real power is the symbol graph.

claudemem tracks every reference between symbols. It computes PageRank scores based on how central each function/class is to your codebase. This enables:

Dead Code Detection

claudemem dead-code

Finds symbols with zero callers + low PageRank + not exported. Great for cleaning up unused code.
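The three conditions above compose into a simple filter. A sketch of the heuristic (the field names and the rank threshold are illustrative assumptions, not claudemem's internals):

```typescript
// Dead-code heuristic: zero callers, low PageRank, and not exported.
// Field names and the 0.001 threshold are illustrative assumptions.
interface SymbolInfo {
  name: string;
  callers: number;
  pageRank: number;
  exported: boolean;
}

function deadCode(symbols: SymbolInfo[], rankFloor = 0.001): string[] {
  return symbols
    .filter((s) => s.callers === 0 && s.pageRank < rankFloor && !s.exported)
    .map((s) => s.name);
}

const candidates = deadCode([
  { name: "legacyFormat", callers: 0, pageRank: 0.0002, exported: false },
  { name: "handleAuth",   callers: 9, pageRank: 0.04,   exported: true },
]);
```

The export check matters: a symbol with no internal callers may still be a library's public API, so it should never be flagged on caller count alone.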

Test Coverage Gaps

claudemem test-gaps

Finds high-PageRank symbols not called by any test file. Prioritize what to test next.

Change Impact Analysis

claudemem impact FileTracker

Shows all transitive callers, grouped by file. Understand the blast radius before refactoring.
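Transitive callers are just a breadth-first walk over the reversed call graph. A sketch under that assumption (the graph shape and names are illustrative):

```typescript
// Blast radius: collect every transitive caller of a target symbol
// by walking the reverse call graph breadth-first.
type ReverseGraph = Map<string, string[]>; // symbol -> direct callers

function impact(target: string, callers: ReverseGraph): Set<string> {
  const seen = new Set<string>();
  const queue = [target];
  while (queue.length > 0) {
    const sym = queue.shift()!;
    for (const caller of callers.get(sym) ?? []) {
      if (!seen.has(caller)) {
        seen.add(caller); // guard against cycles in the call graph
        queue.push(caller);
      }
    }
  }
  return seen;
}

// FileTracker is used by Indexer, which is used by CliMain.
const blast = impact("FileTracker", new Map([
  ["FileTracker", ["Indexer"]],
  ["Indexer", ["CliMain"]],
]));
```

Grouping the resulting set by file, as the command does, is then a simple aggregation step.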

Symbol Navigation

claudemem symbol handleAuth     # find definition
claudemem callers handleAuth   # what calls this?
claudemem callees handleAuth   # what does this call?
claudemem context handleAuth   # all of the above

Self-Learning System (Experimental)

This is where it gets interesting.

Traditional ML validation assumes millions of samples and explicit labels. Our context is different: 50-500 sessions per project, no user ratings, data stays local.

claudemem's self-learning system uses implicit feedback signals:

| Signal Type | How Detected | Weight |
| --- | --- | --- |
| Lexical Correction | User says "no", "wrong", "actually" | 0.30 |
| Strategy Pivot | Sudden change in tool usage after failure | 0.20 |
| Overwrite | User edits same file region agent modified | 0.35 |
| Reask | User repeats similar prompt | 0.15 |
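Combining those weighted signals into one failure score is straightforward. A sketch (detection itself is stubbed out here; real detectors would have to inspect session transcripts):

```typescript
// Combine implicit feedback signals into a single failure score
// using the weights from the table above. Detection is stubbed.
const WEIGHTS = {
  lexicalCorrection: 0.30,
  strategyPivot: 0.20,
  overwrite: 0.35,
  reask: 0.15,
} as const;

type Signals = Partial<Record<keyof typeof WEIGHTS, boolean>>;

function failureScore(signals: Signals): number {
  let score = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    if (signals[name as keyof typeof WEIGHTS]) score += weight;
  }
  return score; // 0 = no negative signals, 1 = all four fired
}

// The user overwrote the agent's edit and re-asked the same question.
const score = failureScore({ overwrite: true, reask: true });
```

Because the weights sum to 1, the score reads directly as "what fraction of the negative evidence fired" for a given session.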

The strongest signal: Code Survival Rate

code_survival_rate = lines_kept / lines_written_by_agent

If the user keeps the agent's code in their git commit, the agent did well. If they rewrite everything, it failed.
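In code, the survival metric above is a one-liner (the divide-by-zero guard is my own assumption about the edge case):

```typescript
// code_survival_rate = lines_kept / lines_written_by_agent
function codeSurvivalRate(linesKept: number, linesWritten: number): number {
  if (linesWritten === 0) return 1; // agent wrote nothing, so nothing was rejected
  return linesKept / linesWritten;
}

// User committed 38 of the agent's 40 lines.
const rate = codeSurvivalRate(38, 40);
```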

What Gets Generated

Safety Validation

Changes are tested against a Red Team before deployment:

Using with Claude Code

Run claudemem as an MCP server:

claudemem --mcp

Then Claude Code can use these tools:

Detective Agents

Install the code-analysis plugin for pre-built agents that use claudemem:

claudemem integrated with OpenCode IDE - showing available tools: map, search, symbol, callers, callees, context
claudemem running in OpenCode with GLM-4.7. All semantic search and code analysis tools available: map, search, symbol, callers, callees, context.

Documentation Indexing

claudemem can automatically fetch and index documentation for your dependencies. Search across both your code AND the frameworks you use.

Sources (in priority order):

claudemem docs fetch              # fetch docs for all detected dependencies
claudemem docs fetch react vue    # fetch specific libraries
claudemem docs status             # show indexed docs

How We Compare

| Feature | claudemem | Context | Greptile | Amp |
| --- | --- | --- | --- | --- |
| Cost | Free / MIT | Free (needs API) | $30/dev/mo | $1,000+ min |
| Privacy | 100% Local | Cloud default | Cloud | Cloud only |
| CLI Tool | Yes | No | No | No |
| Symbol Graph | Yes + PageRank | Yes | No | Yes |
| Adaptive Learning | Yes (EMA-based) | Yes | No | No |
| Embedding Models | Any (cloud/local) | Fixed | Fixed | Fixed |
| Built-in Benchmarks | Full suite | No | No | No |

When This Matters to You

You need claudemem if:

Quick Start

# Install
npm install -g claude-codemem

# Setup
claudemem init

# Index your project
claudemem index

# Search
claudemem search "authentication flow"
claudemem search "where do we validate user input"

That's it. Changed some files? Just search again - it auto-reindexes modified files before searching.
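One plausible way to implement that staleness check is to compare each file's mtime against the timestamp recorded at index time and re-embed only what changed. This sketch is my own guess at the mechanism, not claudemem's actual code:

```typescript
// Guess at an auto-reindex check: a file is stale if its mtime is
// newer than the timestamp recorded when it was last indexed.
import { mkdtempSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

function staleFiles(indexedAt: Map<string, number>): string[] {
  return [...indexedAt.entries()]
    .filter(([path, ts]) => statSync(path).mtimeMs > ts)
    .map(([path]) => path);
}

// Demo: a file "indexed" at epoch 0 is always stale.
const dir = mkdtempSync(join(tmpdir(), "cm-"));
const file = join(dir, "a.ts");
writeFileSync(file, "export {}");
const stale = staleFiles(new Map([[file, 0]]));
```

A content-hash comparison would be more robust than mtimes (checkouts and copies can touch timestamps without changing content), at the cost of reading every file on each search.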

claudemem in action: semantic code search that actually understands what you're looking for.
Mass model benchmark running 25+ models in parallel - qwen, codestral, ministral, olmo, kat-coder, and more
Benchmark everything. 25+ models running in parallel. Find the best embedding model for your codebase.

Talk to Us

We built claudemem because we needed it. We're using it daily. We're improving it constantly.

If you hit issues. If you have ideas. If you want to contribute. The door's open.

What code search problem are you trying to solve?

Product: claudemem (MadAppGang internal tool)
Duration: 8 months (ongoing development)
Stack: TypeScript, Bun, Tree-sitter, LanceDB, OpenRouter, PageRank
License: MIT (fully open source)
Outcome: 175% NDCG improvement, 100% local privacy, active community