Code map your repo. Agents hit the right files on the first try.
codemap indexes every file with a one-line summary, then uses a cheap fast model to pick only the files relevant to a task. Instead of your agent burning ten tool calls exploring the codebase, it gets the exact files it needs in one shot.
An AI agent opens your repo and the first ten minutes are a tax: grep, glob, read, read, read. It's trying to build a mental model of the code every single time, out of nothing. The context window fills with dead ends. The answer, when it finally arrives, is worse — because half the budget went to the search.
The fix isn't a bigger model. It's giving the agent a map. codemap indexes your repo with a cheap fast model once, then uses another cheap fast model to pick only the files that matter for each task. The expensive model only ever sees focused context.
Higher precision on the input. Higher precision on the output.
A build pass over the repo produces an index. A select pass takes a task description, reads the index (not the source), and returns the files that matter. The expensive coding model never sees the 2688-file pile.
One line per file, with structure
Every file gets a summary, a when-to-use hint, exported types and functions, imports, and keywords. Deterministic facts come from parsers; semantic fields come from a cheap model.
- summary
- when_to_use
- public_types
- public_functions
- imports · keywords
Pick the files that matter
Given a task, a cheap fast model reads the summaries (not the source) and returns the 5–10 files the big model should see. You get focused context instead of a 2 MB wall.
- 381 candidates → 5 files
- 30–80k tokens out
- task file or MCP call
- provider-agnostic
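As a rough illustration of why the select input stays small, here is a minimal sketch of assembling it from summaries alone. `Entry` and `buildSelectPrompt` are hypothetical names, the prompt layout is an assumption, and the two summaries are made up for the example; none of this is codemap's actual internal format.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry is a hypothetical, trimmed-down index record: just enough for selection.
type Entry struct {
	Path    string
	Summary string
}

// buildSelectPrompt renders the task followed by one "path: summary" line per file.
// No source code is included, only the index text.
func buildSelectPrompt(task string, entries []Entry) string {
	var b strings.Builder
	fmt.Fprintf(&b, "Task: %s\n\nFiles:\n", task)
	for _, e := range entries {
		fmt.Fprintf(&b, "%s: %s\n", e.Path, e.Summary)
	}
	return b.String()
}

func main() {
	// Illustrative summaries, not real codemap output.
	entries := []Entry{
		{Path: "src/invoices/repo.ts", Summary: "Invoice persistence."},
		{Path: "src/db/soft-delete.ts", Summary: "Shared soft-delete helpers."},
	}
	fmt.Print(buildSelectPrompt("Add soft-delete support to invoices.", entries))
}
```

The cheap model would receive this text and answer with a short list of paths; only then is full source loaded for the expensive model.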
Native tool calls, zero glue
Claude Code registers codemap in one command and picks up three tools. No wrapper, no prompt, no tutorial — the agent just learns to call them.
- codemap_select
- codemap_status
- codemap_build
- SessionStart hook
# install
go install github.com/jonnonz1/codemap/cmd/codemap@latest

# 1. Interactive setup: picks provider, model, API key
cd your-project
codemap init

# 2. Index the repo. First run takes a few minutes; after that it's seconds.
codemap build

# 3. Register the MCP server with Claude Code (once, globally).
claude mcp add codemap -- codemap mcp

# Start a Claude Code session; codemap_select is a native tool.
Run codemap init and pick mock. You get placeholder summaries but the full workflow, which is handy for trying the loop before committing to a provider.
Build is cheap and cached. Each file goes through six passes; unchanged files skip the expensive ones entirely via mtime + BLAKE3 lookup.
- The file walk respects .gitignore and .codemap.yaml rules.
- Go is parsed with go/ast; the parser registry leaves room for more languages.
- The cache is written under .claude/cache/. Safe to interrupt.

public_types, public_functions, and imports always come from the parser. The model writes summary, when_to_use, and keywords, never code facts.
Each file in the code map is a single JSON object. The deterministic fields come from the parser; the semantic fields come from the cheap model. The big coding model never reads this index; it only reads what select returns.
// .claude/cache/codemap.jsonl: one line per file
{
  "path": "internal/build/orchestrator.go",
  "hash": "blake3:9f2d1a…",
  "mtime": "2026-04-18T14:22:07Z",

  // --- parser (deterministic) ---
  "public_types": ["Orchestrator", "BuildResult"],
  "public_functions": ["NewOrchestrator", "Run", "runOne"],
  "imports": [
    "github.com/jonnonz1/codemap/internal/hash",
    "github.com/jonnonz1/codemap/internal/llm",
    "github.com/jonnonz1/codemap/internal/parse"
  ],

  // --- cheap model (semantic) ---
  "summary": "Concurrent build orchestrator: fans out per-file work, rate-limits LLM calls, writes atomically.",
  "when_to_use": "When changing how the build runs — concurrency, backpressure, resume, or failure handling.",
  "keywords": ["concurrency", "rate-limit", "atomic", "orchestrator"]
}
public_functions and imports come from go/ast and are always correct. If a summary drifts after a refactor, it re-summarises on the next build automatically.
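For readers who want to consume the index from their own tooling, here is a minimal sketch of decoding one line. The JSON keys are taken from the example entry; the Go field names and `decodeLine` are assumptions, and the real `CodeMapEntry` type in `internal/model` may look different. The `//` annotations in the example entry are for the reader and would not appear in an actual JSONL line.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CodeMapEntry mirrors the JSON keys shown in the example entry.
// Field names on the Go side are guesses.
type CodeMapEntry struct {
	Path  string `json:"path"`
	Hash  string `json:"hash"`
	MTime string `json:"mtime"`

	// Deterministic fields, written by the parser.
	PublicTypes     []string `json:"public_types"`
	PublicFunctions []string `json:"public_functions"`
	Imports         []string `json:"imports"`

	// Semantic fields, written by the cheap model.
	Summary   string   `json:"summary"`
	WhenToUse string   `json:"when_to_use"`
	Keywords  []string `json:"keywords"`
}

// decodeLine parses a single JSONL line into an entry.
func decodeLine(line []byte) (CodeMapEntry, error) {
	var e CodeMapEntry
	err := json.Unmarshal(line, &e)
	return e, err
}

func main() {
	line := []byte(`{"path":"internal/build/orchestrator.go","public_types":["Orchestrator","BuildResult"],"keywords":["concurrency"]}`)
	e, err := decodeLine(line)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(e.Path, e.PublicTypes)
}
```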
codemap is a Go CLI with an MCP server entry point. The internal tree is deliberately small and interface-first so new languages, providers, or caches slot in without touching the build orchestrator.
cmd/codemap/      CLI entry + MCP server
internal/
  model/          CodeMapEntry, CodeMap types
  scan/           File system walk with ignore rules
  hash/           BLAKE3 content hashing
  parse/          Parser interface + registry
  langs/golang/   Go AST parser (types, functions, imports)
  store/          JSON/JSONL cache (atomic writes)
  llm/            Summariser interface — Anthropic, OpenAI, Google, Mock
  build/          Incremental build orchestrator (concurrent, rate-limited)
  autoctx/        LLM-based auto-context file selection
  render/         Markdown rendering
  taskfile/       Task file YAML frontmatter parsing
  context/        Session context injection
  mcp/            MCP JSON-RPC server + tool handlers
  config/         .codemap.yaml loading
  initcmd/        codemap init (interactive setup)
  doctor/         Cache diagnostics
  stats/          Usage metrics + exploration tracking
New language? Implement parse.Parser and register it. New provider? Implement llm.Summariser. New output format? Add a package alongside render/. The orchestrator in build/ never needs to change.
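To make the extension point concrete, here is a sketch of what a parser interface and registry of this kind could look like. Only the name `parse.Parser` comes from the project; the method set, `Facts`, `Registry`, and `stubParser` are all hypothetical and chosen for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Facts holds the deterministic fields a parser extracts from one file.
type Facts struct {
	PublicTypes     []string
	PublicFunctions []string
	Imports         []string
}

// Parser is a guess at the shape of parse.Parser; the real method set may differ.
type Parser interface {
	Extensions() []string
	Parse(path string, src []byte) (Facts, error)
}

// Registry maps file extensions to parsers.
type Registry struct {
	byExt map[string]Parser
}

func NewRegistry() *Registry {
	return &Registry{byExt: make(map[string]Parser)}
}

// Register makes p responsible for every extension it declares.
func (r *Registry) Register(p Parser) {
	for _, ext := range p.Extensions() {
		r.byExt[ext] = p
	}
}

// For looks up the parser for a path by its extension.
func (r *Registry) For(path string) (Parser, bool) {
	p, ok := r.byExt[filepath.Ext(path)]
	return p, ok
}

// stubParser is a placeholder implementation that extracts nothing.
type stubParser struct{ exts []string }

func (s stubParser) Extensions() []string                { return s.exts }
func (s stubParser) Parse(string, []byte) (Facts, error) { return Facts{}, nil }

func main() {
	reg := NewRegistry()
	reg.Register(stubParser{exts: []string{".py"}})
	_, ok := reg.For("tools/gen.py")
	fmt.Println("parser found for .py:", ok)
}
```

With a shape like this, the build loop only ever asks the registry for a parser, which is why adding a language would not touch the orchestrator.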
Every claim codemap makes is grounded in observed data: cache hits, tool-call logs, git diff. codemap statistics --eval computes precision and recall against what you actually changed.
Task: Add soft-delete support to invoices. Preserve patterns. Update tests.

codemap build fresh. Claude Code session with codemap MCP registered. No manual context loading.

- Claude reads the task and immediately calls codemap_select with the task description. One tool call.
- The cheap model scans 2688 summaries (not 2688 files) and returns a ranked list. 381 candidates match the task, 5 are picked: src/invoices/service.ts, src/invoices/repo.ts, tests/invoices/service.test.ts, src/types/invoice.ts, src/db/soft-delete.ts.
- codemap returns full source of those 5 files. 27 KB. 0 grep, 0 glob, 0 read-and-hope.
- Claude starts coding immediately. It reads src/db/soft-delete.ts, spots the existing pattern, applies it to invoices/repo.ts, updates the service, and rewrites the tests in the same style.
- Seven minutes later, the PR is open. Claude made 0 extra Read calls beyond the selection.
- After merge, run codemap statistics --eval: precision 100% (5/5 needed), recall 83% (5 of 6 touched; one shared util wasn't in the index), overhead 0%.
Without codemap, the agent would glob src/**, read a handful of likely-looking files, then start over when the first guess was wrong. Context fills with noise before the agent has even touched the real files.
One command registers codemap as an MCP server. Claude Code picks up three tools automatically and a SessionStart hook that prints the index status at the top of every session.
# Register once, globally
claude mcp add codemap -- codemap mcp
| Tool | What it does |
|---|---|
| codemap_select | Given a task, returns full source of the most relevant files. The main event. |
| codemap_status | Check if the index is fresh, stale, or never built. |
| codemap_build | Trigger an incremental rebuild from inside a session. |
The agent calls codemap_select first. Gets focused context. Starts coding. No exploration phase, because there's nothing left to explore.
| Command | Description |
|---|---|
| codemap init | Interactive setup: provider, model, API key, Claude hooks. |
| codemap build | Index the repo (incremental, cached). Safe to interrupt. |
| codemap render | Render the code map as a single Markdown document. |
| codemap select --task task.md | Select files for a task (CLI mode, no MCP needed). |
| codemap context | Show what gets injected at Claude Code session start. |
| codemap doctor | Diagnose cache health: orphaned entries, stale hashes. |
| codemap statistics | Usage metrics, token savings, context window impact. |
| codemap statistics --eval | Compare selections to actual git changes (precision/recall). |
| Provider | Model | Rough cost · 2700 files |
|---|---|---|
| Anthropic | claude-haiku-4-5-20251001 | ~$2–3 |
| OpenAI | gpt-4o-mini | ~$1–2 |
| Google | gemini-2.0-flash | ~$0.50 |
| Mock | no LLM — placeholder summaries | free |
Parsers own facts; the LLM owns meaning
Exported types, function names, and imports come from the AST — never from the model. The model only writes the descriptive fields: summary, when_to_use, keywords. This keeps the index deterministic where it can be and lets the model be wrong only about the fuzzy parts.
It also means you can add a new language by writing a parser, not by retraining anything.
Incremental build, atomic cache
Every build run diffs mtime + BLAKE3 against the cache. Unchanged files skip Parse, Summarise, and Render entirely. A 94% cache hit rate on the second build is normal. Writes are atomic so a half-finished build never corrupts the index.
No embeddings, no vector DB
The select step sends all summaries to the model in a single call. At 2700 files × ~80 tokens that's roughly 216k tokens. That fits comfortably in Gemini Flash's context window, but it is above gpt-4o-mini's 128k window and at the edge of Haiku's 200k, so the per-file summary budget matters on repos of this size. Adding a vector DB would win sub-second latency and introduce staleness, drift, and a new dependency. The current design trades one slow call for simplicity, and the numbers still beat interactive exploration.
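The budget arithmetic is easy to check for your own repo. In this small sketch the 80 tokens per file is the estimate quoted above, while the window size and the reserve for the task and response are values you supply; `bundleTokens` and `fits` are hypothetical helpers.

```go
package main

import "fmt"

// bundleTokens estimates the size of the summary bundle sent to the select model.
func bundleTokens(files, tokensPerFile int) int {
	return files * tokensPerFile
}

// fits reports whether the bundle plus a reserve (task text, response) fits a window.
func fits(bundle, window, reserve int) bool {
	return bundle+reserve <= window
}

func main() {
	bundle := bundleTokens(2700, 80)
	fmt.Println(bundle) // 216000

	// Window sizes are inputs; check your provider's documentation for real values.
	fmt.Println(fits(bundle, 1_000_000, 4_000))
	fmt.Println(fits(bundle, 200_000, 4_000))
}
```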
Observed metrics, no counterfactuals
Precision, recall, and overhead are computed from real git diffs and real tool-call logs. There's no "X% savings vs what would've happened" because that number would be marketing. The evaluator can be run any time after a task ships.
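Precision and recall here are plain set arithmetic over file paths. The following is a minimal sketch of that calculation, assuming both lists contain unique paths; `evaluate` is a hypothetical function, and the sixth touched file is invented to match the "5 of 6 touched" example.

```go
package main

import "fmt"

// evaluate compares the files select returned with the files the change touched.
// precision = hits / selected, recall = hits / touched.
func evaluate(selected, touched []string) (precision, recall float64) {
	touchedSet := make(map[string]bool, len(touched))
	for _, p := range touched {
		touchedSet[p] = true
	}
	hits := 0
	for _, p := range selected {
		if touchedSet[p] {
			hits++
		}
	}
	if len(selected) > 0 {
		precision = float64(hits) / float64(len(selected))
	}
	if len(touched) > 0 {
		recall = float64(hits) / float64(len(touched))
	}
	return precision, recall
}

func main() {
	selected := []string{
		"src/invoices/service.ts",
		"src/invoices/repo.ts",
		"tests/invoices/service.test.ts",
		"src/types/invoice.ts",
		"src/db/soft-delete.ts",
	}
	// Hypothetical sixth file standing in for the shared util that was not indexed.
	touched := append([]string{"src/shared/util.ts"}, selected...)

	p, r := evaluate(selected, touched)
	fmt.Printf("precision %.0f%%, recall %.0f%%\n", p*100, r*100)
	// prints "precision 100%, recall 83%"
}
```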
MCP first, CLI always
The MCP server is the shortest path to "Claude Code uses this natively". But every MCP tool is also a CLI command — you can script codemap into CI, pre-commit hooks, or your own agent without touching MCP at all.
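As one way to script the CLI from your own tooling, the sketch below shells out to `codemap select --task task.md`, the command listed in the command table. It assumes `codemap` is on `PATH` and simply passes the output through, since the output format is not specified here; `selectCmd` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// selectCmd builds the CLI invocation for file selection.
func selectCmd(taskFile string) *exec.Cmd {
	return exec.Command("codemap", "select", "--task", taskFile)
}

func main() {
	out, err := selectCmd("task.md").CombinedOutput()
	if err != nil {
		fmt.Fprintln(os.Stderr, "codemap select failed:", err)
		os.Exit(1)
	}
	// Pass the selection through unchanged; a real script might parse it.
	os.Stdout.Write(out)
}
```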
Does this replace grep and glob for the agent?
For the first pass, yes. For follow-ups, grep and glob are still useful: codemap is best when the task is broad ("add soft-delete to invoices"), while for pinpoint follow-ups ("what does line 42 do") the agent will still read individual files.
What about TypeScript, Python, Rust?
The parser registry is designed for more languages. Go is first-class today via go/ast. TS/Python/Rust parsers are the obvious next additions — they fit the existing interface and don't require rethinking anything else.
How much does a typical build cost?
For 2700 files: ~$0.50 on Gemini Flash, ~$1–2 on gpt-4o-mini, ~$2–3 on Haiku 4.5. That's the first build. After that only changed files re-summarise, so you pay pennies per session.
Does the API key live in the repo?
It's written to .codemap.yaml, which is gitignored by default. If you prefer an env var, set CODEMAP_API_KEY and delete the key from the config file — both are supported.
How does it compare to running the big model directly on the whole repo?
A big model on a 2 MB repo burns tokens, time, and money on files that don't matter. codemap compresses the candidate pool before the expensive model ever sees it. You can absolutely also do the 2 MB approach on a small repo — codemap is for the point where that stops working.
What happens when I rename or move files?
Move = new path = new cache entry; old one is cleaned by codemap doctor. Rename of an exported symbol shows up in the parser pass and the summary gets regenerated. There's no git-rename tracking — the cache is path-keyed by design.
Why not ctags / Sourcegraph / a vector DB?
ctags gives you symbol lookup — which is what an agent already has via Grep and Read. codemap gives the agent task-shaped context: "here are the five files that matter for what you're about to do". That's a different job.
Sourcegraph is a hosted code-search surface. codemap is a local CLI + MCP server that your agent calls natively. No service, no auth, no index hosted somewhere else.
A vector DB would give sub-second select at the cost of a running service, embedding drift, and staleness. With a 200k-token summary bundle, the model does the selection itself in one call — no index to keep fresh, no embeddings to re-compute when the code shifts.