codemap · Repo intelligence for AI agents Go · MCP

Code map your repo. Agents hit the right files on the first try.

codemap indexes every file with a one-line summary, then uses a cheap fast model to pick only the files relevant to a task. Instead of your agent burning ten tool calls exploring the codebase, it gets the exact files it needs in one shot.

2688 files indexed
94% cache hit rate
97% context saved
~$0.50 cost / 2700 files
01
Why — exploration is a tax, not a feature
thesis

An AI agent opens your repo and the first ten minutes are a tax: grep, glob, read, read, read. It's trying to build a mental model of the code every single time, out of nothing. The context window fills with dead ends. The answer, when it finally arrives, is worse — because half the budget went to the search.

The fix isn't a bigger model. It's giving the agent a map. codemap indexes your repo with a cheap fast model once, then uses another cheap fast model to pick only the files that matter for each task. The expensive model only ever sees focused context.

Higher precision on the input. Higher precision on the output.

The shape. Code map with a cheap model. Auto-context with a cheap model. Code with a big model. Three models, three roles — each doing what it's best at.
02
How it works — build once, select per task
diagram

A build pass over the repo produces an index. A select pass takes a task description, reads the index (not the source), and returns the files that matter. The expensive coding model never sees the 2688-file pile.

[Diagram: YOUR REPO (2688 files; Go · TS · Python · any) → CODEMAP BUILD (per-file index: summary · types · fns · imports; cached · incremental) → CODEMAP SELECT (auto-context: 381 candidates → 5 files; cheap fast model) → AGENT (Claude Code · Cursor · etc), which sends the TASK through the codemap_select MCP tool (native tool call · 1 shot) and codes on the focused result. Legend: Build · Select · MCP · Index.]
03
The three primitives — what each piece does, quickly
03 / 03
Code map

One line per file, with structure

Every file gets a summary, a when-to-use hint, exported types and functions, imports, and keywords. Deterministic facts come from parsers; semantic fields come from a cheap model.

  • summary
  • when_to_use
  • public_types
  • public_functions
  • imports · keywords
Command
codemap build
Cache
mtime + BLAKE3
Cost
~$0.50 / 2700 files
Auto-context

Pick the files that matter

Given a task, a cheap fast model reads the summaries (not the source) and returns the 5–10 files the big model should see. You get focused context instead of a 2 MB wall.

  • 381 candidates → 5 files
  • 30–80k tokens out
  • task file or MCP call
  • provider-agnostic
Command
codemap select
Input
task.md or query
Output
full source of winners
MCP server

Native tool calls, zero glue

Claude Code registers codemap in one command and picks up three tools. No wrapper, no prompt, no tutorial — the agent just learns to call them.

  • codemap_select
  • codemap_status
  • codemap_build
  • SessionStart hook
Register
claude mcp add
Transport
stdio (local)
Auth
local · none
04
Quick start — install, build, register MCP
~3 min
# install
go install github.com/jonnonz1/codemap/cmd/codemap@latest

# 1. Interactive setup — picks provider, model, API key
cd your-project
codemap init

# 2. Index the repo. First run takes a few minutes; after that it's seconds.
codemap build

# 3. Register the MCP server with Claude Code (once, globally).
claude mcp add codemap -- codemap mcp

# Start a Claude Code session — codemap_select is a native tool.
No API key? Run codemap init and pick mock. You get placeholder summaries but the full workflow — handy for trying the loop before committing to a provider.
05
Build pipeline — six passes, cached
incremental

Build is cheap and cached. Each file goes through six passes; unchanged files skip the expensive ones entirely via mtime + BLAKE3 lookup.

Pass 1
Scan
Walk the tree, honour .gitignore and .codemap.yaml rules.
Pass 2
Hash
BLAKE3 content hash. mtime + hash keys the cache.
Pass 3
Parse
AST per language. Go today via go/ast; registry for more.
Pass 4
Summarise
Cheap fast model writes summary, when-to-use, keywords.
Pass 5
Render
Markdown code map + JSON/JSONL for tooling.
Pass 6
Cache
Atomic writes to .claude/cache/. Safe to interrupt.
The LLM only touches semantic fields. public_types, public_functions, and imports always come from the parser. The model writes summary, when_to_use, and keywords — never code facts.
06
Inside the index — what one file looks like
CodeMapEntry

Each file in the code map is a single JSON object. Deterministic fields (the first group below) come from the parser; semantic fields (the second group) come from the cheap model. The big coding model never reads this — it only reads what select returns.

// .claude/cache/codemap.jsonl — one line per file
{
  "path": "internal/build/orchestrator.go",
  "hash": "blake3:9f2d1a…",
  "mtime": "2026-04-18T14:22:07Z",

  // --- parser (deterministic) ---
  "public_types":     ["Orchestrator", "BuildResult"],
  "public_functions": ["NewOrchestrator", "Run"],
  "imports": [
    "github.com/jonnonz1/codemap/internal/hash",
    "github.com/jonnonz1/codemap/internal/llm",
    "github.com/jonnonz1/codemap/internal/parse"
  ],

  // --- cheap model (semantic) ---
  "summary":     "Concurrent build orchestrator: fans out per-file work, rate-limits LLM calls, writes atomically.",
  "when_to_use": "When changing how the build runs — concurrency, backpressure, resume, or failure handling.",
  "keywords":    ["concurrency", "rate-limit", "atomic", "orchestrator"]
}
Why this split matters. When the model is wrong it's wrong about meaning, not structure. public_functions and imports come from go/ast and are always correct. If a summary drifts after a refactor, it re-summarises on the next build automatically.
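For tooling that consumes the index, an entry can be decoded into a plain struct. The sketch below assumes only the JSON keys shown in the example; the Go type and the `decodeEntry` helper are hypothetical, and the hash value is a placeholder. The `//` annotations in the example are for the reader, so this assumes the lines on disk are plain JSON without them.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// CodeMapEntry mirrors the JSON keys in the example. The struct is an
// assumption; the real type in internal/model may differ.
type CodeMapEntry struct {
	Path  string    `json:"path"`
	Hash  string    `json:"hash"`
	MTime time.Time `json:"mtime"`

	// Parser (deterministic) fields.
	PublicTypes     []string `json:"public_types"`
	PublicFunctions []string `json:"public_functions"`
	Imports         []string `json:"imports"`

	// Cheap-model (semantic) fields.
	Summary   string   `json:"summary"`
	WhenToUse string   `json:"when_to_use"`
	Keywords  []string `json:"keywords"`
}

// decodeEntry parses one JSONL line into an entry.
func decodeEntry(line []byte) (CodeMapEntry, error) {
	var e CodeMapEntry
	err := json.Unmarshal(line, &e)
	return e, err
}

func main() {
	// "blake3:placeholder" is not a real digest.
	line := []byte(`{"path":"internal/build/orchestrator.go","hash":"blake3:placeholder","mtime":"2026-04-18T14:22:07Z","public_types":["Orchestrator","BuildResult"],"public_functions":["NewOrchestrator","Run"],"keywords":["concurrency"]}`)
	e, err := decodeEntry(line)
	if err != nil {
		panic(err)
	}
	fmt.Println(e.Path, e.PublicTypes, e.MTime.Year())
}
```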
07
Architecture — small packages, narrow interfaces
Go internal

codemap is a Go CLI with an MCP server entry point. The internal tree is deliberately small and interface-first so new languages, providers, or caches slot in without touching the build orchestrator.

cmd/codemap/          CLI entry + MCP server
internal/
  model/              CodeMapEntry, CodeMap types
  scan/               File system walk with ignore rules
  hash/               BLAKE3 content hashing
  parse/              Parser interface + registry
  langs/golang/       Go AST parser (types, functions, imports)
  store/              JSON/JSONL cache (atomic writes)
  llm/                Summariser interface — Anthropic, OpenAI, Google, Mock
  build/              Incremental build orchestrator (concurrent, rate-limited)
  autoctx/            LLM-based auto-context file selection
  render/             Markdown rendering
  taskfile/           Task file YAML frontmatter parsing
  context/            Session context injection
  mcp/                MCP JSON-RPC server + tool handlers
  config/             .codemap.yaml loading
  initcmd/            codemap init (interactive setup)
  doctor/             Cache diagnostics
  stats/              Usage metrics + exploration tracking
Extension points. New language? Implement parse.Parser and register it. New provider? Implement llm.Summariser. New output format? Add a package alongside render/. The orchestrator in build/ never needs to change.
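To make the extension point concrete, here is a rough sketch of what a parser interface plus registry could look like. The method set, the `Facts` type, and the `Registry` API are guesses for illustration; the actual parse.Parser interface is not shown on this page and may differ.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// Facts holds the deterministic fields a parser extracts (hypothetical type).
type Facts struct {
	PublicTypes     []string
	PublicFunctions []string
	Imports         []string
}

// Parser is a guessed shape for parse.Parser, not the real interface.
type Parser interface {
	Extensions() []string
	Parse(path string, src []byte) (Facts, error)
}

// Registry maps file extensions to parsers.
type Registry struct{ byExt map[string]Parser }

func NewRegistry() *Registry { return &Registry{byExt: map[string]Parser{}} }

// Register makes p responsible for every extension it declares.
func (r *Registry) Register(p Parser) {
	for _, ext := range p.Extensions() {
		r.byExt[ext] = p
	}
}

// For returns the parser for a path, if one is registered for its extension.
func (r *Registry) For(path string) (Parser, bool) {
	p, ok := r.byExt[filepath.Ext(path)]
	return p, ok
}

// stubParser is a placeholder language parser used only for this demo.
type stubParser struct{}

func (stubParser) Extensions() []string { return []string{".py"} }

func (stubParser) Parse(path string, src []byte) (Facts, error) {
	return Facts{}, nil // a real parser would walk the language's AST here
}

func main() {
	reg := NewRegistry()
	reg.Register(stubParser{})
	_, ok := reg.For("tools/gen.py")
	fmt.Println(ok) // true
}
```

With a lookup keyed on extension, the orchestrator only ever asks the registry for a parser, which is consistent with the claim that build/ does not change when a language is added.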
08
Measuring it — numbers, not estimates
observed

Every claim codemap makes is grounded in observed data: cache hits, tool-call logs, git diff. codemap statistics --eval computes precision and recall against what you actually changed.

Precision
65%
Of selected files, how many were actually needed.
Recall
82%
Of changed files, how many were pre-selected.
Context saved
97%
How much the candidate pool was compressed.
Overhead
15%
Extra Read calls beyond the selection.
No counterfactuals. There's no fake comparison to "what would've happened without codemap". These metrics are computed from the git diff after the task ships and the tool-call logs Claude already emits.
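Precision and recall as defined above reduce to set arithmetic over two file lists. The sketch below is an assumption about how such an evaluator might compute them; `evaluate` is a hypothetical helper, and the file names are made up.

```go
package main

import "fmt"

// evaluate compares the selected files with the files a task actually changed.
// precision = hits / selected, recall = hits / changed.
func evaluate(selected, changed []string) (precision, recall float64) {
	changedSet := make(map[string]bool, len(changed))
	for _, f := range changed {
		changedSet[f] = true
	}
	selectedSet := make(map[string]bool, len(selected))
	for _, f := range selected {
		selectedSet[f] = true
	}
	hits := 0
	for f := range selectedSet {
		if changedSet[f] {
			hits++
		}
	}
	if len(selectedSet) > 0 {
		precision = float64(hits) / float64(len(selectedSet))
	}
	if len(changedSet) > 0 {
		recall = float64(hits) / float64(len(changedSet))
	}
	return precision, recall
}

func main() {
	selected := []string{"a.go", "b.go", "c.go", "d.go"} // what select returned
	changed := []string{"a.go", "b.go", "e.go"}          // what the git diff touched
	p, r := evaluate(selected, changed)
	fmt.Printf("precision %.2f, recall %.2f\n", p, r) // precision 0.50, recall 0.67
}
```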
09
Worked example — add soft-delete to invoices
1 task

task Add soft-delete support to invoices. Preserve patterns. Update tests.

A 2688-file Go + TS repo with a fresh codemap build. A Claude Code session with the codemap MCP server registered. No manual context loading.
  1. Claude reads the task and immediately calls codemap_select with the task description. One tool call.
  2. The cheap model scans 2688 summaries (not 2688 files) and returns a ranked list. 381 candidates match the task, 5 are picked: src/invoices/service.ts, src/invoices/repo.ts, tests/invoices/service.test.ts, src/types/invoice.ts, src/db/soft-delete.ts.
  3. codemap returns full source of those 5 files. 27 KB. 0 grep, 0 glob, 0 read-and-hope.
  4. Claude starts coding immediately. It reads src/db/soft-delete.ts, spots the existing pattern, applies it to invoices/repo.ts, updates the service, and rewrites the tests in the same style.
  5. Seven minutes later, the PR is open. Claude made 0 extra Read calls beyond the selection.
  6. After merge, run codemap statistics --eval: precision 100% (5/5 needed), recall 83% (5 of 6 touched — one shared util wasn't in the index), overhead 0%.
The failure mode it fixes. Without codemap, the same task typically opens with 8–12 exploration tool calls — grep for "invoice", glob src/**, read a handful of likely-looking files, then start over when the first guess was wrong. Context fills with noise before the agent has even touched the real files.
10
Claude Code integration — three tools, one registration
MCP

One command registers codemap as an MCP server. Claude Code picks up three tools automatically and a SessionStart hook that prints the index status at the top of every session.

# Register once, globally
claude mcp add codemap -- codemap mcp
Tool              What it does
codemap_select    Given a task, returns full source of the most relevant files. The main event.
codemap_status    Check if the index is fresh, stale, or never built.
codemap_build     Trigger an incremental rebuild from inside a session.
When you give Claude a task, it calls codemap_select first. Gets focused context. Starts coding. No exploration phase — because there's nothing left to explore.
11
Commands — the CLI surface
8
Command                          Description
codemap init                     Interactive setup — provider, model, API key, Claude hooks.
codemap build                    Index the repo (incremental, cached). Safe to interrupt.
codemap render                   Render the code map as a single Markdown document.
codemap select --task task.md    Select files for a task (CLI mode, no MCP needed).
codemap context                  Show what gets injected at Claude Code session start.
codemap doctor                   Diagnose cache health — orphaned entries, stale hashes.
codemap statistics               Usage metrics, token savings, context window impact.
codemap statistics --eval        Compare selections to actual git changes — precision/recall.
12
Providers — pick cheap, pick fast
4
Provider     Model                             Rough cost · 2700 files
Anthropic    claude-haiku-4-5-20251001         ~$2–3
OpenAI       gpt-4o-mini                       ~$1–2
Google       gemini-2.0-flash                  ~$0.50
Mock         no LLM — placeholder summaries    free
Pick by speed, not capability. The summariser doesn't need to reason about your architecture — it needs to write 200 good one-liners a minute. Haiku, Flash, and 4o-mini all do that well; the difference is throughput and cost.
13
Key decisions — why the tool is shaped this way
5
Parsers own facts; the LLM owns meaning

Exported types, function names, and imports come from the AST — never from the model. The model only writes the descriptive fields: summary, when_to_use, keywords. This keeps the index deterministic where it can be and lets the model be wrong only about the fuzzy parts.

It also means you can add a new language by writing a parser, not by retraining anything.

Incremental build, atomic cache

Every build run diffs mtime + BLAKE3 against the cache. Unchanged files skip Parse, Summarise, and Render entirely. A 94% cache hit rate on the second build is normal. Writes are atomic so a half-finished build never corrupts the index.

No embeddings, no vector DB

The select step sends all summaries to the model in a single call. At 2700 files × ~80 tokens that's ~216k tokens. That fits easily in Gemini Flash's 1M-token window, but it is above gpt-4o-mini's 128k window and at or slightly over Haiku's 200k, so on a repo this size the selector model's context limit is worth checking. Adding a vector DB would win sub-second latency and introduce staleness, drift, and a new dependency. The current design trades one slow call for simplicity, and the numbers still beat interactive exploration.
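The single-call design depends on the summary bundle fitting the selector's context window, which is simple arithmetic to check. In the sketch below, `bundleTokens` and `fits` are hypothetical helpers, the 4,000-token reserve for the task and reply is an arbitrary assumption, and the window sizes are example inputs rather than claims about any provider.

```go
package main

import "fmt"

// bundleTokens estimates the size of the summary bundle sent to the selector.
func bundleTokens(files, tokensPerSummary int) int {
	return files * tokensPerSummary
}

// fits reports whether the bundle, plus tokens reserved for the task text
// and the model's reply, stays within a context window.
func fits(bundle, reserve, window int) bool {
	return bundle+reserve <= window
}

func main() {
	bundle := bundleTokens(2700, 80)
	fmt.Println("bundle tokens:", bundle) // bundle tokens: 216000

	// Example window sizes; look up the real limit for your selector model.
	for _, window := range []int{128_000, 200_000, 1_000_000} {
		fmt.Printf("window %d: fits=%v\n", window, fits(bundle, 4_000, window))
	}
}
```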

Observed metrics, no counterfactuals

Precision, recall, and overhead are computed from real git diffs and real tool-call logs. There's no "X% savings vs what would've happened" because that number would be marketing. The evaluator can be run any time after a task ships.

MCP first, CLI always

The MCP server is the shortest path to "Claude Code uses this natively". But every MCP tool is also a CLI command — you can script codemap into CI, pre-commit hooks, or your own agent without touching MCP at all.

14
FAQ — questions worth asking
7
Does this replace grep and glob for the agent?

For the first pass, yes. For follow-ups, grep and glob are still useful: codemap is best when the task is broad ("add soft-delete to invoices"). For pinpoint follow-ups ("what does line 42 do") the agent will still read individual files.

What about TypeScript, Python, Rust?

The parser registry is designed for more languages. Go is first-class today via go/ast. TS/Python/Rust parsers are the obvious next additions — they fit the existing interface and don't require rethinking anything else.

How much does a typical build cost?

For 2700 files: ~$0.50 on Gemini Flash, ~$1–2 on gpt-4o-mini, ~$2–3 on Haiku 4.5. That's the first build. After that only changed files re-summarise, so you pay pennies per session.

Does the API key live in the repo?

It's written to .codemap.yaml, which is gitignored by default. If you prefer an env var, set CODEMAP_API_KEY and delete the key from the config file — both are supported.

How does it compare to running the big model directly on the whole repo?

A big model on a 2 MB repo burns tokens, time, and money on files that don't matter. codemap compresses the candidate pool before the expensive model ever sees it. You can absolutely also do the 2 MB approach on a small repo — codemap is for the point where that stops working.

What happens when I rename or move files?

Move = new path = new cache entry; old one is cleaned by codemap doctor. Rename of an exported symbol shows up in the parser pass and the summary gets regenerated. There's no git-rename tracking — the cache is path-keyed by design.

Why not ctags / Sourcegraph / a vector DB?

ctags gives you symbol lookup — which is what an agent already has via Grep and Read. codemap gives the agent task-shaped context: "here are the five files that matter for what you're about to do". That's a different job.

Sourcegraph is a hosted code-search surface. codemap is a local CLI + MCP server that your agent calls natively. No service, no auth, no index hosted somewhere else.

A vector DB would give sub-second select at the cost of a running service, embedding drift, and staleness. With a 200k-token summary bundle, the model does the selection itself in one call — no index to keep fresh, no embeddings to re-compute when the code shifts.