codemap · Repo intelligence for AI agents Go · MCP

Code map your repo. Agents hit the right files on the first try.

codemap indexes every file with a one-line summary, then uses a cheap fast model to pick only the files relevant to a task. Instead of your agent burning ten tool calls exploring the codebase, it gets the exact files it needs in one shot.


2688 files indexed

94% cache hit rate

97% context saved

~$0.50 cost / 2700 files

Why — exploration is a tax, not a feature

thesis

An AI agent opens your repo and the first ten minutes are a tax: grep, glob, read, read, read. It's trying to build a mental model of the code every single time, out of nothing. The context window fills with dead ends. The answer, when it finally arrives, is worse — because half the budget went to the search.

The fix isn't a bigger model. It's giving the agent a map. codemap indexes your repo with a cheap fast model once, then uses another cheap fast model to pick only the files that matter for each task. The expensive model only ever sees focused context.

Higher precision on the input. Higher precision on the output.

The shape. Code map with a cheap model. Auto-context with a cheap model. Code with a big model. Three models, three roles — each doing what it's best at.

How it works — build once, select per task

diagram

A build pass over the repo produces an index. A select pass takes a task description, reads the index (not the source), and returns the files that matter. The expensive coding model never sees the 2688-file pile.

[Diagram: Build, Select, MCP, Index]

The three primitives — what each piece does, quickly


Code map

One line per file, with structure

Every file gets a summary, a when-to-use hint, exported types and functions, imports, and keywords. Deterministic facts come from parsers; semantic fields come from a cheap model.

summary
when_to_use
public_types
public_functions
imports · keywords

Command: codemap build
Cache: mtime + BLAKE3
Cost: ~$0.50 / 2700 files

Auto-context

Pick the files that matter

Given a task, a cheap fast model reads the summaries (not the source) and returns the 5–10 files the big model should see. You get focused context instead of a 2 MB wall.

381 candidates → 5 files
30–80k tokens out
task file or MCP call
provider-agnostic

Command: codemap select
Input: task.md or query
Output: full source of winners

MCP server

Native tool calls, zero glue

Claude Code registers codemap in one command and picks up three tools. No wrapper, no prompt, no tutorial — the agent just learns to call them.

codemap_select
codemap_status
codemap_build
SessionStart hook

Register: claude mcp add
Transport: stdio (local)
Auth: local · none

Quick start — install, build, register MCP

~3 min

# install
go install github.com/jonnonz1/codemap/cmd/codemap@latest

# 1. Interactive setup — picks provider, model, API key
cd your-project
codemap init

# 2. Index the repo. First run takes a few minutes; after that it's seconds.
codemap build

# 3. Register the MCP server with Claude Code (once, globally).
claude mcp add codemap -- codemap mcp

# Start a Claude Code session — codemap_select is a native tool.

No API key? Run codemap init and pick mock. You get placeholder summaries but the full workflow — handy for trying the loop before committing to a provider.

Build pipeline — six passes, cached

incremental

Build is cheap and cached. Each file goes through six passes; unchanged files skip the expensive ones entirely via mtime + BLAKE3 lookup.

Pass 1

Scan

Walk the tree, honour .gitignore and .codemap.yaml rules.

Pass 2

Hash

BLAKE3 content hash. mtime + hash keys the cache.

Pass 3

Parse

AST per language. Go today via go/ast; registry for more.

Pass 4

Summarise

Cheap fast model writes summary, when-to-use, keywords.

Pass 5

Render

Markdown code map + JSON/JSONL for tooling.

Pass 6

Cache

Atomic writes to .claude/cache/. Safe to interrupt.

The LLM only touches semantic fields. public_types, public_functions, and imports always come from the parser. The model writes summary, when_to_use, and keywords — never code facts.

Inside the index — what one file looks like

CodeMapEntry

Each file in the code map is a single JSON object. The deterministic fields in the first group come from the parser; the semantic fields in the second group come from the cheap model. The big coding model never reads this; it only reads what select returns.

// .claude/cache/codemap.jsonl — one line per file
{
  "path": "internal/build/orchestrator.go",
  "hash": "blake3:9f2d1a…",
  "mtime": "2026-04-18T14:22:07Z",

  // --- parser (deterministic) ---
  "public_types":     ["Orchestrator", "BuildResult"],
  "public_functions": ["NewOrchestrator", "Run"],
  "imports": [
    "github.com/jonnonz1/codemap/internal/hash",
    "github.com/jonnonz1/codemap/internal/llm",
    "github.com/jonnonz1/codemap/internal/parse"
  ],

  // --- cheap model (semantic) ---
  "summary":     "Concurrent build orchestrator: fans out per-file work, rate-limits LLM calls, writes atomically.",
  "when_to_use": "When changing how the build runs — concurrency, backpressure, resume, or failure handling.",
  "keywords":    ["concurrency", "rate-limit", "atomic", "orchestrator"]
}

Why this split matters. When the model is wrong it's wrong about meaning, not structure. public_functions and imports come from go/ast and are always correct. If a summary drifts after a refactor, the file's content hash no longer matches the cache, so the file is re-summarised automatically on the next build.

Architecture — small packages, narrow interfaces

Go internal

codemap is a Go CLI with an MCP server entry point. The internal tree is deliberately small and interface-first so new languages, providers, or caches slot in without touching the build orchestrator.

cmd/codemap/          CLI entry + MCP server
internal/
  model/              CodeMapEntry, CodeMap types
  scan/               File system walk with ignore rules
  hash/               BLAKE3 content hashing
  parse/              Parser interface + registry
  langs/golang/       Go AST parser (types, functions, imports)
  store/              JSON/JSONL cache (atomic writes)
  llm/                Summariser interface — Anthropic, OpenAI, Google, Mock
  build/              Incremental build orchestrator (concurrent, rate-limited)
  autoctx/            LLM-based auto-context file selection
  render/             Markdown rendering
  taskfile/           Task file YAML frontmatter parsing
  context/            Session context injection
  mcp/                MCP JSON-RPC server + tool handlers
  config/             .codemap.yaml loading
  initcmd/            codemap init (interactive setup)
  doctor/             Cache diagnostics
  stats/              Usage metrics + exploration tracking

Extension points. New language? Implement parse.Parser and register it. New provider? Implement llm.Summariser. New output format? Add a package alongside render/. The orchestrator in build/ never needs to change.
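The "implement and register" pattern can be sketched as follows. The interface shape, the Facts struct, and the extension-keyed Registry are all hypothetical; codemap's real parse.Parser may look different. The point is only that adding a language touches the registry, not the orchestrator.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Facts holds parser-owned fields. Hypothetical shape.
type Facts struct {
	PublicTypes, PublicFunctions, Imports []string
}

// Parser is an illustrative stand-in for parse.Parser.
type Parser interface {
	Extensions() []string
	Parse(path string, src []byte) (Facts, error)
}

// Registry maps file extensions to parsers.
type Registry struct {
	byExt map[string]Parser
}

func NewRegistry() *Registry {
	return &Registry{byExt: map[string]Parser{}}
}

// Register makes p the parser for each extension it declares.
func (r *Registry) Register(p Parser) {
	for _, ext := range p.Extensions() {
		r.byExt[strings.ToLower(ext)] = p
	}
}

// For returns the parser for path, if one is registered.
func (r *Registry) For(path string) (Parser, bool) {
	p, ok := r.byExt[strings.ToLower(filepath.Ext(path))]
	return p, ok
}

// stubParser stands in for a real language parser.
type stubParser struct{ exts []string }

func (s stubParser) Extensions() []string { return s.exts }

func (s stubParser) Parse(path string, src []byte) (Facts, error) {
	return Facts{}, nil
}

func main() {
	reg := NewRegistry()
	reg.Register(stubParser{exts: []string{".go"}})
	for _, path := range []string{"internal/build/orchestrator.go", "src/invoices/repo.ts"} {
		_, ok := reg.For(path)
		fmt.Printf("%s parser registered: %v\n", path, ok)
	}
}
```

In this sketch, a file with no registered parser simply has no deterministic facts; how codemap itself treats such files is not specified here.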

Measuring it — numbers, not estimates

observed

Every claim codemap makes is grounded in observed data: cache hits, tool-call logs, git diff. codemap statistics --eval computes precision and recall against what you actually changed.

Precision

65%

Of selected files, how many were actually needed.

Recall

82%

Of changed files, how many were pre-selected.

Context saved

97%

How much the candidate pool was compressed.

Overhead

15%

Extra Read calls beyond the selection.

No counterfactuals. There's no fake comparison to "what would've happened without codemap". These metrics are computed from the git diff after the task ships and the tool-call logs Claude already emits.
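Precision and recall as defined above reduce to set arithmetic over two file lists. The sketch below is an illustration, not the evaluator's actual code: it assumes "needed" means "appears in the git diff", and it leaves out overhead because the denominator for that metric isn't stated here. The sample data uses the figures from the soft-delete example further down; the sixth path is a made-up placeholder for the shared util.

```go
package main

import "fmt"

// Eval returns precision (share of selected files that were changed)
// and recall (share of changed files that were selected).
func Eval(selected, changed []string) (precision, recall float64) {
	changedSet := make(map[string]bool, len(changed))
	for _, p := range changed {
		changedSet[p] = true
	}
	seen := make(map[string]bool, len(selected))
	hits := 0
	for _, p := range selected {
		if seen[p] {
			continue // count each selected path once
		}
		seen[p] = true
		if changedSet[p] {
			hits++
		}
	}
	if len(seen) > 0 {
		precision = float64(hits) / float64(len(seen))
	}
	if len(changedSet) > 0 {
		recall = float64(hits) / float64(len(changedSet))
	}
	return precision, recall
}

func main() {
	selected := []string{
		"src/invoices/service.ts",
		"src/invoices/repo.ts",
		"tests/invoices/service.test.ts",
		"src/types/invoice.ts",
		"src/db/soft-delete.ts",
	}
	// Hypothetical path standing in for the shared util that was touched
	// but not selected.
	changed := append(append([]string{}, selected...), "src/util/shared.ts")

	p, r := Eval(selected, changed)
	fmt.Printf("precision %.0f%%, recall %.0f%%\n", p*100, r*100)
	// prints "precision 100%, recall 83%"
}
```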

Worked example — add soft-delete to invoices

1 task

Task: Add soft-delete support to invoices. Preserve patterns. Update tests.

A 2688-file Go + TS repo with a fresh codemap build. A Claude Code session with the codemap MCP server registered. No manual context loading.

1. Claude reads the task and immediately calls codemap_select with the task description. One tool call.
2. The cheap model scans 2688 summaries (not 2688 files) and returns a ranked list. 381 candidates match the task, 5 are picked: src/invoices/service.ts, src/invoices/repo.ts, tests/invoices/service.test.ts, src/types/invoice.ts, src/db/soft-delete.ts.
3. codemap returns full source of those 5 files. 27 KB. 0 grep, 0 glob, 0 read-and-hope.
4. Claude starts coding immediately. It reads src/db/soft-delete.ts, spots the existing pattern, applies it to invoices/repo.ts, updates the service, and rewrites the tests in the same style.
5. Seven minutes later, the PR is open. Claude made 0 extra Read calls beyond the selection.
6. After merge, run codemap statistics --eval: precision 100% (5/5 needed), recall 83% (5 of 6 touched; one shared util wasn't in the index), overhead 0%.

The failure mode it fixes. Without codemap, the same task typically opens with 8–12 exploration tool calls — grep for "invoice", glob src/**, read a handful of likely-looking files, then start over when the first guess was wrong. Context fills with noise before the agent has even touched the real files.

Claude Code integration — three tools, one registration

MCP

One command registers codemap as an MCP server. Claude Code picks up three tools automatically and a SessionStart hook that prints the index status at the top of every session.

# Register once, globally
claude mcp add codemap -- codemap mcp

Tool	What it does
`codemap_select`	Given a task, returns full source of the most relevant files. The main event.
`codemap_status`	Check if the index is fresh, stale, or never built.
`codemap_build`	Trigger an incremental rebuild from inside a session.

When you give Claude a task, it calls codemap_select first. Gets focused context. Starts coding. No exploration phase — because there's nothing left to explore.
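On the wire, a tool invocation is an MCP tools/call JSON-RPC request sent over stdio. The sketch below builds one for codemap_select. The envelope follows the MCP tool-call shape; the "task" argument name is an assumption, since the tool's input schema isn't shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// callParams is the params object of an MCP tools/call request.
type callParams struct {
	Name      string         `json:"name"`
	Arguments map[string]any `json:"arguments"`
}

// request is a JSON-RPC 2.0 request envelope.
type request struct {
	JSONRPC string     `json:"jsonrpc"`
	ID      int        `json:"id"`
	Method  string     `json:"method"`
	Params  callParams `json:"params"`
}

// SelectRequest encodes a tools/call request for codemap_select.
// The "task" argument key is hypothetical.
func SelectRequest(id int, task string) ([]byte, error) {
	return json.Marshal(request{
		JSONRPC: "2.0",
		ID:      id,
		Method:  "tools/call",
		Params: callParams{
			Name:      "codemap_select",
			Arguments: map[string]any{"task": task},
		},
	})
}

func main() {
	b, err := SelectRequest(1, "Add soft-delete support to invoices.")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

In normal use the agent issues this call itself; the sketch is only useful for seeing what passes between Claude Code and the server, or for driving the server from a custom client.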

Commands — the CLI surface

Command	Description
`codemap init`	Interactive setup — provider, model, API key, Claude hooks.
`codemap build`	Index the repo (incremental, cached). Safe to interrupt.
`codemap render`	Render the code map as a single Markdown document.
`codemap select --task task.md`	Select files for a task (CLI mode, no MCP needed).
`codemap context`	Show what gets injected at Claude Code session start.
`codemap doctor`	Diagnose cache health — orphaned entries, stale hashes.
`codemap statistics`	Usage metrics, token savings, context window impact.
`codemap statistics --eval`	Compare selections to actual git changes — precision/recall.

Providers — pick cheap, pick fast

Provider	Model	Rough cost · 2700 files
Anthropic	`claude-haiku-4-5-20251001`	~$2–3
OpenAI	`gpt-4o-mini`	~$1–2
Google	`gemini-2.0-flash`	~$0.50
Mock	no LLM — placeholder summaries	free

Pick by speed, not capability. The summariser doesn't need to reason about your architecture — it needs to write 200 good one-liners a minute. Haiku, Flash, and 4o-mini all do that well; the difference is throughput and cost.

Key decisions — why the tool is shaped this way

Parsers own facts; the LLM owns meaning

Exported types, function names, and imports come from the AST — never from the model. The model only writes the descriptive fields: summary, when_to_use, keywords. This keeps the index deterministic where it can be and lets the model be wrong only about the fuzzy parts.

It also means you can add a new language by writing a parser, not by retraining anything.
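For Go, the deterministic pass needs nothing beyond the standard library. The following is a minimal sketch of extracting exported types, exported functions and methods, and imports with go/parser and go/ast. The Facts struct and ExtractFacts name are hypothetical, and codemap's own parser may collect more than this.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
)

// Facts holds the deterministic fields. The name is hypothetical.
type Facts struct {
	PublicTypes     []string
	PublicFunctions []string
	Imports         []string
}

// ExtractFacts parses Go source and collects exported type names,
// exported function and method names, and import paths.
func ExtractFacts(filename, src string) (Facts, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, parser.SkipObjectResolution)
	if err != nil {
		return Facts{}, err
	}
	var f Facts
	for _, imp := range file.Imports {
		path, err := strconv.Unquote(imp.Path.Value) // strip the quotes
		if err != nil {
			return Facts{}, err
		}
		f.Imports = append(f.Imports, path)
	}
	for _, decl := range file.Decls {
		switch d := decl.(type) {
		case *ast.FuncDecl:
			if d.Name.IsExported() {
				f.PublicFunctions = append(f.PublicFunctions, d.Name.Name)
			}
		case *ast.GenDecl:
			if d.Tok != token.TYPE {
				continue
			}
			for _, spec := range d.Specs {
				if ts, ok := spec.(*ast.TypeSpec); ok && ts.Name.IsExported() {
					f.PublicTypes = append(f.PublicTypes, ts.Name.Name)
				}
			}
		}
	}
	return f, nil
}

func main() {
	const src = `package build

import "fmt"

type Orchestrator struct{}
type state int

func NewOrchestrator() *Orchestrator { return &Orchestrator{} }
func (o *Orchestrator) Run()         { fmt.Println("run") }
func (o *Orchestrator) helper()      {}
`
	facts, err := ExtractFacts("orchestrator.go", src)
	if err != nil {
		panic(err)
	}
	fmt.Println(facts.PublicTypes, facts.PublicFunctions, facts.Imports)
	// prints "[Orchestrator] [NewOrchestrator Run] [fmt]"
}
```

Unexported names such as state and helper never reach the index, because IsExported filters them out before any model is involved.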

Incremental build, atomic cache

Every build run diffs mtime + BLAKE3 against the cache. Unchanged files skip Parse, Summarise, and Render entirely. A 94% cache hit rate on the second build is normal. Writes are atomic so a half-finished build never corrupts the index.

No embeddings, no vector DB

The select step sends all summaries to the model in a single call. At 2700 files × ~80 tokens that's roughly 216k tokens. That fits easily in Gemini 2.0 Flash's 1M-token window, but it is over gpt-4o-mini's 128k limit and right at Haiku's 200k limit, so check your provider's context window before relying on a single call at this repo size. Adding a vector DB would buy sub-second latency but introduce staleness, drift, and a new dependency. The current design trades one slow call for simplicity, and the numbers still beat interactive exploration.
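A rough way to reason about the single-call design is to render the summaries as one line per file and estimate the token count before sending. The sketch below is illustrative only: the line format, the Entry struct, and the 4-characters-per-token heuristic are assumptions, and a real tokenizer will give different numbers.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry carries the fields the selector needs. Hypothetical shape.
type Entry struct {
	Path, Summary, WhenToUse string
}

// Bundle renders one line per file for the selector prompt.
func Bundle(entries []Entry) string {
	var b strings.Builder
	for _, e := range entries {
		fmt.Fprintf(&b, "%s | %s | %s\n", e.Path, e.Summary, e.WhenToUse)
	}
	return b.String()
}

// EstimateTokens uses a rough 4-characters-per-token heuristic.
func EstimateTokens(s string) int {
	return (len(s) + 3) / 4
}

// Fits reports whether text stays within the model's window after
// reserving room for the task description and the reply.
func Fits(text string, windowTokens, reservedTokens int) bool {
	return EstimateTokens(text) <= windowTokens-reservedTokens
}

func main() {
	// Back-of-envelope figure from the text: files times tokens per summary.
	fmt.Println("estimated bundle tokens:", 2700*80) // prints 216000

	bundle := Bundle([]Entry{{
		Path:      "internal/build/orchestrator.go",
		Summary:   "Concurrent build orchestrator.",
		WhenToUse: "When changing how the build runs.",
	}})
	// windowTokens and reservedTokens are placeholders; use your model's limits.
	fmt.Println("fits:", Fits(bundle, 100_000, 4_000))
}
```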

Observed metrics, no counterfactuals

Precision, recall, and overhead are computed from real git diffs and real tool-call logs. There's no "X% savings vs what would've happened" because that number would be marketing. The evaluator can be run any time after a task ships.

MCP first, CLI always

The MCP server is the shortest path to "Claude Code uses this natively". But every MCP tool is also a CLI command — you can script codemap into CI, pre-commit hooks, or your own agent without touching MCP at all.

FAQ — questions worth asking

Does this replace grep and glob for the agent?

For the first pass, yes. For follow-ups, grep and glob are still useful. codemap is best when the task is broad ("add soft-delete to invoices"); for pinpoint follow-ups ("what does line 42 do") the agent will still read individual files.

What about TypeScript, Python, Rust?

The parser registry is designed for more languages. Go is first-class today via go/ast. TS/Python/Rust parsers are the obvious next additions — they fit the existing interface and don't require rethinking anything else.

How much does a typical build cost?

For 2700 files: ~$0.50 on Gemini Flash, ~$1–2 on gpt-4o-mini, ~$2–3 on Haiku 4.5. That's the first build. After that only changed files re-summarise, so you pay pennies per session.

Does the API key live in the repo?

It's written to .codemap.yaml, which is gitignored by default. If you prefer an env var, set CODEMAP_API_KEY and delete the key from the config file — both are supported.

How does it compare to running the big model directly on the whole repo?

A big model on a 2 MB repo burns tokens, time, and money on files that don't matter. codemap compresses the candidate pool before the expensive model ever sees it. You can absolutely also do the 2 MB approach on a small repo — codemap is for the point where that stops working.

What happens when I rename or move files?

Move = new path = new cache entry; old one is cleaned by codemap doctor. Rename of an exported symbol shows up in the parser pass and the summary gets regenerated. There's no git-rename tracking — the cache is path-keyed by design.

Why not ctags / Sourcegraph / a vector DB?

ctags gives you symbol lookup — which is what an agent already has via Grep and Read. codemap gives the agent task-shaped context: "here are the five files that matter for what you're about to do". That's a different job.

Sourcegraph is a hosted code-search surface. codemap is a local CLI + MCP server that your agent calls natively. No service, no auth, no index hosted somewhere else.

A vector DB would give sub-second select at the cost of a running service, embedding drift, and staleness. With a 200k-token summary bundle, the model does the selection itself in one call — no index to keep fresh, no embeddings to re-compute when the code shifts.