Series: This is Part 2 of a short series on Claude Code and monorepos. Part 1 covered the four-layer extension model (CLAUDE.md, rules, skills, MCP) and session discipline—start there if you have not read it: Making repositories more AI-proficient with Claude Code.

Context bloat is the silent killer of AI productivity. Every token in CLAUDE.md that the model never uses for your task is a token stolen from code, diffs, and corrections. The art of AI-proficient repos is knowing what to write—and what not to write.


Opening: the token trap

Picture a well-meaning developer who ships a “complete” CLAUDE.md: 2,800 lines of tech stack, folder trees, API prose, schema dumps, testing guides, deploy runbooks, and troubleshooting. The file loads every session. Roughly 2,100 tokens sit in context before anyone types a prompt.

Now the task is trivial: fix a typo in README.md. Maybe ~300 tokens of that slab were ever relevant. The rest is ambient noise: correct but unused, re-read every time, competing with the actual file you care about.

If you ballpark input pricing on the order of $0.03 per 1K tokens (rates change by plan and model—treat numbers as illustrative, not a quote), then ~1,800 wasted tokens per session is on the order of $0.05+ thrown away before the first tool call. Multiply by a busy team and the line item stops looking hypothetical. Jpranav’s writeup walks through a similar restructuring story and reports on the order of ~62% token reduction using a tiered documentation approach—not magic, just moving detail out of the always-on path.

Baseline vs lean always-on (illustrative):

| | Bloated Tier 1 | Lean Tier 1 |
| --- | --- | --- |
| Tokens per session start | ~2,100 | ~800 |
| “Saved” per session | | ~1,300 |
| At ~$0.03 / 1K input (illustrative) | | ~$0.039 saved per session on context tax alone |

Stack that over twenty sessions a week and you are into a few dollars per developer per month of avoidable input tax before anyone writes a function. That sounds small until you multiply it across a team and across sessions far heavier than a typo fix; at that scale it is the kind of line item finance asks about once someone reads the invoice closely. The fix is rarely “use the model less”; it is to load less irrelevant text at the start of every thread.

The point is not precision to the penny—it is directional: context is not free. It is the working set for reasoning. Anthropic’s cost guidance is the canonical place to align on how billing, /cost, and model choice interact in Claude Code. ClaudeLog’s token FAQ rounds out practical habits—when to avoid stuffing MCP output, how to sanity-check what got pulled in, and why “more context” is not a universal good.
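The directional arithmetic is easy to sanity-check yourself. A throwaway shell sketch, where every number is an assumption from the illustrative table above, not a quoted rate:

```shell
# Back-of-envelope context tax (all numbers assumed, not quotes)
tokens_saved=1300      # bloated Tier 1 (~2,100T) minus lean Tier 1 (~800T)
rate_per_1k=0.03       # illustrative $ per 1K input tokens
sessions_month=80      # ~20 sessions/week * 4 weeks
summary=$(awk -v t="$tokens_saved" -v r="$rate_per_1k" -v s="$sessions_month" \
  'BEGIN { per = t / 1000 * r;
           printf "per session: $%.3f, per dev/month: $%.2f", per, per * s }')
echo "$summary"
```

Per developer the figure is modest; the point is that it repeats every single session and scales with team size and with how bloated Tier 1 actually is.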

Quick wins this week: (1) Move anything longer than a screenful out of CLAUDE.md into docs/ and link it. (2) Delete prose that duplicates package.json scripts or the tree you can see in the repo. (3) Replace abstract rules with one concrete JSON or code example plus a pointer to the real implementation file.


Context efficiency 101: the three tiers

Tier 1 — always-on (critical path)

Lives in root CLAUDE.md (and stays short).

Belongs here: a 1–2 sentence project summary; stack with versions; a compact folder map (names + purpose, not every file); hard constraints (“do not touch,” PCI zones, migration policy); the few commands people run daily; one canonical pattern example (named exports, error shape, whatever matters most).

Does not belong here: full API reference, schema dumps, deploy novels, troubleshooting archives—link to docs/API.md, docs/DATABASE.md, etc. If Claude can infer it by opening two files, do not narrate the whole tree.

Token budget: aim for hundreds of lines, not thousands. Claude Code docs on cost and context reinforce the same idea: keep the always-on slice honest.

Example Tier 1 (composite “Acme Platform,” not a literal employer repo):

# Acme Platform

## Summary
Turborepo: Next.js web app, Hono API workers, shared packages, Postgres + Prisma.

## Stack
- Web: Next.js 14, TypeScript, Tailwind
- API: Hono on Workers
- Data: PostgreSQL (Neon), Prisma

## Layout
- apps/web — UI
- apps/api — HTTP + workers
- packages/db — schema + migrations
- packages/shared — types, utils, UI kit

## Do not touch
- apps/api/auth/** (reviewed quarterly)
- prod Terraform modules (change window)
- SQL migrations (DBA signoff)

## Commands
pnpm dev · pnpm test --run · pnpm typecheck · pnpm lint

## Pattern
Named exports only — see packages/ui/Button for canonical layout.
Full conventions: docs/PATTERNS.md

Tier 2 — on-demand (task-triggered)

Lives in docs/*.md (and similar). Loads when the human asks—e.g. “read docs/API.md before changing routes,” /add docs/DATABASE.md, or a small skill that pulls the right file.

Each file can be long; it should not live in Tier 1. ClaudeLog’s FAQ nudges toward the same habit: inspect what you load, avoid fat MCP payloads, and prefer explicit file pulls over “dump everything by default.”

Prompt hygiene matters. Compare “implement notifications” (vague, invites guessing) with “implement notifications; before edits, read docs/API.md and docs/DATABASE.md; follow existing websocket utilities under packages/shared.” The second prompt is longer in characters but often cheaper in tokens over the whole session because it prevents blind exploration of half the repo. Claude Fast’s usage optimization guide makes a similar case: batch coherent edits, avoid thrashy tool loops, and treat prompts like API calls—parameters should route work, not narrate your autobiography.

Index file: maintain docs/INDEX.md as a menu of Tier 2 surfaces. Humans skim it during onboarding; agents use it when you say “pick the right doc for migrations.” Keep blurbs to one line each so the index itself stays Tier 2–sized, not a second Tier 1.
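A minimal docs/INDEX.md in this spirit might look like the following (the blurbs are invented for illustration; write your own):

```markdown
# docs/ index
- API.md — route conventions, error envelope, versioning
- DATABASE.md — schema ownership, migration policy, ORM conventions
- TESTING.md — test layout, commands, coverage gates
- DEPLOYMENT.md — environments, release flow, rollback
- PATTERNS.md — canonical code patterns and examples
- troubleshooting/ — long-tail fixes, one file per incident theme
```

One line per surface keeps the index itself cheap to load.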

Sketch of a docs/ tree:

docs/
  INDEX.md
  API.md
  DATABASE.md
  TESTING.md
  DEPLOYMENT.md
  PATTERNS.md
  ARCHITECTURE.md
  troubleshooting/

Tier 3 — reference-only

Deep dives, incident writeups, roadmaps, legacy histories. Zero proactive token cost until someone asks. Perfect home for “we fixed this once in 2024” material.

Why three tiers beats one big file: Jpranav’s Medium piece frames the win as recovering headroom—roughly ~1,300 tokens per session in their restructuring story once always-on context shrank and depth moved behind explicit loads. You get faster first responses (less text to ingest before tool use), higher relevance (the model sees constraints and layout, not your entire API catalog), and room left in the window for the actual diff, test output, and follow-up questions. Richard Porter’s token-management post makes the same case from a habits angle: small, repeatable choices compound into material savings on paid plans.

Coherence, not just cash: when irrelevant prose sits in Tier 1, the model still has to route attention across it. Lean always-on text reduces ambiguous “hints” that conflict with the code you are about to open. That is harder to spreadsheet than dollars, but it shows up as fewer wrong-pattern PRs and less “sorry, I missed that you use named exports.”

- Tier 1 — Always-on (CLAUDE.md): ~250–800 tokens · every session
- Tier 2 — On-demand (docs/*.md): +500–1,500 tokens · when the task needs depth
- Tier 3 — Reference (archives, roadmaps, old incidents): 0 tokens until explicitly requested

The deduplication framework: what not to write

Rule 1 — do not document what the tree already proves

Anti-pattern (noisy Tier 1):

## Component structure
All components live under packages/ui/components/.
Each has Component.tsx, Component.test.tsx, index.ts.
Examples: Button → packages/ui/components/Button/...
Modal → packages/ui/components/Modal/...
Input → packages/ui/components/Input/...

Better: one line plus a canonical path; full detail in docs/PATTERNS.md.

## Components
packages/ui/components/{Name}/ — Name.tsx, Name.test.tsx, index.ts (named exports).
Canonical example: packages/ui/components/Button/. Full guide: docs/PATTERNS.md

Listing every component path when one line plus “open Button/” suffices burns tokens without raising success rate. Push enumerations to Tier 2 or delete them.

Rule 2 — do not mirror skills and rules inside CLAUDE.md

Anti-pattern: pasting lint philosophy, line-length rules, and “never console.log” essays into CLAUDE.md when .claude/rules/lint.md or a /lint skill already exists.

Better in Tier 1:

## Linting
Run `pnpm lint` before commit. Details: docs/LINT_CONFIG.md; use `/lint` when refactoring style.

If /lint or .claude/rules/lint.md exists, do not paste five hundred lines of ESLint philosophy into Tier 1. Link once; keep the mechanical reminder in the skill. Claude Fast’s usage notes stress skill-first workflows and batching edits to avoid churn—duplicated prose fights both.

Rule 3 — prefer one example over five bullet abstractions

Anti-pattern: paragraphs describing error JSON fields (“there is an error string, a code, a status, optional details…”). Better: one real JSON object and a path to the handler source.

## Errors
JSON shape:
{"error":"User not found","code":"USER_NOT_FOUND","status":404,"timestamp":"2025-03-29T10:30:00Z"}
Implementation: apps/api/src/handlers/errors.ts

Rule 4 — trim script catalogs

Anti-pattern: fifteen pnpm scripts transcribed into CLAUDE.md. Better: the three that gate CI or onboarding, then “full list: pnpm run or package.json.”

pnpm run and package.json are authoritative. Tier 1 should list the three commands that gate CI, not every script ever added.
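In practice the trimmed section can be as small as this (script names assumed from the earlier Acme example):

```markdown
## Commands
pnpm dev · pnpm test --run · pnpm typecheck
Full list: `pnpm run` (authoritative: package.json)
```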

Rule 5 — single source of truth for constraints

If “do not touch” lists appear in CLAUDE.md, a rule file, and docs/CONSTRAINTS.md, you will eventually fork reality. Pick one canonical doc; everywhere else links.
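A sketch of the link-only pattern, assuming docs/CONSTRAINTS.md is chosen as the canonical file:

```markdown
<!-- CLAUDE.md -->
## Do not touch
Canonical list: docs/CONSTRAINTS.md (every other mention links here)

<!-- .claude/rules/constraints.md -->
Before editing protected paths, read docs/CONSTRAINTS.md.
```

Now a change to the protected list is a one-file diff instead of a three-file hunt.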

Rule 6 — watch MCP and attachments as “hidden Tier 1”

Every MCP response you append is borrowed from the same window as your code. Pierre-Emmanuel Féga’s analysis of token efficiency in Claude workflows highlights how MCP payloads and bulky tool returns can dominate context if you are not deliberate. Treat connectors like optional compilers: turn them on when the task needs external truth, not because they look impressive in a demo config.

Side-by-side prompts (efficient vs wasteful)

Wasteful: “Read everything under docs/ and then implement the feature.” You have turned Tier 2 into a second always-on layer.

Efficient: “Tier 1 already describes layout. For this task, read only docs/API.md and packages/shared/errors.ts, then propose a plan before edits.” You have fixed scope and cost at once—johnlindquist’s gist is essentially that discipline applied to real agent configuration: pull dynamically, delete redundancy, measure again.

Before: CLAUDE.md ~2,800 lines · ~2,100T always-on · typo fix: most tokens irrelevant · high waste, slow start
After: CLAUDE.md ~400 lines · ~300–800T Tier 1 · Tier 2 only when needed · lower tax, faster first response

Architecture in action: one session, composite

Scenario: “Add real-time notifications to the dashboard” on a stack like Part 1’s Acme example—illustrative, not a literal map of any one company.

Start: Tier 1 loads (~250–400 tokens): stack, layout, forbidden zones, commands, one pattern pointer. Think of this as bootstrap memory—enough to choose files, not enough to implement the whole feature.

Prompt (good shape): the human names events (agent completed, production error, mention), says WebSocket, points at user settings for preferences, and explicitly adds Tier 2: “Read docs/API.md and docs/DATABASE.md before edits.” That line is the difference between targeted context and spray and pray.

Tier 2 on purpose: docs/API.md might add ~600 tokens (status codes, error envelope, versioning). docs/DATABASE.md might add ~500 tokens (tables, migration rules, ORM conventions). Together they stay off the critical path for “fix typo in README” but on for this feature.

Tooling and code reads: the agent still opens packages/shared websocket helpers and similar—those tokens are earned because they are directly used in the implementation. Total “planned” context for this session might land around ~1.3–1.7K before conversation and diffs, versus ~2K+ always-on in the mega-CLAUDE.md world where most of the slab never applied.

Execution: sequence schema → API → UI hook → tests; checkpoint commits between phases so a compaction or handoff does not lose the thread. Richard Porter’s notes and dholdaway’s workflow gist both emphasize continuity habits—small commits, re-grounding after compaction, and not treating the model like a stateless compiler.

Contrast: a single mega-file would have front-loaded thousands of tokens unrelated to websockets before the first edit—deploy runbooks, old incident prose, testing matrices. That is the trap: you pay full price every session for partial relevance. johnlindquist’s gist is the receipts: dynamic loading plus deduped skill text moved their configuration from 7,584 → 3,434 tokens in one documented pass—a ~54% drop.

Measure → find dupes → consolidate → re-measure. One source of truth; link everywhere else.

The deduplication audit

Step 1 — baseline: approximate words → tokens (wc -w and a conservative divisor like 0.75 words/token is fine for planning). Inventory docs/** and .claude/** the same way.

# Rough word counts (planning—not billing-grade token counting)
wc -w CLAUDE.md
find docs -name '*.md' -print0 | xargs -0 wc -w | sort -n

# Hunt duplicate themes (examples—adjust phrases to your repo)
rg -n "error handling|named export|migration|PCI|Do not touch" CLAUDE.md docs .claude || true
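
To turn the word counts into planning-grade token estimates with the 0.75 words/token divisor, a small helper is enough (a heuristic only; real tokenizers differ):

```shell
# tokens ~= words / 0.75 — planning heuristic, not billing-grade counting
estimate_tokens() {
  [ -f "$1" ] || { echo "missing: $1" >&2; return 1; }
  words=$(wc -w < "$1")
  awk -v f="$1" -v w="$words" \
    'BEGIN { printf "%-24s %6d words  ~%5d tokens\n", f, w, w / 0.75 }'
}
if [ -f CLAUDE.md ]; then estimate_tokens CLAUDE.md; fi
```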

Step 2 — hunt duplicates: search repeated phrases (“error handling,” “named exports,” “migration”) across Tier 1, docs/, and rules. If three places say the same thing, two are lying.

Step 3 — consolidate: pick a canonical file, link outward, delete shadows.

Step 4 — verify behavior: run three representative tasks in Claude Code—tiny (typo), medium (one endpoint), large (cross-package feature). If Tier 1 no longer answers “where do I start?”, you cut too deep; restore one map or example. If sessions still feel soggy, you probably have silent duplication in skills or MCP defaults.

johnlindquist’s optimization gist documents a ~54% context reduction in one real configuration by removing redundant skill documentation and leaning on dynamic loading—concrete proof that audits pay off.

Pierre-Emmanuel Féga’s piece adds another lens: when MCP and skills pile on, filter aggressively—telemetry and attachments can dwarf the code you meant to edit.

Anti-pattern: measuring once, declaring victory, and letting CLAUDE.md grow again for six months. Context creep is gradual; bake a quarterly pass into your team calendar.


Implementation playbook (five weeks)

| Week | Focus | Done when |
| --- | --- | --- |
| 1 | Audit and map | Spreadsheet: what lives where, rough token estimates |
| 2 | Shrink Tier 1 | CLAUDE.md under ~500 lines; links replace prose |
| 3 | Build docs/ | INDEX + API/DATABASE/PATTERNS/TESTING at minimum |
| 4 | Skills and rules | Few, sharp triggers—no encyclopedia in .claude/ |
| 5 | Validate | /cost or team feedback; adjust tier placement |

Week 1 — audit and plan: measure CLAUDE.md and each docs/*.md with wc -w; list every rule and skill under .claude/ with one sentence of purpose. Grepping for repeated phrases (see audit section) usually surfaces API errors, migration policy, and test commands duplicated three ways. Deliverable: a table with columns location, topic, rough tokens, keep / merge / delete.
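The deliverable can start life as a generated CSV you paste into the sheet. A sketch, with paths assumed from this article's layout:

```shell
# Emit location,words,approx_tokens rows for the Week 1 audit sheet (words/0.75 heuristic)
audit_inventory() {
  echo "location,words,approx_tokens"
  for f in "$@"; do
    [ -f "$f" ] || continue          # unmatched globs arrive literally; skip them
    w=$(wc -w < "$f")
    awk -v f="$f" -v w="$w" 'BEGIN { printf "%s,%d,%d\n", f, w, w / 0.75 }'
  done
}
audit_inventory CLAUDE.md docs/*.md .claude/rules/*.md .claude/skills/*.md
```

Add the keep / merge / delete column by hand; that judgment call is the actual work.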

Week 2 — refactor Tier 1: cut narrative that belongs in Tier 2; keep summary, stack, layout, do-not-touch, daily commands, one pattern. Target under ~500 lines; ~600 is acceptable if every extra line is a compliance or safety constraint you cannot afford to omit—not a tutorial.

Week 3 — Tier 2 structure: create docs/INDEX.md with one-line blurbs; move API, database, testing, deployment, and architecture depth out of CLAUDE.md. Add docs/troubleshooting/ for long tail. Cross-link Tier 2 files so humans (and agents told “read INDEX first”) do not get lost.

Week 4 — skills and rules: merge overlapping skills; prefer triggers (“when editing routes, open API.md”) over pasting API.md into the skill. Tier 1 should name skills, not reimplement them.

Week 5 — validate: three to five developers run real tasks; watch /cost if available; ask whether anything felt under-specified. If the answer is “we had to paste Confluence links every time,” that topic belongs in Tier 2, not Slack folklore.

Use /context and product guidance when your tool exposes it—Claude Fast’s context mechanics guide ties window usage, compaction, and failure modes together in one place. Anthropic’s cost docs remain the source of truth for billing, /cost, and thinking about model tier as a knob separate from documentation tier.

[Figure: illustrative context-window split across Tier 1, on-demand Tier 2, and conversation + code. The green zone ends where guessing starts—watch growth weekly.]
[Figure: retrofit timeline—W1 audit, W2 Tier 1 cut, W3–W5 docs · skills · validate.]

Monitoring and iteration

Quarterly: re-run word counts; ask which docs/ files people actually opened; archive zombies.

Red flags: CLAUDE.md drifting past ~800 lines; session starts feeling “heavy”; engineers saying half the preloaded text never mattered.

Optional automation: CI soft check on CLAUDE.md line count; a script that lists docs/*.md sizes for review—keep it advisory, not bureaucratic.
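One way to keep the check advisory is a shell step that emits a warning annotation instead of failing the build (the ::warning:: syntax is GitHub Actions; the budget number is an assumption to tune per repo):

```shell
# Soft line-count budget for CLAUDE.md — warn, never break the build
check_budget() {  # usage: check_budget <file> <soft_line_budget>
  [ -f "$1" ] || return 0
  lines=$(wc -l < "$1")
  if [ "$lines" -gt "$2" ]; then
    echo "::warning file=$1::$1 is ${lines} lines (soft budget $2); consider moving depth to docs/"
  fi
}
check_budget CLAUDE.md 800
```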

Steve Kinney’s course material on cost management is a sane onboarding link for model choice (when to reach for heavier models), compaction, and observing spend as a habit—not a one-off lecture.

What to log in retro (lightweight): task type, whether Tier 2 files were explicitly named, whether you hit /compact or a context warning, and “did the model re-read the same doc twice?” If the answer is often yes, your prompts or skills are re-fetching instead of referencing—another dedup pass. dholdaway’s workflow gist is useful here for continuity ideas: small checkpoints, predictable file order, and keeping CLAUDE.md diffs reviewable like any other config change.

When windows get tight, Claude Fast on context mechanics explains why conversations fail in ways that look like “the model got dumb”—often it is eviction, not IQ. Monitoring is partly social: teach the team to recognize that pattern so they do not “fix” it by pasting even more text into the thread.

[Diagram: before—a single CLAUDE.md holding API, DB, deploy, and test prose next to apps/ and packages/ code; after—a short Tier 1 CLAUDE.md linking out to docs/API.md, docs/DATABASE.md, docs/troubleshooting/, and .claude/ rules and skills, with Tiers 2 and 3 separated.]

Appendix: quick reference tables

Token budgets by task type (illustrative planning numbers—not measurements from your repo):

| Task | Tier 1 | Tier 2 add | Conversation (illustr.) | Total (illustr.) |
| --- | --- | --- | --- | --- |
| Typo / copy | 300T | 0 | 100T | ~400T |
| API endpoint | 300T | 600T | 800T | ~1.7K |
| Migration | 300T | 800T | 1.2K | ~2.3K |
| Full feature | 300T | 1.5K | 3K | ~4.8K |

Duplicates to consolidate

| Pattern | Typical duplicate locations | Fix |
| --- | --- | --- |
| Error JSON shape | CLAUDE + API doc + skill | Canonical in docs/API.md |
| Component exports | CLAUDE + PATTERNS | Example in Tier 1; depth in docs/PATTERNS.md |
| Tests | CLAUDE + README | docs/TESTING.md |

Red flags and remediation

| Red flag | Likely cause | Fix |
| --- | --- | --- |
| Session start feels slow or “heavy” | Tier 1 too large | Cut CLAUDE.md; move depth to docs/ |
| Baseline context ~2K+ tokens before chatting | Bloated rules/skills or MCP noise | Dedup; disable unused MCP; shorten skills (Féga, Sources list) |
| “Half the loaded doc was useless” | Wrong tier—material should be Tier 3 or archive | Demote or split files |
| Costs creep month over month | Context drift, no review | Quarterly audit + soft CI line-count nudge |

Closing: context as leverage

Teams that win with Claude Code are not the ones with the most documentation—they ship the right-sized documentation. Deduplication and tiers are how you keep the window for code, not wallpaper.

Treat context like memory in a tight loop: you would not allocate a megabyte to hold a byte you read once. The same instinct applies to CLAUDE.md. Onboarding gets easier, not harder, when Tier 1 is a map and Tier 2 is the encyclopedia—humans skim the short file first; agents pay tokens only for the chapter that matches the task.

Next step: spend one hour listing what your Tier 1 repeats from Tier 2 or the repo. Delete or link the repeats. Then read Part 1 if you have not aligned your extension layers yet—the tiers only work when skills, rules, and MCP are not fighting the anchor file.


Sources

Accessed March 31, 2026.

TL;DR: Treat CLAUDE.md as a small always-on cache. Push depth to Tier 2 (docs/) and Tier 3 (archives). Deduplicate anything that repeats skills, the filesystem, or package.json. Run a five-week retrofit: audit, shrink Tier 1, build docs/, tighten skills, validate with /cost and team feedback. Part 1 covers the four layers; this part is how you keep them cheap to load.