
Claude Code Context Window: Stop Wasting Tokens [2026]

Robin | 10 min read
Last updated: April 13, 2026
Tags: context, claude-code, optimization, token

Claude Code Context Window: What You're Actually Paying For

Claude Code's context window holds everything Claude knows about your session - your instructions, the files it reads, its own responses, and content that never appears in your terminal. Most developers don't realize how much loads before they type a single word.

Understanding what fills the context window and how to control it is the difference between sessions that stay sharp for hours and sessions that degrade after three tasks.

What Loads Before You Type Anything

This is what enters context at session start - before your first message:

  1. System prompt - Claude Code's core instructions
  2. CLAUDE.md files - your project instructions, loaded from the directory hierarchy
  3. Auto memory - first 200 lines of MEMORY.md (or 25KB, whichever comes first)
  4. MCP tool names - lightweight stubs for all configured MCP servers
  5. Skill descriptions - metadata for available skills

How This Adds Up

Each layer consumes tokens from your context budget:

| Component | Typical Size | You Control It? |
|---|---|---|
| System prompt | Fixed | No |
| CLAUDE.md (project root) | 100-500 lines | Yes - keep under 200 lines |
| Rules files (unscoped) | Per file | Yes - scope with paths: |
| Auto memory | Up to 200 lines | Partially - Claude writes it |
| MCP tool stubs | ~50-100 tokens per server | Yes - manage server count |
| Skill descriptions | ~50 tokens each | Yes - manage skill count |

A minimal setup might consume 2,000-3,000 tokens at startup. A setup with 40+ rules files, 7 MCP servers, and 30 skills could easily consume 15,000+ tokens before you say hello.
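A quick way to sanity-check these numbers is a rough estimator. This is a hypothetical sketch, not a measurement tool: the per-server and per-skill figures come from the table above, while the ~3,000-token CLAUDE.md and ~250 tokens per rules file are illustrative assumptions. The fixed system prompt and auto memory are left out. For your real numbers, use /context.

```python
# Hypothetical startup-context estimator. Per-item constants come from the table;
# CLAUDE.md and rules-file sizes are caller-supplied assumptions, not measurements.
MCP_STUB_TOKENS = 100   # upper end of "~50-100 tokens per server"
SKILL_DESC_TOKENS = 50  # "~50 tokens each"


def startup_tokens(claude_md_tokens: int, rule_file_tokens: list[int],
                   mcp_servers: int, skills: int) -> int:
    """Estimate the user-controllable part of startup context (system prompt excluded)."""
    return (
        claude_md_tokens
        + sum(rule_file_tokens)            # every unscoped rules file loads at startup
        + mcp_servers * MCP_STUB_TOKENS
        + skills * SKILL_DESC_TOKENS
    )


# The heavy setup described above, assuming a ~3,000-token CLAUDE.md
# and ~250 tokens per rules file (both hypothetical).
heavy = startup_tokens(3_000, [250] * 40, mcp_servers=7, skills=30)
print(f"{heavy:,} tokens")  # 15,200 tokens
```

Under these assumptions, the 40 unscoped rules files account for 10,000 of the 15,200 tokens, which is why they are the first thing to look at.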

The Claude Code Context Budget Problem

Context windows are large - up to 1M tokens on Opus 4.6 and Sonnet 4.6. But that doesn't mean context is free.

Why Context Efficiency Still Matters

Even with 1M tokens, three factors make context management critical:

Cost. Cached tokens are cheap (90% discount via prompt caching), but uncached tokens are full price. Every token in the dynamic portion of your context costs real money per request. A session that loads 30K tokens of rules vs 3K tokens of rules pays 10x more per message for the rules portion.

Quality. Claude's attention is not uniform across the context window. Anthropic's best practices acknowledge that performance can degrade as context fills. More relevant context beats more total context.

Session length. The more you front-load, the fewer tasks you can complete before hitting limits. If you start at 15% context consumed, you have less runway than starting at 3%.
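The cost argument is simple proportionality, and a short sketch makes that explicit. PRICE_PER_MTOK is a placeholder, not a published rate; the only figure taken from the text is that cached reads are billed at 10% of the normal input price. Cache-write surcharges and output tokens are ignored.

```python
# Per-message cost of the rules portion of context.
# PRICE_PER_MTOK is a hypothetical placeholder, not a published rate.
PRICE_PER_MTOK = 3.00
CACHE_READ_MULTIPLIER = 0.10  # "90% discount via prompt caching"


def rules_cost_per_message(rule_tokens: int, cached: bool,
                           price_per_mtok: float = PRICE_PER_MTOK) -> float:
    """Dollar cost of sending rule_tokens once, either as a cache hit or at full price."""
    multiplier = CACHE_READ_MULTIPLIER if cached else 1.0
    return rule_tokens / 1_000_000 * price_per_mtok * multiplier


for tokens in (3_000, 30_000):
    full = rules_cost_per_message(tokens, cached=False)
    hit = rules_cost_per_message(tokens, cached=True)
    print(f"{tokens:>6,} rule tokens: ${full:.4f} uncached, ${hit:.4f} cached")
```

Whatever the price, the 30K setup costs ten times the 3K setup for the rules portion of every message. Caching shrinks the absolute amount, not the ratio.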

The Real Problem: Always-On Loading

The biggest source of context waste is loading everything at session start regardless of what you're doing. If you have 40 rules files and all are unscoped, they all load at startup. When you're debugging a CSS issue, you're paying for your API design guidelines, your security policy, and your deployment checklist to sit in context doing nothing.

Claude Code Context Management: Native Tools

Claude Code provides several built-in mechanisms for context management. Use these before building custom solutions.

Path-Scoped Rules

The most impactful native feature for context efficiency. Path-scoped rules only load when Claude reads files matching a glob pattern:

```markdown
---
paths:
  - "src/api/**/*.ts"
---

# API Development Rules

- All endpoints must include input validation
- Use standard error response format
```

This rule only enters context when Claude reads TypeScript files under src/api/. Working on frontend? This rule never loads. Zero cost.

Convert your "sometimes needed" rules from unscoped (always loaded) to path-scoped (loaded on demand). This single change can cut startup context by 50%+ for projects with many rules.
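To make the loading behavior concrete, here is a small simulation of which rules would be in context after Claude has read a given set of files. It models the idea rather than reproducing Claude Code's matcher: the glob translation (** spans any number of directories, * and ? stay within one path segment) is an assumption about typical glob semantics, and the rule names are hypothetical.

```python
import re
from typing import Optional


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path glob into a regex (assumed semantics, not Claude Code's matcher):
    '**/' spans zero or more directories; '*' and '?' never cross a '/'."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:[^/]+/)*")
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def active_rules(files_read: list[str], rules: dict[str, Optional[list[str]]]) -> list[str]:
    """Unscoped rules (None) are always active; scoped rules activate once any
    file read so far matches one of their globs."""
    active = []
    for name, globs in rules.items():
        if globs is None or any(
            glob_to_regex(g).fullmatch(f) for g in globs for f in files_read
        ):
            active.append(name)
    return active


# Hypothetical rule set mirroring the example above.
RULES = {
    "api-rules.md": ["src/api/**/*.ts"],
    "frontend-rules.md": ["src/components/**/*.tsx"],
    "core.md": None,  # no paths: frontmatter, so it loads at startup
}

print(active_rules([], RULES))                             # ['core.md']
print(active_rules(["src/components/Button.tsx"], RULES))  # ['frontend-rules.md', 'core.md']
```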

/compact - Summarize Without Losing Instructions

When context fills up mid-session, compaction summarizes conversation history while preserving key content. Use /compact to trigger it manually, or let Claude Code auto-compact when approaching limits.

Critical to understand: compaction summarizes your conversation, not your instructions. Project-root CLAUDE.md and unscoped rules survive compaction and are re-injected from disk. Path-scoped rules and nested CLAUDE.md files are lost until their trigger files are read again.

/clear - Full Reset

Use /clear between unrelated tasks to start fresh. This destroys the entire context including the prompt cache - use it deliberately, not reflexively.

/context - See Your Usage

Run /context for a live breakdown of context usage by category. This shows you exactly what's consuming your budget and where to optimize.

Subagents - Isolated Context

Subagents run in their own context window. Large research tasks, file analysis, or exploration work that would bloat your main context can be delegated to a subagent. Only the summary comes back to your session.

This is the official recommended pattern for keeping your primary context clean while still doing comprehensive work.

Evolving Lite (primeline-ai/evolving-lite), the self-evolving Claude Code plugin that provides the custom commands used later in this article, is free, MIT-licensed, and needs no build step.

Smart Loading: On-Demand vs Always-On

The key principle: separate your instructions into what Claude needs every session vs what it needs sometimes.

What Should Always Load

  • Core project instructions (stack, conventions, banned patterns)
  • Build and test commands
  • Memory bootup (CLAUDE.md + auto memory)

What Should Load On Demand

  • Domain-specific rules (API design, security, testing)
  • Documentation references
  • Configuration guides
  • Feature-specific patterns

How to Implement

Step 1: Audit your current rules. Run /memory to see what's loaded. Count the files.

Step 2: Categorize each rule. Would you want this loaded when doing completely unrelated work? If no, it's a candidate for path-scoping.

Step 3: Add paths: frontmatter. Move domain-specific rules from always-on to on-demand by adding glob patterns:

```markdown
---
paths:
  - "src/components/**/*.tsx"
---

# Frontend Component Rules
...
```

Step 4: Verify. Run /context before and after to measure the difference. Target under 5% context consumed at startup for most projects.
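Steps 1 and 2 can be partly automated. The sketch below walks .claude/rules/, flags files without paths: frontmatter (the ones that load every session), and ranks them by estimated size. Two assumptions to keep in mind: the frontmatter check is a simple line scan rather than a YAML parser, and the ~4 characters per token ratio is a rule of thumb, not Claude's tokenizer. The function names are hypothetical, and /context remains the source of truth.

```python
from pathlib import Path


def has_paths_frontmatter(text: str) -> bool:
    """True if the file opens with a --- frontmatter block containing a paths: key.
    A simple line scan, not a full YAML parser."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    for line in lines[1:]:
        if line.strip() == "---":
            return False  # frontmatter closed without a paths: key
        if line.startswith("paths:"):
            return True
    return False  # unterminated frontmatter


def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token. Not a real tokenizer.
    return len(text) // 4


def audit_rules(rules_dir: Path) -> list[tuple[str, int]]:
    """List unscoped (always-loaded) rule files with a rough token estimate, largest first."""
    report = []
    for path in sorted(rules_dir.rglob("*.md")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        if not has_paths_frontmatter(text):
            report.append((str(path.relative_to(rules_dir)), estimate_tokens(text)))
    return sorted(report, key=lambda item: item[1], reverse=True)


rules_dir = Path(".claude/rules")
if rules_dir.is_dir():
    for name, tokens in audit_rules(rules_dir):
        print(f"{tokens:>6} tokens (est.)  {name}")
```

Anything near the top of that list that you wouldn't want during unrelated work is a candidate for Step 3.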

What Survives Compaction

This table is critical for designing a context-efficient system. When compaction runs, not everything persists:

| Content Type | After Compaction |
|---|---|
| System prompt | Unchanged |
| Project-root CLAUDE.md + unscoped rules | Re-injected from disk |
| Auto memory | Re-injected from disk |
| Path-scoped rules (with paths: frontmatter) | Lost until matching file is read |
| Nested CLAUDE.md in subdirectories | Lost until file in that directory is read |
| Skill bodies | Re-injected (capped at 5,000 tokens/skill, 25,000 total) |
| Hooks | Not in context - run as code |

Design implication: Instructions that must survive compaction belong in project-root CLAUDE.md or unscoped rules. Instructions that are only needed for specific file types should use paths: frontmatter - they'll reload automatically when Claude touches those files again.

Advanced: Building a Context Router

For large systems with 30+ rules and multiple domains, the native path-scoping may not be granular enough. You want keyword-based routing: when the conversation mentions "delegation," load delegation rules. When it mentions "debug," load debugging patterns.

For my own system, I built a context router that maps keywords to knowledge files:

```json
{
  "routes": [
    {
      "keywords": ["delegate", "subagent", "agent"],
      "load": ["rules/delegation.md", "rules/personalities.md"]
    },
    {
      "keywords": ["debug", "error", "fix"],
      "load": ["rules/debugging.md"]
    }
  ]
}
```

A UserPromptSubmit hook intercepts each message you send, matches keywords against the routes, and injects the matching rules as additional context. Rules that don't match the current topic never load.

This is custom infrastructure beyond what Claude Code provides natively. The Evolving Lite plugin includes a basic context routing setup. For most projects, path-scoped rules cover 80% of the need without custom code.
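For a sense of how the routing hook could look, here is a minimal sketch for Claude Code's UserPromptSubmit event, which fires when you submit a prompt, passes a JSON payload (including the prompt text) on stdin, and adds whatever the hook prints to stdout to Claude's context when it exits with code 0. It is not the author's router or Evolving Lite's implementation. The route table mirrors the JSON config above; resolving rule paths under the project's .claude/ directory is an illustrative assumption, and the helper names are hypothetical. Keywords match at the start of a word, so "debug" also catches "debugging" and "agent" won't fire inside "magenta", though "fix" will still match "fixture".

```python
import json
import re
import sys
from pathlib import Path

# Hypothetical route table, mirroring the JSON config above.
ROUTES = {
    "routes": [
        {"keywords": ["delegate", "subagent", "agent"],
         "load": ["rules/delegation.md", "rules/personalities.md"]},
        {"keywords": ["debug", "error", "fix"],
         "load": ["rules/debugging.md"]},
    ]
}


def match_routes(prompt: str, config: dict) -> list[str]:
    """Return rule files whose keywords appear in the prompt, deduplicated, in route order."""
    text = prompt.lower()
    files: list[str] = []
    for route in config.get("routes", []):
        # Match at the start of a word: "debug" also matches "debugging".
        if any(re.search(r"\b" + re.escape(kw.lower()), text) for kw in route["keywords"]):
            for path in route["load"]:
                if path not in files:
                    files.append(path)
    return files


def build_context(files: list[str], root: Path) -> str:
    """Concatenate the matched rule files that exist under root; missing files are skipped."""
    parts = []
    for rel in files:
        path = root / rel
        if path.is_file():
            parts.append(f"<!-- {rel} -->\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)


def main() -> None:
    # UserPromptSubmit hooks receive a JSON payload on stdin with a "prompt" field.
    payload = json.load(sys.stdin)
    root = Path(payload.get("cwd", ".")) / ".claude"  # assumed location of rules/
    context = build_context(match_routes(payload.get("prompt", ""), ROUTES), root)
    if context:
        # On exit code 0, stdout from a UserPromptSubmit hook is added to Claude's context.
        print(context)
```

To use it, save it as a script, call main() behind the usual __name__ == "__main__" guard, and register it as a command hook under UserPromptSubmit in your Claude Code settings.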

Budget Awareness

On top of routing, I track context usage thresholds:

  • Under 60%: normal operation, load full rules on demand
  • 60-80%: load summaries instead of full docs, prefer delegation to subagents
  • Above 80%: stop loading new rules, prepare session handoff

Claude reasons about these thresholds naturally. When it sees "context at 73% with 5 tasks remaining," it decides to delegate rather than risk degradation. Not hardcoded logic - emergent behavior from clear constraints.
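The thresholds translate directly into code. This sketch maps a token count to one of the three modes and formats the kind of status line Claude might see. How the used-token count is obtained isn't covered here, so the inputs, mode names, and status format are all hypothetical.

```python
def budget_mode(used_tokens: int, window_tokens: int) -> str:
    """Map context usage to the three operating modes (hypothetical names)."""
    # Integer comparisons avoid float rounding right at the 60% and 80% boundaries.
    if used_tokens * 100 < 60 * window_tokens:
        return "normal"    # under 60%: load full rules on demand
    if used_tokens * 100 <= 80 * window_tokens:
        return "conserve"  # 60-80%: load summaries, prefer delegation to subagents
    return "handoff"       # above 80%: stop loading new rules, prepare session handoff


def status_line(used_tokens: int, window_tokens: int, tasks_remaining: int) -> str:
    """Hypothetical status message fed back to Claude."""
    pct = round(used_tokens * 100 / window_tokens)
    mode = budget_mode(used_tokens, window_tokens)
    return f"context at {pct}% with {tasks_remaining} tasks remaining (mode: {mode})"


print(status_line(730_000, 1_000_000, tasks_remaining=5))
# context at 73% with 5 tasks remaining (mode: conserve)
```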

Two Ways to Reclaim Context Without Losing Knowledge

When context fills up, most developers reach for /clear. But /clear destroys the prompt cache - every cached token rebuilds from scratch. There are two smarter approaches that preserve your cache and your knowledge.

Strategy 1: Compact Stuffing + Rewind (Mid-Session Reset)

This strategy uses /compact-stuff, a custom command from Evolving Lite, to distill session knowledge before resetting context.

How it works:

  1. Run /compact-stuff - it analyzes your conversation, extracts decisions, solutions, task state, and key file paths, then copies the distilled summary to your clipboard
  2. Run /rewind and select Delete (not Summarize - you already have the better summary in your clipboard)
  3. Paste (Cmd+V) the compact-stuff output as your first message

Why this works: /rewind with Delete clears the conversation history but keeps the system prompt, CLAUDE.md, tools, and MCP servers intact. The prompt cache prefix survives because the static content hasn't changed. You start fresh with minimal context usage but retain the most valuable knowledge from the session.

When to use: Mid-session when context is 70%+ but you have more work to do in the same project. You stay in the same session - no restart needed.

Strategy 2: Handoff + Compact (Session Transition)

This strategy uses /whats-next, another custom command from Evolving Lite, to create a handoff file before running compaction.

How it works:

  1. Run /whats-next - it creates a structured handoff file with what was accomplished, what was learned, and specific next steps
  2. Run /compact - Claude Code summarizes the conversation history, re-injects CLAUDE.md and unscoped rules from disk, and preserves the cached prefix
  3. Say "continue" - Claude reads the handoff file and picks up where you left off

Why this works: /compact preserves the prompt cache because the static prefix (system prompt, tools, CLAUDE.md) stays identical. The conversation history gets summarized, freeing context space. The handoff file provides structured continuity that the compaction summary alone can't - specific task state, decisions with reasoning, and actionable next steps.

When to use: When you've completed a logical chunk of work and want to transition to the next phase. Also effective at the end of a work session - the handoff file persists so the next session (even days later) starts with full context.
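The exact format /whats-next writes isn't covered here, so the following is only a hypothetical sketch of a handoff file with the three parts named above: what was accomplished, what was learned, and next steps. The Handoff class, its fields, and the Markdown layout are illustrative, not the plugin's schema.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """Hypothetical handoff record: what was done, what was learned, what's next."""
    accomplished: list[str] = field(default_factory=list)
    learned: list[str] = field(default_factory=list)  # decisions with their reasoning
    next_steps: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the handoff as Markdown that a fresh session can read back."""
        sections = [
            ("Accomplished", self.accomplished),
            ("Learned", self.learned),
            ("Next steps", self.next_steps),
        ]
        lines = ["# Session handoff"]
        for title, items in sections:
            lines += ["", f"## {title}"]
            lines += [f"- {item}" for item in items] or ["- (none)"]
        return "\n".join(lines) + "\n"


handoff = Handoff(
    accomplished=["Added input validation to src/api/users.ts"],
    learned=["Kept the standard error format so existing clients don't break"],
    next_steps=["Apply the same validation to src/api/orders.ts"],
)
print(handoff.to_markdown())
```

Writing the result to a file in the repo (the filename is up to you) is what lets a later session pick it up.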

Why Both Strategies Preserve the Cache

Both strategies work because they keep the static prefix untouched:

| Content | /compact-stuff + Rewind | Handoff + /compact |
|---|---|---|
| System prompt | Preserved | Preserved |
| CLAUDE.md + rules | Preserved | Re-injected from disk |
| MCP tool stubs | Preserved | Preserved |
| Prompt cache | Intact | Intact |
| Conversation history | Deleted (replaced by paste) | Summarized |
| Session knowledge | In clipboard paste | In handoff file |

The key insight: the expensive part of a session isn't the conversation - it's the cached prefix. Both strategies protect the prefix while reclaiming the conversation space.
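A simplified model shows why the prefix matters. Prompt caching reuses only an exact prefix of the prompt, so a change early on invalidates everything after it. The sketch compares two requests segment by segment; real caching operates on token prefixes and cache breakpoints rather than these coarse, hypothetical segments.

```python
def reusable_prefix(previous: list[str], current: list[str]) -> int:
    """Number of leading segments identical in both requests (simplified prefix-cache model).
    Everything after the first difference has to be processed again."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count


before = ["system prompt", "tool stubs", "CLAUDE.md v1", "long conversation"]
after_compact = ["system prompt", "tool stubs", "CLAUDE.md v1", "summary"]
after_edit = ["system prompt", "tool stubs", "CLAUDE.md v2", "summary"]

print(reusable_prefix(before, after_compact))  # 3
print(reusable_prefix(before, after_edit))     # 2
```

In this model, compaction only replaces the last segment, so the static prefix still matches. Editing CLAUDE.md mid-session changes an earlier segment, and everything from that point on has to be processed again.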

Before and After

Before: Load Everything
  • 15%+ context consumed at startup
  • 3-4 tasks before degradation
  • Frequent compaction losing path-scoped rules
  • All rules loaded regardless of task

After: Smart Loading
  • 3-5% context at startup
  • 10+ tasks with consistent quality
  • Compaction only at 80%+
  • Rules load only when relevant

The difference is not just tokens saved. It's session consistency. When you start at 3% instead of 15%, you have headroom for complex tasks. When rules load on demand, you never pay for context you don't use.

Practical Steps to Reduce Context Waste

Start with the highest-impact changes:

  1. Run /context right now. See your actual baseline. If startup is under 5%, you're fine. If it's over 10%, you have optimization opportunities.

  2. Add paths: frontmatter to domain-specific rules. The single biggest win for most setups. Takes 5 minutes per rule file.

  3. Keep CLAUDE.md under 200 lines. Anthropic recommends this in their official docs. Use @imports or .claude/rules/ for longer content.

  4. Use subagents for research. Instead of reading 10 large files into your context, delegate the research to a subagent. Only the summary enters your context.

  5. Prefer /compact over /clear. Compaction preserves your prompt cache and project-root instructions. /clear destroys everything.

  6. Monitor with /context regularly. Make it a habit to check context usage mid-session, especially before complex tasks.

FAQ

How big is Claude Code's context window?
Claude Code supports up to 1M tokens on Opus 4.6 and Sonnet 4.6. However, context efficiency still matters because cost scales with token usage, quality can degrade as context fills, and session length is limited by how fast you consume the budget.

What is compaction in Claude Code?
Compaction summarizes your conversation history to free up context space while preserving key content. Project-root CLAUDE.md, unscoped rules, and auto memory survive compaction and are re-injected from disk. Path-scoped rules and nested CLAUDE.md files are lost until re-triggered.

Should I use /clear or /compact in Claude Code?
Use /compact to free context while keeping your session alive - it preserves the prompt cache and project instructions. Use /clear only between completely unrelated tasks where you want a fresh start. /clear destroys the prompt cache and forces a full rebuild.

What are path-scoped rules in Claude Code?
Path-scoped rules use paths: frontmatter with glob patterns to only load when Claude reads matching files. This means API rules only load when working on API files, frontend rules only load when working on components. Rules without paths: frontmatter load at startup for every session.

How do I check my Claude Code context usage?
Run /context for a live breakdown of context usage by category with optimization suggestions. Run /memory to see which CLAUDE.md and auto memory files loaded at startup. These commands show exactly what's consuming your context budget.

How do subagents help with context management?
Subagents run in their own separate context window. When you delegate research or file analysis to a subagent, the work happens in isolated context. Only the summary returns to your main session, keeping your primary context clean for the tasks you're actively working on.

What is a context router in Claude Code?
A context router is a custom system that maps keywords to knowledge files, loading rules on demand based on what you're discussing rather than loading everything at startup. This is not a native Claude Code feature but can be built with hooks. Path-scoped rules provide similar on-demand loading natively.

Does context management affect prompt caching?
Yes. Stable context at the front of the prompt (system prompt, CLAUDE.md, tool stubs) gets cached and reused across requests at a 90% discount. Dynamic content (conversation, on-demand rules) sits after the cached prefix. Keeping your static prefix stable maximizes cache hit rates and reduces cost.
