Claude Code Context Window: What You're Actually Paying For
Claude Code's context window holds everything Claude knows about your session - your instructions, the files it reads, its own responses, and content that never appears in your terminal. Most developers don't realize how much loads before they type a single word.
Understanding what fills the context window and how to control it is the difference between sessions that stay sharp for hours and sessions that degrade after three tasks.
What Loads Before You Type Anything
This is what enters context at session start - before your first message:
- System prompt - Claude Code's core instructions
- CLAUDE.md files - your project instructions, loaded from the directory hierarchy
- Auto memory - first 200 lines of MEMORY.md (or 25KB, whichever comes first)
- MCP tool names - lightweight stubs for all configured MCP servers
- Skill descriptions - metadata for available skills
How This Adds Up
Each layer consumes tokens from your context budget:
| Component | Typical Size | You Control It? |
|---|---|---|
| System prompt | Fixed | No |
| CLAUDE.md (project root) | 100-500 lines | Yes - keep under 200 lines |
| Rules files (unscoped) | Per file | Yes - scope with paths: |
| Auto memory | Up to 200 lines | Partially - Claude writes it |
| MCP tool stubs | ~50-100 tokens per server | Yes - manage server count |
| Skill descriptions | ~50 tokens each | Yes - manage skill count |
A minimal setup might consume 2,000-3,000 tokens at startup. A setup with 40+ rules files, 7 MCP servers, and 30 skills could easily consume 15,000+ tokens before you say hello.
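To make that arithmetic concrete, here is a rough estimator that adds up the layers from the table. The per-server and per-skill figures come from the table (upper end of each range); the system prompt size and the tokens-per-line ratio are assumptions chosen for illustration, not measured values.

```python
# Rough startup-context estimator. All constants are illustrative assumptions
# except the per-server and per-skill figures, which come from the table above.
TOKENS_PER_LINE = 10          # assumed average for markdown instruction lines
SYSTEM_PROMPT_TOKENS = 1_500  # assumed; the real size is fixed but not stated here
TOKENS_PER_MCP_SERVER = 100   # upper end of the ~50-100 range
TOKENS_PER_SKILL = 50


def estimate_startup_tokens(claude_md_lines, unscoped_rule_lines, mcp_servers, skills, memory_lines=0):
    """Estimate tokens loaded before the first message.

    unscoped_rule_lines is a list with one line count per unscoped rules file.
    """
    # Auto memory is capped at its first 200 lines.
    instruction_lines = claude_md_lines + sum(unscoped_rule_lines) + min(memory_lines, 200)
    return (
        SYSTEM_PROMPT_TOKENS
        + instruction_lines * TOKENS_PER_LINE
        + mcp_servers * TOKENS_PER_MCP_SERVER
        + skills * TOKENS_PER_SKILL
    )


minimal = estimate_startup_tokens(claude_md_lines=100, unscoped_rule_lines=[], mcp_servers=1, skills=2)
heavy = estimate_startup_tokens(claude_md_lines=200, unscoped_rule_lines=[30] * 40, mcp_servers=7, skills=30)
print(minimal, heavy)  # prints 2700 17700
```

Under these assumptions the rules files dominate the heavy setup, which is why scoping them matters more than trimming servers or skills. Run /context for real numbers; this only shows where the tokens tend to come from.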
The Claude Code Context Budget Problem
Context windows are large - up to 1M tokens on Opus and Sonnet 4.6. But that doesn't mean context is free.
Why Context Efficiency Still Matters
Even with 1M tokens, three factors make context management critical:
Cost. Cached tokens are cheap (90% discount via prompt caching), but uncached tokens are full price. Every token in the dynamic portion of your context costs real money per request. A session that loads 30K tokens of rules vs 3K tokens of rules pays 10x more per message for the rules portion.
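A quick sketch of the cost arithmetic, assuming a hypothetical input price (PRICE_PER_MILLION is a placeholder, not a real rate) and the 90% cache discount cited above:

```python
# Per-message cost of the rules portion of context.
# PRICE_PER_MILLION is a hypothetical input price, not a real rate card value.
PRICE_PER_MILLION = 3.00   # assumed dollars per 1M uncached input tokens
CACHE_DISCOUNT = 0.90      # the 90% prompt-caching discount cited above


def rules_cost_per_message(rule_tokens, cached):
    """Dollar cost of sending rule_tokens once, cached or uncached."""
    full_price = rule_tokens / 1_000_000 * PRICE_PER_MILLION
    return full_price * (1 - CACHE_DISCOUNT) if cached else full_price


for tokens in (3_000, 30_000):
    uncached = rules_cost_per_message(tokens, cached=False)
    cached = rules_cost_per_message(tokens, cached=True)
    print(f"{tokens:>6} tokens: ${uncached:.4f} uncached, ${cached:.4f} cached")
```

The 10x ratio between the two setups holds at any price and with or without caching. Caching shrinks both bills; it does not close the gap between them.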
Quality. Claude's attention is not uniform across the context window. Anthropic's best practices acknowledge that performance can degrade as context fills. More relevant context beats more total context.
Session length. The more you front-load, the fewer tasks you can complete before hitting limits. If you start at 15% context consumed, you have less runway than starting at 3%.
The Real Problem: Always-On Loading
The biggest source of context waste is loading everything at session start regardless of what you're doing. If you have 40 rules files and all are unscoped, they all load at startup. When you're debugging a CSS issue, you're paying for your API design guidelines, your security policy, and your deployment checklist to sit in context doing nothing.
Claude Code Context Management: Native Tools
Claude Code provides several built-in mechanisms for context management. Use these before building custom solutions.
Path-Scoped Rules
The most impactful native feature for context efficiency. Path-scoped rules only load when Claude reads files matching a glob pattern:
---
paths:
- "src/api/**/*.ts"
---
# API Development Rules
- All endpoints must include input validation
- Use standard error response format
This rule only enters context when Claude reads TypeScript files under src/api/. Working on frontend? This rule never loads. Zero cost.
Convert your "sometimes needed" rules from unscoped (always loaded) to path-scoped (loaded on demand). This single change can cut startup context by 50%+ for projects with many rules.
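To check which files a paths: pattern would cover before committing to it, a small matcher helps. This sketch assumes gitignore-style semantics (`**/` matches zero or more directories, `*` stays within one path segment). The exact glob dialect Claude Code uses may differ, and the function names here are hypothetical.

```python
import re


def glob_to_regex(pattern):
    """Translate a gitignore-style glob into a compiled regex (assumed semantics)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:[^/]+/)*")   # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")            # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")         # anything within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def rule_applies(rule_paths, file_path):
    """True if file_path matches any glob in the rule's paths: list."""
    return any(glob_to_regex(p).match(file_path) for p in rule_paths)


api_rule = ["src/api/**/*.ts"]
print(rule_applies(api_rule, "src/api/users/list.ts"))      # True
print(rule_applies(api_rule, "src/components/Button.tsx"))  # False
```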
/compact - Summarize Without Losing Instructions
When context fills up mid-session, compaction summarizes conversation history while preserving key content. Use /compact to trigger it manually, or let Claude Code auto-compact when approaching limits.
Critical to understand: compaction summarizes your conversation, not your instructions. Project-root CLAUDE.md and unscoped rules survive compaction and are re-injected from disk. Path-scoped rules and nested CLAUDE.md files are lost until their trigger files are read again.
/clear - Full Reset
Use /clear between unrelated tasks to start fresh. This destroys the entire context including the prompt cache - use it deliberately, not reflexively.
/context - See Your Usage
Run /context for a live breakdown of context usage by category. This shows you exactly what's consuming your budget and where to optimize.
Subagents - Isolated Context
Subagents run in their own context window. Large research tasks, file analysis, or exploration work that would bloat your main context can be delegated to a subagent. Only the summary comes back to your session.
This is the official recommended pattern for keeping your primary context clean while still doing comprehensive work.
This lives in primeline-ai/evolving-lite - the self-evolving Claude Code plugin. Free, MIT, no build step.
Smart Loading: On-Demand vs Always-On
The key principle: separate your instructions into what Claude needs every session vs what it needs sometimes.
What Should Always Load
- Core project instructions (stack, conventions, banned patterns)
- Build and test commands
- Memory bootup (CLAUDE.md + auto memory)
What Should Load On Demand
- Domain-specific rules (API design, security, testing)
- Documentation references
- Configuration guides
- Feature-specific patterns
How to Implement
Step 1: Audit your current rules. Run /memory to see what's loaded. Count the files.
Step 2: Categorize each rule. Would you want this loaded when doing completely unrelated work? If no, it's a candidate for path-scoping.
Step 3: Add paths: frontmatter. Move domain-specific rules from always-on to on-demand by adding glob patterns:
---
paths:
- "src/components/**/*.tsx"
---
# Frontend Component Rules
...
Step 4: Verify. Run /context before and after to measure the difference. Target under 5% context consumed at startup for most projects.
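Steps 1 and 2 can be partly automated. The sketch below lists which rule files are always-on and which are path-scoped, assuming rules live as markdown files under .claude/rules/ and that scoping is declared with a paths: key in YAML frontmatter. It uses a simple line check rather than a real YAML parser, so treat the result as a first pass.

```python
from pathlib import Path


def has_paths_frontmatter(text):
    """Return True if the file opens with a frontmatter block containing a paths: key."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    for line in lines[1:]:
        if line.strip() == "---":      # end of frontmatter, no paths: key found
            return False
        if line.startswith("paths:"):
            return True
    return False


def audit_rules(rules_dir):
    """Split rule files into always-on (unscoped) and on-demand (path-scoped)."""
    report = {"unscoped": [], "scoped": []}
    root = Path(rules_dir)
    if not root.is_dir():
        return report
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        bucket = "scoped" if has_paths_frontmatter(text) else "unscoped"
        report[bucket].append((path.name, len(text.splitlines())))
    return report


if __name__ == "__main__":
    result = audit_rules(".claude/rules")   # assumed location of rule files
    for name, line_count in result["unscoped"]:
        print(f"always-on: {name} ({line_count} lines)")
    print(f"{len(result['scoped'])} path-scoped file(s) load on demand")
```

The longest files in the always-on list are the first candidates for a paths: block.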
What Survives Compaction
This table is critical for designing a context-efficient system. When compaction runs, not everything persists:
| Content Type | After Compaction |
|---|---|
| System prompt | Unchanged |
| Project-root CLAUDE.md + unscoped rules | Re-injected from disk |
| Auto memory | Re-injected from disk |
| Path-scoped rules (with paths: frontmatter) | Lost until matching file is read |
| Nested CLAUDE.md in subdirectories | Lost until file in that directory is read |
| Skill bodies | Re-injected (capped at 5,000 tokens/skill, 25,000 total) |
| Hooks | Not in context - run as code |
Design implication: Instructions that must survive compaction belong in project-root CLAUDE.md or unscoped rules. Instructions that are only needed for specific file types should use paths: frontmatter - they'll reload automatically when Claude touches those files again.
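The design implication reduces to a small decision rule. The lookup below transcribes the table's instruction-related rows. The key names and the recommend_location helper are hypothetical, meant only to make the rule explicit.

```python
# Compaction behavior per content type, transcribed from the table above.
SURVIVES_COMPACTION = {
    "system_prompt": True,
    "root_claude_md": True,
    "unscoped_rule": True,
    "auto_memory": True,
    "path_scoped_rule": False,   # lost until a matching file is read
    "nested_claude_md": False,   # lost until a file in that directory is read
}


def recommend_location(must_survive_compaction, file_specific):
    """Suggest where an instruction should live (hypothetical helper)."""
    if file_specific and not must_survive_compaction:
        return "path_scoped_rule"
    return "root_claude_md"      # an unscoped rule works equally well


print(recommend_location(must_survive_compaction=False, file_specific=True))
print(recommend_location(must_survive_compaction=True, file_specific=True))
```

When the two requirements conflict, survival wins: a file-specific rule that must never drop out of context belongs in an always-on location despite the token cost.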
Advanced: Building a Context Router
For large systems with 30+ rules and multiple domains, the native path-scoping may not be granular enough. You want keyword-based routing: when the conversation mentions "delegation," load delegation rules. When it mentions "debug," load debugging patterns.
This is what I built for my own system. A context router that maps keywords to knowledge files:
{
  "routes": [
    {
      "keywords": ["delegate", "subagent", "agent"],
      "load": ["rules/delegation.md", "rules/personalities.md"]
    },
    {
      "keywords": ["debug", "error", "fix"],
      "load": ["rules/debugging.md"]
    }
  ]
}
A UserPromptSubmit hook intercepts each message, matches keywords against routes, and injects relevant rules as context. Rules that don't match the current topic never load.
This is custom infrastructure beyond what Claude Code provides natively. The Evolving Lite plugin includes a basic context routing setup. For most projects, path-scoped rules cover 80% of the need without custom code.
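The matching step itself is small. Here is a minimal sketch of the routing logic using the route format shown above. It is not the Evolving Lite implementation; match_routes is a hypothetical name, and wiring the result into a hook is left out.

```python
import json
import re

ROUTES_JSON = """
{
  "routes": [
    {"keywords": ["delegate", "subagent", "agent"],
     "load": ["rules/delegation.md", "rules/personalities.md"]},
    {"keywords": ["debug", "error", "fix"],
     "load": ["rules/debugging.md"]}
  ]
}
"""


def match_routes(message, config):
    """Return the rule files whose keywords appear as whole words in message."""
    words = set(re.findall(r"[a-z0-9]+", message.lower()))
    files = []
    for route in config["routes"]:
        if words & set(route["keywords"]):
            for path in route["load"]:
                if path not in files:      # keep order, skip duplicates
                    files.append(path)
    return files


config = json.loads(ROUTES_JSON)
print(match_routes("Please debug the login error", config))  # ['rules/debugging.md']
print(match_routes("Rename this variable", config))          # []
```

Whole-word matching is a deliberate choice in this sketch: substring matching would fire the debugging route on "prefix" because it contains "fix". The cost is that "debugging" does not match "debug", so a real router would want stemming or a longer keyword list.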
Budget Awareness
On top of routing, I track context usage thresholds:
- Under 60%: normal operation, load full rules on demand
- 60-80%: load summaries instead of full docs, prefer delegation to subagents
- Above 80%: stop loading new rules, prepare session handoff
Claude reasons about these thresholds naturally. When it sees "context at 73% with 5 tasks remaining," it decides to delegate rather than risk degradation. Not hardcoded logic - emergent behavior from clear constraints.
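The setup described here leaves the decision to Claude rather than to code. If you wanted a deterministic version of the same thresholds, for example in a status script, a sketch might look like this. The mode names are hypothetical, and putting exactly 80% in the middle band is an assumption.

```python
def budget_mode(context_used):
    """Map context usage (0.0-1.0) to the loading policy from the list above."""
    if not 0.0 <= context_used <= 1.0:
        raise ValueError("context_used must be between 0.0 and 1.0")
    if context_used < 0.60:
        return "normal"      # load full rules on demand
    if context_used <= 0.80:
        return "conserve"    # summaries only, prefer subagents
    return "handoff"         # stop loading rules, prepare session handoff


print(budget_mode(0.73))  # conserve
```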
Two Ways to Reclaim Context Without Losing Knowledge
When context fills up, most developers reach for /clear. But /clear destroys the prompt cache - every cached token rebuilds from scratch. There are two smarter approaches that preserve your cache and your knowledge.
Strategy 1: Compact Stuffing + Rewind (Mid-Session Reset)
This strategy uses /compact-stuff, a custom command from Evolving Lite, to distill session knowledge before resetting context.
How it works:
- Run /compact-stuff - it analyzes your conversation, extracts decisions, solutions, task state, and key file paths, then copies the distilled summary to your clipboard
- Run /rewind and select Delete (not Summarize - you already have the better summary in your clipboard)
- Paste (Cmd+V) the compact-stuff output as your first message
Why this works: /rewind with Delete clears the conversation history but keeps the system prompt, CLAUDE.md, tools, and MCP servers intact. The prompt cache prefix survives because the static content hasn't changed. You start fresh with minimal context usage but retain the most valuable knowledge from the session.
When to use: Mid-session when context is 70%+ but you have more work to do in the same project. You stay in the same session - no restart needed.
Strategy 2: Handoff + Compact (Session Transition)
This strategy uses /whats-next, another custom command from Evolving Lite, to create a handoff file before running compaction.
How it works:
- Run /whats-next - it creates a structured handoff file with what was accomplished, what was learned, and specific next steps
- Run /compact - Claude Code summarizes the conversation history, re-injects CLAUDE.md and unscoped rules from disk, and preserves the cached prefix
- Say "continue" - Claude reads the handoff file and picks up where you left off
Why this works: /compact preserves the prompt cache because the static prefix (system prompt, tools, CLAUDE.md) stays identical. The conversation history gets summarized, freeing context space. The handoff file provides structured continuity that the compaction summary alone can't - specific task state, decisions with reasoning, and actionable next steps.
When to use: When you've completed a logical chunk of work and want to transition to the next phase. Also effective at the end of a work session - the handoff file persists so the next session (even days later) starts with full context.
Why Both Strategies Preserve the Cache
Both strategies work because they keep the static prefix untouched:
| Content | /compact-stuff + Rewind | Handoff + /compact |
|---|---|---|
| System prompt | Preserved | Preserved |
| CLAUDE.md + rules | Preserved | Re-injected from disk |
| MCP tool stubs | Preserved | Preserved |
| Prompt cache | Intact | Intact |
| Conversation history | Deleted (replaced by paste) | Summarized |
| Session knowledge | In clipboard paste | In handoff file |
The key insight: the expensive part of a session isn't the conversation - it's the cached prefix. Both strategies protect the prefix while reclaiming the conversation space.
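A simplified model shows why the prefix matters. Prompt caching reuses the leading part of a request that is identical to the previous one, so the cache survives as long as the start of the context is unchanged. The sketch below treats context as a list of named segments. Real caching operates on tokens, and the segment names and the CLAUDE.md edit scenario are illustrative assumptions.

```python
def shared_prefix_length(previous, current):
    """Count the leading segments two requests have in common."""
    n = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        n += 1
    return n


before = ["system prompt", "tool stubs", "CLAUDE.md v1", "turn 1", "turn 2", "turn 3"]
after_compact = ["system prompt", "tool stubs", "CLAUDE.md v1", "summary of turns 1-3"]
after_edit = ["system prompt", "tool stubs", "CLAUDE.md v2", "turn 1", "turn 2", "turn 3"]

print(shared_prefix_length(before, after_compact))  # 3
print(shared_prefix_length(before, after_edit))     # 2
```

In this model, replacing the conversation tail keeps all three static segments reusable, while changing CLAUDE.md mid-session invalidates it and everything after it.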
Before and After
Before:
- 15%+ context consumed at startup
- 3-4 tasks before degradation
- Frequent compaction losing path-scoped rules
- All rules loaded regardless of task
After:
- 3-5% context at startup
- 10+ tasks with consistent quality
- Compaction only at 80%+
- Rules load only when relevant
The difference is not just tokens saved. It's session consistency. When you start at 3% instead of 15%, you have headroom for complex tasks. When rules load on demand, you never pay for context you don't use.
Practical Steps to Reduce Context Waste
Start with the highest-impact changes:
- Run /context right now. See your actual baseline. If startup is under 5%, you're fine. If it's over 10%, you have optimization opportunities.
- Add paths: frontmatter to domain-specific rules. The single biggest win for most setups. Takes 5 minutes per rule file.
- Keep CLAUDE.md under 200 lines. Anthropic recommends this in their official docs. Use @imports or .claude/rules/ for longer content.
- Use subagents for research. Instead of reading 10 large files into your context, delegate the research to a subagent. Only the summary enters your context.
- Prefer /compact over /clear. Compaction preserves your prompt cache and project-root instructions. /clear destroys everything.
- Monitor with /context regularly. Make it a habit to check context usage mid-session, especially before complex tasks.

