Most People Waste 80% of Their Context Window
Claude Code's context window is finite. Most people waste it. They load everything at session start - every rule, every pattern, every piece of documentation. By the time they hit 70% context, the AI starts degrading. Compaction kicks in. Signal quality drops. The session becomes useless.
I built a different system. One that loads 2,000 tokens at session start and 25,000 only when needed. The result? Sessions that stay sharp from start to finish. No degradation. No compaction. Just consistent performance.
Want the fundamentals first? Grab the free 3-pattern guide - it covers Memory, Delegation, and Knowledge Graph basics. This post explains the context architecture behind it all.
The Context Window Problem
When you start a Claude Code session, it needs context. Who are you? What is your project? What are your conventions? Most people dump everything into .claude/rules/ and call it done.
The problem: Claude Code recursively loads every markdown file under .claude/rules/. I had 44 rule files. Each one 500-2000 words. That is roughly 30,000 tokens loaded before the first message. My context budget was already 15% consumed before I said hello.
The math did not work. If I started at 15% and my average task consumed 20% context, I could only do 3-4 tasks per session. Then I would need /clear and start over.
The real solution was not removing rules. It was loading them smarter.
The 3-Layer Architecture
I built a context system with three layers:
- Context Router - loads rules based on what you are actually doing
- Budget Awareness - tracks token usage and adjusts behavior automatically
- Knowledge Graph - maps connections between all system components
Each layer solves a different problem. Together they create a system where context is always available but never wasteful.
Layer 1: Context Router
The context router maps keywords to rule files. When I mention "delegation", it loads delegation rules. When I mention "debug", it loads debugging patterns. When I mention neither? It loads nothing.
The key insight: rules moved from .claude/rules/ (always loaded) to a separate knowledge directory (loaded on demand). Only 4 core rules stay always-loaded - the session-start ritual, core principles, workflow detection, and the system README.
Session start cost dropped from ~30K tokens to ~2K tokens. A 93% reduction. The rules are still available - just loaded when keywords match instead of loaded always.
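The routing idea above can be sketched in a few lines. This is a minimal illustration, not the actual system: the keywords, file names, and directory layout are all assumptions made up for the example.

```python
# Hypothetical keyword-to-rule routing table. The paths and keywords
# are illustrative; the real system's mapping is not shown in this post.
RULE_MAP = {
    "delegation": "knowledge/delegation-rules.md",
    "debug": "knowledge/debugging-patterns.md",
    "refactor": "knowledge/refactoring-guide.md",
}

# The four core rules that stay always-loaded, per the architecture above
# (file names are placeholders).
ALWAYS_LOADED = [
    "rules/session-start.md",
    "rules/core-principles.md",
    "rules/workflow-detection.md",
    "rules/README.md",
]

def route(message: str) -> list[str]:
    """Return the rule files to load for this message: the core rules
    plus any knowledge files whose keyword appears in the message."""
    matched = [path for kw, path in RULE_MAP.items() if kw in message.lower()]
    return ALWAYS_LOADED + matched

# A message mentioning "debug" pulls in only the debugging patterns;
# a message matching no keyword loads nothing beyond the core rules.
files = route("Help me debug this test failure")
```

The point of the sketch: routing is just a lookup, so the per-message cost is the matched files, never the whole rule set.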
Layer 2: Budget Awareness
The context router solves what to load. Budget awareness solves when to stop loading.
The system defines behavior thresholds based on context usage. Under 60%, everything works normally - load full rules when requested. Between 60% and 80%, load summaries instead of full documentation. Above 80%, stop loading new rules entirely and prepare for a clean handoff.
This creates emergent behavior. When Claude sees it is at 73% context with 5 tasks remaining, it reasons: "Sequential execution would push context to 95%. Better to delegate 3 tasks to sub-agents, hand off, and handle the rest in a fresh session."
That is not hardcoded logic. It is Claude reasoning about its own constraints using the thresholds as heuristics. The same way a human engineer says "I am tired, let us finish this tomorrow."
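The thresholds themselves can be expressed as a tiny lookup. A minimal sketch, assuming the 60% and 80% boundaries described above; the mode names are made up for illustration:

```python
def loading_mode(context_used: float) -> str:
    """Map current context usage (0.0 to 1.0) to a loading behavior.
    Thresholds follow the 60%/80% boundaries described in the post;
    the mode names are illustrative."""
    if context_used < 0.60:
        return "full"     # load complete rule files on demand
    if context_used < 0.80:
        return "summary"  # load condensed summaries instead
    return "handoff"      # stop loading; prepare a clean handoff
```

The logic is deliberately dumb. The intelligence is not in the thresholds - it is in Claude treating them as heuristics when planning the rest of the session.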
Layer 3: Knowledge Graph
The context router answers "what should I load?" Budget awareness answers "when should I load it?" The knowledge graph answers "what else is connected?"
The graph maps relationships between all system components - agents, commands, rules, patterns. When I load delegation rules, the graph surfaces related context: budget awareness (because delegation affects context usage) and the trait system (because delegation uses traits).
I do not have to remember what connects to what. The graph remembers for me. And as the system grows, the graph gets smarter.
This matters at scale. When I had 20 rules, I could remember them all. At 44 rules, I needed the router. At 400+ components (agents, commands, patterns, rules), I need the graph to surface connections I did not explicitly design.
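The graph lookup described above amounts to an adjacency map. A minimal sketch - the component names and edges here are invented for the example, not taken from the actual system:

```python
# Hypothetical adjacency map between system components.
# Edges encode "when you load X, also surface Y."
GRAPH = {
    "delegation-rules": ["budget-awareness", "trait-system"],
    "budget-awareness": ["delegation-rules", "context-router"],
    "trait-system": ["delegation-rules"],
}

def related(component: str) -> list[str]:
    """Surface the components connected to the one being loaded."""
    return GRAPH.get(component, [])

# Loading delegation rules also surfaces budget awareness and traits,
# matching the example in the post.
related("delegation-rules")  # ['budget-awareness', 'trait-system']
```

Because the edges live in data rather than in anyone's memory, adding a component means adding entries to the map - the lookups keep working unchanged.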
The Real-World Impact
Before:
- 15% of context consumed at session start
- 3-4 tasks before degradation
- Frequent compaction and signal loss
- Rules loaded all-or-nothing

After:
- 3% of context consumed at session start
- 10+ tasks with consistent quality
- Degradation only at 85%+ context
- Rules loaded on demand
The difference is not just tokens saved. It is session consistency. When you start at 3% instead of 15%, you have headroom for complex tasks. When rules load on demand, you never pay for context you do not use. When budget awareness triggers at 70%, you hand off before degradation starts.
No more "Claude is being weird" at 80% context. No more mysterious compaction eating your history. Just consistent performance.
Why This Matters for You
Even if you only have 5-10 rule files today, the pattern applies. The moment you notice Claude getting confused or forgetful mid-session, you are hitting context limits. The fix is not fewer rules. It is smarter loading.
Start by separating your rules into "always needed" and "sometimes needed." That single step can save you 50%+ of your context budget. Your CLAUDE.md template determines which rules load at startup versus on demand.
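To make the split concrete, here is a hypothetical sketch of what that separation might look like in a CLAUDE.md. The section names and file paths are invented for illustration, and the exact import syntax depends on your Claude Code version - treat this as the shape of the idea, not a drop-in template:

```markdown
<!-- Hypothetical CLAUDE.md sketch; paths and names are illustrative -->

## Always loaded (keep this small)
@rules/core-principles.md
@rules/session-start.md

## Loaded on demand
These are referenced, not imported, so they cost nothing at startup.
Read the relevant file only when the task calls for it:
- Delegation patterns: knowledge/delegation-rules.md
- Debugging playbook: knowledge/debugging-patterns.md
```

The always-loaded section should stay a few hundred tokens; everything else becomes a pointer that is followed only when needed.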
The free 3-pattern guide covers the knowledge graph pattern at concept level. The course includes the complete router configuration, graph setup, budget thresholds, and the tiered loading system - everything you need to implement this in your own Claude Code setup.
Small systems. Compounding returns. That is the PrimeLine way.