I Injected a Test Marker Into My Auto-Memory File
Mid-session. On purpose. I wanted to know if Claude Code re-reads MEMORY.md every turn or loads it once at session start.
Added a unique string. Checked the next turn. The marker was invisible. Claude Code had no idea I changed the file.
That one test changed how I think about session management. Because if the auto-memory file is a frozen snapshot, then every session restart forces a full prompt cache rebuild. And every cache rebuild costs real money.
The free 3-pattern guide covers memory, delegation, and context management fundamentals. This post goes deeper into the session lifecycle most developers get wrong.
The Restart Reflex
Here is what most developers do when Claude Code hits 80% context: they start a new session. Fresh context, clean slate, feels productive.
Here is what actually happens: the entire prompt cache gets destroyed. Every token in the prefix - system prompt, tool definitions, CLAUDE.md, project rules - gets reprocessed at full price. That prefix is typically 50,000+ tokens. With the cache warm, those tokens cost 90% less on every request. After a restart, you pay full price until the cache rebuilds.
One restart. 50K tokens at full price instead of cached. About $0.67 for Opus. Do that 10 times per day and you are burning $6.70 in invisible costs. Not because you did something wrong - because you did not know the cache existed.
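The arithmetic is easy to check yourself. A minimal sketch, assuming illustrative Opus pricing of $15 per million uncached input tokens and a 90% discount on cache reads (verify current rates before relying on these numbers):

```python
# Back-of-envelope cost of one session restart.
# Pricing below is an assumption for illustration, not a quoted rate.
PREFIX_TOKENS = 50_000
UNCACHED_PER_MTOK = 15.00   # assumed full-price input rate
CACHED_PER_MTOK = 1.50      # assumed cache-read rate (90% discount)

uncached_cost = PREFIX_TOKENS / 1_000_000 * UNCACHED_PER_MTOK
cached_cost = PREFIX_TOKENS / 1_000_000 * CACHED_PER_MTOK
extra_per_restart = uncached_cost - cached_cost  # about $0.67

print(f"Extra cost per restart: ~${extra_per_restart:.2f}")
print(f"10 restarts per day:    ~${extra_per_restart * 10:.2f}")
```

Under those assumptions, each restart burns roughly $0.67 of cache savings, which is where the $6.70-per-day figure comes from.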
But the money is not even the real problem. The real problem is lost momentum. Every restart means re-reading files, re-establishing context, re-explaining what you were working on. The first 3-5 turns of a new session are always the worst.
Restart:
- 50K prefix tokens at full price
- 3-5 turns to re-establish context
- MEMORY.md re-read (the only benefit)
- Lost conversation nuance

Compact:
- 0 tokens at full price (prefix stays cached)
- Immediate continuation, no warmup
- Conversation compressed, not lost
- Momentum preserved
What Is Actually in the Cached Prefix
I mapped my own Claude Code session to understand what stays cached and what changes every turn. The results surprised me.
The prefix contains everything loaded at session start: the core system prompt, all tool definitions (including MCP tool stubs), your CLAUDE.md files, your .claude/rules/ files, and your auto-memory file. This is the cached part. It stays byte-for-byte identical across every turn within a session.
Everything else - conversation messages, tool results, hook outputs, skill metadata - gets injected via system-reminder tags into the conversation layer. This is the dynamic part. It changes every turn and is always recomputed. But critically, it does not touch the prefix. The cache stays intact.
That is why system-reminder tags exist. They are not a quirky implementation detail. They are the design pattern I call prompt infusion - Claude Code's mechanism for injecting dynamic information without touching the cached prefix. Your 60+ skills? Infused as conversation messages, not prefix entries. Hook outputs from your automation? Conversation messages. Git status snapshot? Conversation message. Even task tracking reminders get injected this way.
The principle is simple: the prefix is immutable after session start. Everything dynamic flows through the conversation layer via infusion. That is what makes the cache so stable under normal usage.
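The shape of that split is easy to sketch. Below is a hypothetical request builder (the block contents and the `git_status` reminder are stand-ins, and the `cache_control` marker mirrors the Anthropic Messages API's prompt-caching convention, not Claude Code's internals): static blocks go first and never change; dynamic state is infused as conversation content.

```python
# Static prefix: identical byte-for-byte on every turn -> cacheable.
STATIC_PREFIX = [
    {"type": "text", "text": "core system prompt + tool definitions"},
    {"type": "text", "text": "CLAUDE.md + rules + auto-memory",
     "cache_control": {"type": "ephemeral"}},  # cache breakpoint after last static block
]

def build_request(history, git_status):
    # Dynamic info rides in as a conversation message ("infusion"),
    # never as a new prefix block - so the cached prefix is untouched.
    reminder = {
        "role": "user",
        "content": f"<system-reminder>git status:\n{git_status}</system-reminder>",
    }
    return {"system": STATIC_PREFIX, "messages": history + [reminder]}

turn1 = build_request([], "clean")
turn2 = build_request([{"role": "assistant", "content": "..."}], "1 file modified")
assert turn1["system"] is turn2["system"]  # same prefix object every turn
```

The invariant the asserts capture is the whole trick: the `system` payload is frozen, so only the growing `messages` list is ever recomputed.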
What Breaks the Cache (And What Does Not)
I tested and cataloged which actions are cache-safe and which destroy the prefix.
The safe list is longer than I expected. Effort toggling, sub-agent delegation, file reads, even editing your auto-memory file mid-session. All cache-safe. The prefix never changes.
The dangerous list is short but severe. Switching models mid-session forces a complete cache rebuild because different models use different tokenizers. The same text produces different token sequences, so the cached prefix becomes useless. And /clear destroys the prefix entirely - by design.
The practical takeaway: pick your model at session start and stick with it. If you need a different model for a specific task, delegate it to a sub-agent. The sub-agent runs in its own session with its own cache. Your main session stays warm.
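A toy model makes the delegation logic concrete. Assume the cache is keyed by (model, exact prefix) - different tokenizers mean the same text maps to different entries - so a sub-agent on another model builds its own cache without touching yours. This is a simplification for intuition, not Claude Code's actual cache implementation:

```python
# Toy cache keyed by (model, prefix): a model switch is a new key
# (full-price rebuild), while the original session's entry stays warm.
cache = {}

def prompt_cost(model, prefix):
    key = (model, prefix)
    if key in cache:
        return "cache hit"
    cache[key] = True
    return "full-price rebuild"

assert prompt_cost("opus", "PREFIX") == "full-price rebuild"    # session start
assert prompt_cost("opus", "PREFIX") == "cache hit"             # every later turn
assert prompt_cost("sonnet", "PREFIX") == "full-price rebuild"  # sub-agent, own cache
assert prompt_cost("opus", "PREFIX") == "cache hit"             # main session still warm
```

The last two lines are the takeaway: the sub-agent pays its own warmup cost once, and your main session never notices.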
Context Drift: The Hidden Cost of Long Sessions
There is a subtlety the prefix cache does not solve. The conversation layer grows every turn. And because conversation tokens are always recomputed, your effective cache ratio drops over time.
Early in a session with a 50K prefix, 96% of your tokens are cached. By turn 30 with 80K of conversation, that drops to 38%. By turn 50, you are paying full price on the majority of your input tokens. The prefix is still cached, but it is a shrinking percentage of the total.
This is context drift. The session gets progressively more expensive per turn - not because anything broke, but because the dynamic layer keeps growing. Compaction is the answer. It compresses the conversation back to a small summary while the prefix stays untouched. Your cache ratio jumps back to 90%+ instantly.
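The drift numbers above fall out of one fraction. A quick sketch, assuming the same 50K prefix and treating "cache ratio" as cached prefix tokens over total input tokens:

```python
PREFIX = 50_000  # cached prefix size in tokens (assumed, per the example above)

def cache_ratio(conversation_tokens):
    # Fraction of input tokens served from cache on a given turn.
    return PREFIX / (PREFIX + conversation_tokens)

print(f"early session (2K conv tokens):  {cache_ratio(2_000):.0%}")   # 96%
print(f"turn ~30 (80K conv tokens):      {cache_ratio(80_000):.0%}")  # 38%
```

Compaction resets the denominator: shrink the conversation back to a few thousand tokens and the ratio snaps back above 90% without the prefix ever changing.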
The Loop Strategy
Once I understood that session restarts are expensive and compaction resets context drift, the strategy became obvious. Stop restarting. Instead, save and compress.
When context gets high, I create a handoff document - a structured summary of current state, progress, and next steps - saved to disk. Then I let compaction handle the context pressure. The conversation gets compressed but the prefix stays identical. Context drops from 90% back to 35%. Cache stays warm. I keep working.
When context rises again, I repeat. Another handoff, another compaction. The handoffs stack on disk as a trail of checkpoints. If I ever need context from three cycles ago, I can read the handoff file.
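The handoff trail is simple enough to automate. A minimal sketch - the file layout and section names here are hypothetical, not a Claude Code convention - that appends one timestamped checkpoint per compaction cycle:

```python
from datetime import datetime
from pathlib import Path

def write_handoff(state, progress, next_steps, out_dir="handoffs"):
    # One markdown checkpoint per compaction cycle; older handoffs
    # stay on disk so context from earlier cycles can be re-read.
    Path(out_dir).mkdir(exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    path = Path(out_dir) / f"handoff-{stamp}.md"
    path.write_text(
        f"# Handoff {stamp}\n\n"
        f"## Current state\n{state}\n\n"
        f"## Progress\n{progress}\n\n"
        f"## Next steps\n{next_steps}\n"
    )
    return path
```

Call it right before compaction; the returned path is what you read back if you ever need context from a previous cycle.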
The only reasons to actually start a new session: critical auto-memory updates that need to take effect, noticeable context drift, or a complete topic change where fresh context beats a warm cache.
What This Changes
The biggest shift is psychological. Hitting 80% context used to feel like a countdown. Now it is just a checkpoint. Save, compress, continue.
The irony: I built a tiered context loading system to minimize token usage. Turns out it was already cache-optimal by accident. On-demand rules loaded via file reads go to the conversation layer, not the prefix. Token efficiency and cache efficiency aligned without planning it.
Once you understand the one rule behind prompt caching - prefix matching requires static content first, dynamic content last - every optimization you build tends to reinforce it. Session management is no exception.
Want the full system blueprint? Get the free 3-pattern guide.