I Injected a Test Marker Into My Auto-Memory File
Mid-session. On purpose. I wanted to know if Claude Code re-reads MEMORY.md every turn or loads it once at session start.
Added a unique string. Checked the next turn. The marker was invisible. Claude Code had no idea I changed the file.
That one test changed how I think about session management. Because if the auto-memory file is a frozen snapshot, then every session restart forces a full prompt cache rebuild. And every cache rebuild costs real money.
The free 3-pattern guide covers memory, delegation, and context management fundamentals. This post goes deeper into the session lifecycle most developers get wrong.
The Restart Reflex
Here is what most developers do when Claude Code hits 80% context: they start a new session. Fresh context, clean slate, feels productive.
Here is what actually happens: the entire prompt cache gets destroyed. Every token in the prefix - system prompt, tool definitions, CLAUDE.md, project rules - gets reprocessed at full price. That prefix is typically 50,000+ tokens. With the cache warm, those tokens cost 90% less on every request. After a restart, you pay full price until the cache rebuilds.
One restart. 50K tokens at full price instead of cached. About $0.67 for Opus. Do that 10 times per day and you are burning $6.70 in invisible costs. Not because you did something wrong - because you did not know the cache existed.
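The arithmetic is easy to check yourself. A minimal sketch, assuming illustrative Opus pricing of $15 per million uncached input tokens and a 90% discount on cache reads (verify current rates before relying on these numbers):

```python
# Back-of-envelope cost of one session restart.
# Pricing below is an assumption for illustration, not a quoted rate.
PREFIX_TOKENS = 50_000
UNCACHED_PER_MTOK = 15.00   # assumed full-price input rate
CACHED_PER_MTOK = 1.50      # assumed cache-read rate (90% discount)

uncached_cost = PREFIX_TOKENS / 1_000_000 * UNCACHED_PER_MTOK
cached_cost = PREFIX_TOKENS / 1_000_000 * CACHED_PER_MTOK
extra_per_restart = uncached_cost - cached_cost  # about $0.67

print(f"Extra cost per restart: ~${extra_per_restart:.2f}")
print(f"10 restarts per day:    ~${extra_per_restart * 10:.2f}")
```

Under those assumptions, each restart burns roughly $0.67 of cache savings, which is where the $6.70-per-day figure comes from.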
But the money is not even the real problem. The real problem is lost momentum. Every restart means re-reading files, re-establishing context, re-explaining what you were working on. The first 3-5 turns of a new session are always the worst.
Restart:
- 50K prefix tokens at full price
- 3-5 turns to re-establish context
- MEMORY.md re-read (the only benefit)
- Lost conversation nuance

Compact:
- 0 tokens at full price (prefix stays cached)
- Immediate continuation, no warmup
- Conversation compressed, not lost
- Momentum preserved
What Is Actually in the Cached Prefix
I mapped my own Claude Code session to understand what stays cached and what changes every turn. The results surprised me.
The prefix contains everything loaded at session start: the core system prompt, all tool definitions (including MCP tool stubs), your CLAUDE.md files, your .claude/rules/ files, and your auto-memory file. This is the cached part. It stays byte-for-byte identical across every turn within a session.
Everything else - conversation messages, tool results, hook outputs, skill metadata - gets injected via system-reminder tags into the conversation layer. This is the dynamic part. It changes every turn and is always recomputed. But critically, it does not touch the prefix. The cache stays intact.
That is why system-reminder tags exist. They are not a quirky implementation detail. They are the design pattern I call prompt infusion - Claude Code's mechanism for injecting dynamic information without touching the cached prefix. Your 60+ skills? Infused as conversation messages, not prefix entries. Hook outputs from your automation? Conversation messages. Git status snapshot? Conversation message. Even task tracking reminders get injected this way.
The principle is simple: the prefix is immutable after session start. Everything dynamic flows through the conversation layer via infusion. That is what makes the cache so stable under normal usage.
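The shape of that split is easy to sketch. Below is a hypothetical request builder (the block contents and the `git_status` reminder are stand-ins, and the `cache_control` marker mirrors the Anthropic Messages API's prompt-caching convention, not Claude Code's internals): static blocks go first and never change; dynamic state is infused as conversation content.

```python
# Static prefix: identical byte-for-byte on every turn -> cacheable.
STATIC_PREFIX = [
    {"type": "text", "text": "core system prompt + tool definitions"},
    {"type": "text", "text": "CLAUDE.md + rules + auto-memory",
     "cache_control": {"type": "ephemeral"}},  # cache breakpoint after last static block
]

def build_request(history, git_status):
    # Dynamic info rides in as a conversation message ("infusion"),
    # never as a new prefix block - so the cached prefix is untouched.
    reminder = {
        "role": "user",
        "content": f"<system-reminder>git status:\n{git_status}</system-reminder>",
    }
    return {"system": STATIC_PREFIX, "messages": history + [reminder]}

turn1 = build_request([], "clean")
turn2 = build_request([{"role": "assistant", "content": "..."}], "1 file modified")
assert turn1["system"] is turn2["system"]  # same prefix object every turn
```

The invariant the asserts capture is the whole trick: the `system` payload is frozen, so only the growing `messages` list is ever recomputed.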
What Breaks the Cache (And What Does Not)
I tested and cataloged which actions are cache-safe and which destroy the prefix.
The safe list is longer than I expected. Effort toggling, sub-agent delegation, file reads, even editing your auto-memory file mid-session. All cache-safe. The prefix never changes.
The dangerous list is short but severe. Switching models mid-session forces a complete cache rebuild because different models use different tokenizers. The same text produces different token sequences, so the cached prefix becomes useless. And /clear destroys the prefix entirely - by design.
The practical takeaway: pick your model at session start and stick with it. If you need a different model for a specific task, delegate it to a sub-agent. The sub-agent runs in its own session with its own cache. Your main session stays warm.
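A toy model makes the delegation logic concrete. Assume the cache is keyed by (model, exact prefix) - different tokenizers mean the same text maps to different entries - so a sub-agent on another model builds its own cache without touching yours. This is a simplification for intuition, not Claude Code's actual cache implementation:

```python
# Toy cache keyed by (model, prefix): a model switch is a new key
# (full-price rebuild), while the original session's entry stays warm.
cache = {}

def prompt_cost(model, prefix):
    key = (model, prefix)
    if key in cache:
        return "cache hit"
    cache[key] = True
    return "full-price rebuild"

assert prompt_cost("opus", "PREFIX") == "full-price rebuild"    # session start
assert prompt_cost("opus", "PREFIX") == "cache hit"             # every later turn
assert prompt_cost("sonnet", "PREFIX") == "full-price rebuild"  # sub-agent, own cache
assert prompt_cost("opus", "PREFIX") == "cache hit"             # main session still warm
```

The last two lines are the takeaway: the sub-agent pays its own warmup cost once, and your main session never notices.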
Context Drift: The Hidden Cost of Long Sessions
There is a subtlety the prefix cache does not solve. The conversation layer grows every turn. And because conversation tokens are always recomputed, your effective cache ratio drops over time.
Early in a session with a 50K prefix, 96% of your tokens are cached. By turn 30 with 80K of conversation, that drops to 38%. By turn 50, you are paying full price on the majority of your input tokens. The prefix is still cached, but it is a shrinking percentage of the total.
This is context drift. The session gets progressively more expensive per turn - not because anything broke, but because the dynamic layer keeps growing. Compaction is the answer. It compresses the conversation back to a small summary while the prefix stays untouched. Your cache ratio jumps back to 90%+ instantly.
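The drift numbers above fall out of one fraction. A quick sketch, assuming the same 50K prefix and treating "cache ratio" as cached prefix tokens over total input tokens:

```python
PREFIX = 50_000  # cached prefix size in tokens (assumed, per the example above)

def cache_ratio(conversation_tokens):
    # Fraction of input tokens served from cache on a given turn.
    return PREFIX / (PREFIX + conversation_tokens)

print(f"early session (2K conv tokens):  {cache_ratio(2_000):.0%}")   # 96%
print(f"turn ~30 (80K conv tokens):      {cache_ratio(80_000):.0%}")  # 38%
```

Compaction resets the denominator: shrink the conversation back to a few thousand tokens and the ratio snaps back above 90% without the prefix ever changing.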
The Loop Strategy
Once I understood that session restarts are expensive and compaction resets context drift, the strategy became obvious. Stop restarting. Instead, save and compress.
When context gets high, I create a handoff document - a structured summary of current state, progress, and next steps - saved to disk. Then I let compaction handle the context pressure. The conversation gets compressed but the prefix stays identical. Context drops from 90% back to 35%. Cache stays warm. I keep working.
When context rises again, I repeat. Another handoff, another compaction. The handoffs stack on disk as a trail of checkpoints. If I ever need context from three cycles ago, I can read the handoff file.
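The handoff trail is simple enough to automate. A minimal sketch - the file layout and section names here are hypothetical, not a Claude Code convention - that appends one timestamped checkpoint per compaction cycle:

```python
from datetime import datetime
from pathlib import Path

def write_handoff(state, progress, next_steps, out_dir="handoffs"):
    # One markdown checkpoint per compaction cycle; older handoffs
    # stay on disk so context from earlier cycles can be re-read.
    Path(out_dir).mkdir(exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    path = Path(out_dir) / f"handoff-{stamp}.md"
    path.write_text(
        f"# Handoff {stamp}\n\n"
        f"## Current state\n{state}\n\n"
        f"## Progress\n{progress}\n\n"
        f"## Next steps\n{next_steps}\n"
    )
    return path
```

Call it right before compaction; the returned path is what you read back if you ever need context from a previous cycle.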
The only reasons to actually start a new session: critical auto-memory updates that need to take effect, noticeable context drift, or a complete topic change where fresh context beats a warm cache.
What This Changes
The biggest shift is psychological. Hitting 80% context used to feel like a countdown. Now it is just a checkpoint. Save, compress, continue.
The irony: I built a tiered context loading system to minimize token usage. Turns out it was already cache-optimal by accident. On-demand rules loaded via file reads go to the conversation layer, not the prefix. Token efficiency and cache efficiency aligned without planning it.
Once you understand the one rule behind prompt caching - prefix matching requires static content first, dynamic content last - every optimization you build tends to reinforce it. Session management is no exception.
Want the full system blueprint? Get the free 3-pattern guide.