Why Claude Code Fails in Predictable Ways
Claude Code failure modes are not random. The same few problems show up in almost every long session: context gets compacted and the agent forgets what it was doing, it spawns a subagent for a task it should have done inline, or it follows stale notes from a project you finished weeks ago. Once you have seen them a few times, they become predictable - which means they become fixable.
I spent six months running an increasingly large Claude Code setup, and every painful day traced back to one of a handful of root causes. Instead of babysitting each session, I turned each failure mode into a hook or a rule that catches it automatically. That collection became Evolving Lite, an open-source Claude Code plugin with 10 hooks, 5 rules, and a context router.
This post walks through six concrete Claude Code failure modes and the exact mechanism that fixes each. Every file path, threshold, and code block below comes straight from the plugin, so you can clone it and read the real implementation.
TL;DR: Six Claude Code failure modes cause most long-session pain: compaction drift, over-delegation, stale memory, hook noise, context poisoning, and prompt bloat. Each has a deterministic fix - a PreCompact extractor, a delegation score, an auto-archiver, debounce gates, a prompt-injection blocker, and a keyword-scoped context router.
A failure mode is just a moment where the agent loses information or wastes it. The fix is always the same shape: detect the moment with a hook, then either rescue the information or block the waste before it happens.
Failure Mode 1: Compaction Drift Wipes Context
Compaction drift happens when Claude Code compresses a long conversation to free up context, and the summary drops the decisions, file paths, and open loops you actually needed. The agent keeps working but on a thinner picture of the task, so it repeats steps or contradicts earlier choices.
The fix has two layers. First, a PreToolUse hook watches the context budget and warns before you ever hit the wall. In Evolving Lite this lives in hooks/scripts/context-warning.sh with three thresholds:
THRESH_WARN=70
THRESH_AUTO=93
THRESH_RESET=50
DEBOUNCE_SECONDS=120
At 70% it warns. At 93% it injects an auto-handoff instruction so Claude writes a handoff document before compaction wipes the working context:
echo "{\"systemMessage\": \"AUTO-HANDOFF: Context at ${pct}%. Create a handoff document with /whats-next, then continue working. After compaction, context resets and you can keep going.\", \"continue\": true}"
The second layer is a PreCompact hook, hooks/scripts/precompact-extract.py, that scans the transcript for decision, solution, and pattern markers and saves up to 5 of them as experience files before compaction runs. The knowledge survives the wipe and gets recalled later. This is the same lifecycle I cover in depth in Claude Code hooks and context management for Claude Code.
- -Compaction summarizes the chat
- -Decisions and file paths get dropped
- -Agent repeats or contradicts earlier work
- -You re-explain context manually
- +Transcript scanned before compaction
- +Up to 5 decisions saved as experiences
- +Knowledge recalled in later turns
- +Session continues on solid ground
Failure Mode 2: Over-Delegation Burns Tokens
Over-delegation is when Claude Code spawns a subagent for work it should have done inline. Subagents are powerful, but each one is a fresh context window with setup cost, and a session that delegates everything is slower and more expensive than one that delegates selectively.
The fix is a scoring gate, not a vibe. A UserPromptSubmit hook, hooks/scripts/delegation-enforcer.py, scores every prompt against the table in knowledge/rules/delegation.md and only suggests a subagent when the score clears a threshold of 3:
Delegate automatically when delegation score >= 3.
| Factor | Points |
|--------|--------|
| Scope > 2 files | +2 |
| Bulk operation | +2 |
| Research/learn | +2 |
| Code review | +2 |
| Exploration/search | +3 |
| Independent task | +2 |
| Critical keywords (production, deploy, password, secret) | -10 |
| User wants to see ("show me", "explain") | -5 |
The negative weights matter as much as the positive ones. A prompt that mentions production or secrets drops 10 points, so it never auto-delegates. A "show me" or "explain" request drops 5, because you want to watch that work, not hand it off. The model table caps complexity 7+ as "Don't delegate" - hard reasoning stays with the main agent.
I broke down the full scoring model and how it routes to Haiku or Sonnet in score-based auto-delegation for Claude Code. The point here is narrow: a number, not a guess, decides when a subagent is worth it.
This lives in primeline-ai/evolving-lite - the self-evolving Claude Code plugin. Free, MIT, no build step.
Failure Mode 3: Stale Project Memory
Stale project memory is when Claude Code reads old notes, summaries, or experience files from work you have already finished, and treats them as current. The agent confidently heads down a path that was correct three weeks ago and wrong today. Memory that never expires is memory that lies.
Evolving Lite handles this with an auto-archiver, hooks/scripts/auto-archival.py, that runs on Stop (Tier 3, once per 24 hours) and retires experiences that are both old and unused:
EXP_MAX_AGE_DAYS = 90
SESSION_MAX_AGE_DAYS = 30
MIN_ACCESS_FOR_KEEP = 2
MIN_RELEVANCE_FOR_KEEP = 30
An experience older than 90 days with an access count under 2 and effective relevance under 30 gets archived, not deleted. Sessions older than 30 days are archived too. What stays is what you actually keep using.
The other half is re-orientation at the start of every session. The rule knowledge/rules/domain-memory-bootup.md tells Claude to read the active project file first and pick exactly one task:
### 1. Read Memory
1. ${CLAUDE_PLUGIN_ROOT}/_memory/index.json -> Active project, last session
2. ${CLAUDE_PLUGIN_ROOT}/_memory/projects/{active}.json -> Goals, Progress, Next Step
### 4. Pick One Task
Choose ONE task. Not multiple. Atomic progress.
Auto-archival moves stale files out of the recall path instead of deleting them. You keep an audit trail, but the agent only sees memory that is still earning its place. Full memory architecture in the Claude Code memory deep dive.
For semantic recall instead of keyword matching, the plugin's README notes the difference plainly: keyword-based memory matching lands around 60% recall, while a graph layer like Kairn pushes it toward 90%. I cover the persistent layer in the Claude Code memory system post.
Failure Mode 4: Hook Noise and Tool Spam
Hook noise is the opposite failure of having no hooks: every hook fires on every event, floods the context with status messages, and the signal drowns. Tool spam is the same problem from the model side - repeated, redundant tool calls that add tokens without adding information.
The fix is restraint built into the hook layer. Evolving Lite gates 7 of its 10 hooks behind session tiers, so a brand-new session (Tier 1) only runs the 4 essential hooks. Memory recall and archival do not fire until later sessions, when there is actually something to recall.
On top of tiering, individual hooks debounce themselves. The context warner uses DEBOUNCE_SECONDS=120 so it cannot warn twice in two minutes, and the session summarizer (hooks/scripts/session-summary.sh) has a 30-minute anti-spam gate so a stop-start workflow does not generate a pile of near-identical summaries. Every hook also carries a 10-second timeout, so a slow script can never stall your session.
The lesson generalizes beyond this plugin: a hook that fires too often is worse than no hook, because it trains you to ignore it. Design for the quietest useful signal.
Failure Mode 5: Context Poisoning
Context poisoning is when untrusted text - a file you read, a web page, a tool result - contains instructions aimed at the agent, and Claude Code treats that injected text as a command instead of as data. It is the confused-deputy problem applied to coding agents.
Evolving Lite treats this as a security tier. The PreToolUse hook hooks/scripts/security-tier-check.py classifies every bash command against a 10-tier system defined in hooks/security-tiers.json. Tier 7 is prompt injection, and its action is a hard BLOCK on patterns like:
"ignore previous instructions"
"new instructions:"
"you are now a"
"jailbreak"
The tiering matters because not every risk deserves a block. The same hook BLOCKs the dangerous tiers (10 through 7), asks for confirmation on tiers 6 and 5, warns on 4 and 3, and silently logs 1 and 2. A blunt allow-or-deny filter trains you to click through warnings; a graded one keeps the blocks rare enough to mean something.
The root cause of context poisoning is treating observed content as a command. A pattern-level block at the tool boundary is a backstop, not a substitute for the agent keeping a clear line between what it reads and what it is told to do.
This is also why the official Claude Code security model leans on permission boundaries - the hook layer adds a deterministic second line of defense you fully control.
Failure Mode 6: Prompt Bloat
Prompt bloat is when Claude Code loads every rule, every memory, and every piece of project knowledge into the context on every single turn. The agent technically has everything, but the relevant 5% is buried, latency climbs, and the cost per turn balloons.
The fix is just-in-time loading through a context router. Evolving Lite keeps 25 keyword-scoped routes in _graph/cache/context-router.json, and only loads the knowledge a turn actually needs. Here is the real delegation route:
"delegation": {
"keywords": ["delegate", "delegation", "sub-agent", "subagent", "explore", "search codebase", "find all", "research"],
"primary_nodes": ["knowledge/rules/delegation.md"],
"patterns": ["delegation-request"]
}
The orchestration rule knowledge/rules/metacognitive-orchestrator.md states the payoff directly: matching keywords and loading only matched routes keeps context "small (~5K tokens baseline instead of loading all knowledge)." Recall is capped too - the thinking-recall.py hook injects at most 2 stored experiences per tool call, so memory helps without flooding.
25 routes like the one above live in the public repo, ready to clone. The principle is the one Anthropic keeps returning to in its context engineering guidance: load context just in time, not just in case.
How Do You Audit Your Own Setup?
Auditing for Claude Code failure modes means checking, for each mode, whether you have a deterministic mechanism that catches it - not whether you "usually remember." Run this quick self-check against your current setup before adding anything new.
| Failure mode | Do you have a fix? | Mechanism to look for |
|---|---|---|
| Compaction drift | Yes / No | PreCompact extraction + context budget warning |
| Over-delegation | Yes / No | A delegation score with negative weights |
| Stale memory | Yes / No | Age + access-based auto-archival |
| Hook noise | Yes / No | Debounce + tier gates on hooks |
| Context poisoning | Yes / No | Prompt-injection block at the tool boundary |
| Prompt bloat | Yes / No | Keyword-scoped context router |
Every "No" is a place where you are currently relying on luck. The fixes do not have to be these exact scripts - but the shape (detect the moment, then rescue or block) should be present for each one. If you want a tested baseline, the whole set ships together in primeline-ai/evolving-lite.
The deeper point is that these failure modes compound. Stale memory feeds prompt bloat, prompt bloat accelerates compaction, compaction drift triggers re-work that burns tokens through over-delegation. Fixing one mode reduces pressure on the others, which is why I treat them as one system rather than six unrelated patches - the same systems-over-sessions idea behind self-correcting Claude Code workflows.
Getting Started: Fix One Failure Mode This Week
Start with the failure mode that costs you the most right now. For most people running long Claude Code sessions, that is compaction drift, because it silently erases hours of context. Pick one, install the fix, and watch it for a week before adding the next.
The fastest path is to clone the whole set and let the tier system roll the hooks in gradually:
git clone https://github.com/primeline-ai/evolving-lite
It installs as a Claude Code plugin and starts in Tier 1, so only the 4 core hooks fire on day one. As your session count climbs, correction detection, delegation scoring, recall, and archival switch on automatically. Nothing to configure - the failure-mode fixes arrive in the order you are likely to need them.
If you would rather build your own, use the audit table above as a checklist and write one hook per failure mode. The Claude Code hooks reference covers the event types, and the Evolving Lite hooks directory gives you working scripts to adapt. Either way, the goal is the same: stop relying on memory for the things that break the same way every time.
![6 Claude Code Failure Modes and How to Fix Each [2026]](/_next/image?url=%2Fblog%2Fclaude-code-failure-modes-hero.webp&w=3840&q=75)


