>_

6 Claude Code Failure Modes and How to Fix Each [2026]

Robin||11 min
claude-codehookscontext-managementdelegationmemory
6 Claude Code Failure Modes and How to Fix Each [2026]
Listen to this article (11 min)

Why Claude Code Fails in Predictable Ways

Claude Code failure modes are not random. The same few problems show up in almost every long session: context gets compacted and the agent forgets what it was doing, it spawns a subagent for a task it should have done inline, or it follows stale notes from a project you finished weeks ago. Once you have seen them a few times, they become predictable - which means they become fixable.

I spent six months running an increasingly large Claude Code setup, and every painful day traced back to one of a handful of root causes. Instead of babysitting each session, I turned each failure mode into a hook or a rule that catches it automatically. That collection became Evolving Lite, an open-source Claude Code plugin with 10 hooks, 5 rules, and a context router.

This post walks through six concrete Claude Code failure modes and the exact mechanism that fixes each. Every file path, threshold, and code block below comes straight from the plugin, so you can clone it and read the real implementation.

TL;DR: Six Claude Code failure modes cause most long-session pain: compaction drift, over-delegation, stale memory, hook noise, context poisoning, and prompt bloat. Each has a deterministic fix - a PreCompact extractor, a delegation score, an auto-archiver, debounce gates, a prompt-injection blocker, and a keyword-scoped context router.

The pattern behind every fix

A failure mode is just a moment where the agent loses information or wastes it. The fix is always the same shape: detect the moment with a hook, then either rescue the information or block the waste before it happens.

Failure Mode 1: Compaction Drift Wipes Context

Compaction drift happens when Claude Code compresses a long conversation to free up context, and the summary drops the decisions, file paths, and open loops you actually needed. The agent keeps working but on a thinner picture of the task, so it repeats steps or contradicts earlier choices.

The fix has two layers. First, a PreToolUse hook watches the context budget and warns before you ever hit the wall. In Evolving Lite this lives in hooks/scripts/context-warning.sh with three thresholds:

code
THRESH_WARN=70
THRESH_AUTO=93
THRESH_RESET=50
DEBOUNCE_SECONDS=120

At 70% it warns. At 93% it injects an auto-handoff instruction so Claude writes a handoff document before compaction wipes the working context:

code
echo "{\"systemMessage\": \"AUTO-HANDOFF: Context at ${pct}%. Create a handoff document with /whats-next, then continue working. After compaction, context resets and you can keep going.\", \"continue\": true}"

The second layer is a PreCompact hook, hooks/scripts/precompact-extract.py, that scans the transcript for decision, solution, and pattern markers and saves up to 5 of them as experience files before compaction runs. The knowledge survives the wipe and gets recalled later. This is the same lifecycle I cover in depth in Claude Code hooks and context management for Claude Code.

Without a PreCompact hook
  • -Compaction summarizes the chat
  • -Decisions and file paths get dropped
  • -Agent repeats or contradicts earlier work
  • -You re-explain context manually
With precompact-extract.py
  • +Transcript scanned before compaction
  • +Up to 5 decisions saved as experiences
  • +Knowledge recalled in later turns
  • +Session continues on solid ground

Failure Mode 2: Over-Delegation Burns Tokens

Over-delegation is when Claude Code spawns a subagent for work it should have done inline. Subagents are powerful, but each one is a fresh context window with setup cost, and a session that delegates everything is slower and more expensive than one that delegates selectively.

The fix is a scoring gate, not a vibe. A UserPromptSubmit hook, hooks/scripts/delegation-enforcer.py, scores every prompt against the table in knowledge/rules/delegation.md and only suggests a subagent when the score clears a threshold of 3:

code
Delegate automatically when delegation score >= 3.

| Factor | Points |
|--------|--------|
| Scope > 2 files | +2 |
| Bulk operation | +2 |
| Research/learn | +2 |
| Code review | +2 |
| Exploration/search | +3 |
| Independent task | +2 |
| Critical keywords (production, deploy, password, secret) | -10 |
| User wants to see ("show me", "explain") | -5 |

The negative weights matter as much as the positive ones. A prompt that mentions production or secrets drops 10 points, so it never auto-delegates. A "show me" or "explain" request drops 5, because you want to watch that work, not hand it off. The model table caps complexity 7+ as "Don't delegate" - hard reasoning stays with the main agent.

3Score to delegate
-10Critical-keyword penalty
7+Complexity = handle yourself

I broke down the full scoring model and how it routes to Haiku or Sonnet in score-based auto-delegation for Claude Code. The point here is narrow: a number, not a guess, decides when a subagent is worth it.

This lives in primeline-ai/evolving-lite - the self-evolving Claude Code plugin. Free, MIT, no build step.

Failure Mode 3: Stale Project Memory

Stale project memory is when Claude Code reads old notes, summaries, or experience files from work you have already finished, and treats them as current. The agent confidently heads down a path that was correct three weeks ago and wrong today. Memory that never expires is memory that lies.

Evolving Lite handles this with an auto-archiver, hooks/scripts/auto-archival.py, that runs on Stop (Tier 3, once per 24 hours) and retires experiences that are both old and unused:

code
EXP_MAX_AGE_DAYS = 90
SESSION_MAX_AGE_DAYS = 30
MIN_ACCESS_FOR_KEEP = 2
MIN_RELEVANCE_FOR_KEEP = 30

An experience older than 90 days with an access count under 2 and effective relevance under 30 gets archived, not deleted. Sessions older than 30 days are archived too. What stays is what you actually keep using.

The other half is re-orientation at the start of every session. The rule knowledge/rules/domain-memory-bootup.md tells Claude to read the active project file first and pick exactly one task:

code
### 1. Read Memory

1. ${CLAUDE_PLUGIN_ROOT}/_memory/index.json -> Active project, last session
2. ${CLAUDE_PLUGIN_ROOT}/_memory/projects/{active}.json -> Goals, Progress, Next Step

### 4. Pick One Task

Choose ONE task. Not multiple. Atomic progress.
Archive, don't delete

Auto-archival moves stale files out of the recall path instead of deleting them. You keep an audit trail, but the agent only sees memory that is still earning its place. Full memory architecture in the Claude Code memory deep dive.

For semantic recall instead of keyword matching, the plugin's README notes the difference plainly: keyword-based memory matching lands around 60% recall, while a graph layer like Kairn pushes it toward 90%. I cover the persistent layer in the Claude Code memory system post.

Failure Mode 4: Hook Noise and Tool Spam

Hook noise is the opposite failure of having no hooks: every hook fires on every event, floods the context with status messages, and the signal drowns. Tool spam is the same problem from the model side - repeated, redundant tool calls that add tokens without adding information.

The fix is restraint built into the hook layer. Evolving Lite gates 7 of its 10 hooks behind session tiers, so a brand-new session (Tier 1) only runs the 4 essential hooks. Memory recall and archival do not fire until later sessions, when there is actually something to recall.

Tier-gated hook activation
Tier 1 (session 1-2): 4 core hooks only
v
Tier 2 (session 3+): correction + delegation + summary
v
Tier 3 (session 10+): recall + precompact + archival

On top of tiering, individual hooks debounce themselves. The context warner uses DEBOUNCE_SECONDS=120 so it cannot warn twice in two minutes, and the session summarizer (hooks/scripts/session-summary.sh) has a 30-minute anti-spam gate so a stop-start workflow does not generate a pile of near-identical summaries. Every hook also carries a 10-second timeout, so a slow script can never stall your session.

The lesson generalizes beyond this plugin: a hook that fires too often is worse than no hook, because it trains you to ignore it. Design for the quietest useful signal.

Failure Mode 5: Context Poisoning

Context poisoning is when untrusted text - a file you read, a web page, a tool result - contains instructions aimed at the agent, and Claude Code treats that injected text as a command instead of as data. It is the confused-deputy problem applied to coding agents.

Evolving Lite treats this as a security tier. The PreToolUse hook hooks/scripts/security-tier-check.py classifies every bash command against a 10-tier system defined in hooks/security-tiers.json. Tier 7 is prompt injection, and its action is a hard BLOCK on patterns like:

code
"ignore previous instructions"
"new instructions:"
"you are now a"
"jailbreak"

The tiering matters because not every risk deserves a block. The same hook BLOCKs the dangerous tiers (10 through 7), asks for confirmation on tiers 6 and 5, warns on 4 and 3, and silently logs 1 and 2. A blunt allow-or-deny filter trains you to click through warnings; a graded one keeps the blocks rare enough to mean something.

Data is not instructions

The root cause of context poisoning is treating observed content as a command. A pattern-level block at the tool boundary is a backstop, not a substitute for the agent keeping a clear line between what it reads and what it is told to do.

This is also why the official Claude Code security model leans on permission boundaries - the hook layer adds a deterministic second line of defense you fully control.

Failure Mode 6: Prompt Bloat

Prompt bloat is when Claude Code loads every rule, every memory, and every piece of project knowledge into the context on every single turn. The agent technically has everything, but the relevant 5% is buried, latency climbs, and the cost per turn balloons.

The fix is just-in-time loading through a context router. Evolving Lite keeps 25 keyword-scoped routes in _graph/cache/context-router.json, and only loads the knowledge a turn actually needs. Here is the real delegation route:

code
"delegation": {
  "keywords": ["delegate", "delegation", "sub-agent", "subagent", "explore", "search codebase", "find all", "research"],
  "primary_nodes": ["knowledge/rules/delegation.md"],
  "patterns": ["delegation-request"]
}

The orchestration rule knowledge/rules/metacognitive-orchestrator.md states the payoff directly: matching keywords and loading only matched routes keeps context "small (~5K tokens baseline instead of loading all knowledge)." Recall is capped too - the thinking-recall.py hook injects at most 2 stored experiences per tool call, so memory helps without flooding.

Just-in-time context, not everything-always
RecallMax 2 relevant experiences injected per call
RoutedOnly knowledge matching the prompt keywords
Baseline~5K tokens loaded every session

25 routes like the one above live in the public repo, ready to clone. The principle is the one Anthropic keeps returning to in its context engineering guidance: load context just in time, not just in case.

How Do You Audit Your Own Setup?

Auditing for Claude Code failure modes means checking, for each mode, whether you have a deterministic mechanism that catches it - not whether you "usually remember." Run this quick self-check against your current setup before adding anything new.

Failure modeDo you have a fix?Mechanism to look for
Compaction driftYes / NoPreCompact extraction + context budget warning
Over-delegationYes / NoA delegation score with negative weights
Stale memoryYes / NoAge + access-based auto-archival
Hook noiseYes / NoDebounce + tier gates on hooks
Context poisoningYes / NoPrompt-injection block at the tool boundary
Prompt bloatYes / NoKeyword-scoped context router

Every "No" is a place where you are currently relying on luck. The fixes do not have to be these exact scripts - but the shape (detect the moment, then rescue or block) should be present for each one. If you want a tested baseline, the whole set ships together in primeline-ai/evolving-lite.

The deeper point is that these failure modes compound. Stale memory feeds prompt bloat, prompt bloat accelerates compaction, compaction drift triggers re-work that burns tokens through over-delegation. Fixing one mode reduces pressure on the others, which is why I treat them as one system rather than six unrelated patches - the same systems-over-sessions idea behind self-correcting Claude Code workflows.

Getting Started: Fix One Failure Mode This Week

Start with the failure mode that costs you the most right now. For most people running long Claude Code sessions, that is compaction drift, because it silently erases hours of context. Pick one, install the fix, and watch it for a week before adding the next.

The fastest path is to clone the whole set and let the tier system roll the hooks in gradually:

code
git clone https://github.com/primeline-ai/evolving-lite

It installs as a Claude Code plugin and starts in Tier 1, so only the 4 core hooks fire on day one. As your session count climbs, correction detection, delegation scoring, recall, and archival switch on automatically. Nothing to configure - the failure-mode fixes arrive in the order you are likely to need them.

If you would rather build your own, use the audit table above as a checklist and write one hook per failure mode. The Claude Code hooks reference covers the event types, and the Evolving Lite hooks directory gives you working scripts to adapt. Either way, the goal is the same: stop relying on memory for the things that break the same way every time.

FAQ

What are the most common Claude Code failure modes?+
The most common Claude Code failure modes are compaction drift, over-delegation, stale project memory, hook noise, context poisoning, and prompt bloat. Each one either loses information or wastes it during long sessions, and each has a deterministic hook-based fix.
What is compaction drift in Claude Code?+
Compaction drift is when Claude Code compresses a long conversation and the summary drops decisions, file paths, or open loops you still needed. The agent keeps working on a thinner picture and repeats or contradicts earlier work. A PreCompact hook that saves key decisions before compaction prevents it.
How do I stop Claude Code from over-delegating to subagents?+
Use a delegation score instead of a vibe. Evolving Lite scores each prompt against weighted factors and only suggests a subagent at a score of 3 or higher, with negative weights for production, secrets, and 'show me' requests so risky or watch-me work never auto-delegates.
How does Claude Code handle stale project memory?+
An auto-archival hook retires experiences older than 90 days with low access and low relevance, and sessions older than 30 days, moving them out of the recall path. A session-start rule re-reads the active project file so Claude orients on current goals, not old notes.
Can Claude Code be tricked by prompt injection in files it reads?+
Yes, this is context poisoning - untrusted text containing instructions gets treated as a command. A PreToolUse security hook can block injection patterns like 'ignore previous instructions' at the tool boundary as a deterministic second line of defense behind the agent's own data-versus-instructions discipline.
What is prompt bloat and how do you fix it?+
Prompt bloat is loading every rule and memory on every turn, burying the relevant 5% and inflating cost. A context router with keyword-scoped routes loads only the knowledge a turn needs, keeping a small baseline (~5K tokens) instead of everything at once.
Do hooks slow down Claude Code?+
Not meaningfully when bounded. Evolving Lite gives every hook a 10-second timeout, debounces noisy hooks, and gates 7 of 10 hooks behind session tiers, so a new session only runs 4 core hooks. The design favors the quietest useful signal over firing on every event.
Where can I get working code for these Claude Code fixes?+
All six fixes ship together in the open-source Evolving Lite plugin at github.com/primeline-ai/evolving-lite. It includes 10 hooks, 5 rules, a 25-route context router, and 12 patterns, and installs as a Claude Code plugin that activates hooks gradually by session tier.

>_ Get the free Claude Code guide

>_ No spam. Unsubscribe anytime.

>_ Related