I've been running Claude Code sessions that span 30+ tool calls - editing configs, running tests, analyzing logs. Around the 15-tool mark, I noticed something: Claude would "forget" patterns I'd mentioned earlier. Not hallucinate, just drift away from established context.
The problem wasn't token limits. It was attention drift. Claude's working memory is excellent, but in long sessions with constant tool results, relevant context from 10 minutes ago gets buried.
So I built a hook that fixes this automatically.
The Problem: Context Drift in Long Sessions
Here's what I observed across dozens of multi-hour sessions:
- Tool call 5: Claude remembers the delegation rules perfectly
- Tool call 15: Claude asks about delegation again
- Tool call 25: Claude suggests an approach I'd explicitly rejected earlier
The pattern was consistent. After 8-10 tool calls, context from the session start would fade. Not disappear - just become less salient than the immediate tool results.
Before:
- Context drift after 15+ tool calls
- Manual reminders every few interactions
- Repeated explanations of established patterns
- Lost thread of conversation in long sessions

After:
- Automatic context refresh after 8 tool calls
- 91-route knowledge base auto-matched
- Relevant patterns injected mid-session
- 37ms overhead per trigger
The Solution: Thinking Recall Hook
I built a PreToolUse hook that fires on every 8th tool call. Here's what it does:
- Reads the transcript back 4-8 exchanges
- Extracts keywords from Claude's thinking blocks
- Matches against a 91-route context router
- Injects up to 500 tokens of relevant context
- Returns in under 40ms
The key insight: Claude's <thinking> blocks contain the clearest signal of what's cognitively active. If Claude is thinking about "delegation" and "model selection", those keywords reveal what context would be valuable right now.
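The transcript-reading step can be sketched in a few lines. This is a minimal version under two assumptions: the hook receives a `transcript_path` on stdin (Claude Code hooks do pass one), and transcript entries are JSONL whose content blocks carry a `thinking` field. The exact field names are my guess at the layout, not the article's code, so verify them against your own transcript files.

```python
import json
from pathlib import Path

def recent_thinking_blocks(transcript_path, max_blocks=8):
    """Walk the transcript JSONL backwards and collect the text of recent
    thinking blocks. The message/content/thinking field names are one
    plausible transcript layout -- check against real transcripts."""
    blocks = []
    for raw in reversed(Path(transcript_path).read_text().splitlines()):
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip any malformed line rather than crash
        content = entry.get("message", {}).get("content", [])
        if not isinstance(content, list):
            continue
        for block in content:
            if isinstance(block, dict) and block.get("type") == "thinking":
                blocks.append(block.get("thinking", ""))
        if len(blocks) >= max_blocks:
            break
    return blocks
```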
Implementation Details (Concept Level)
The hook is 370 lines of Python, stdlib only. No dependencies, no network calls, no external services. It has to be fast - anything over 200ms would feel like lag.
What Gets Analyzed
I only read assistant and thinking blocks. Tool results are ignored for security - I don't want the hook to amplify potential prompt injections from web scraping or file reads.
The keyword extraction is simple: frequency analysis with stopword filtering, weighted by recency. Recent thinking blocks count more than older ones.
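A minimal version of that extractor looks like this. The stopword list here is abbreviated for space, and the linear recency weighting is one simple choice, not necessarily the article's exact formula:

```python
import re
from collections import Counter

# Abbreviated stopword list -- a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is",
             "it", "that", "this", "for", "on", "with", "be"}

def extract_keywords(thinking_blocks, top_n=10):
    """Frequency analysis with stopword filtering, weighted by recency.
    thinking_blocks is ordered oldest -> newest; later blocks get a
    linearly larger weight, so the newest block counts fully."""
    if not thinking_blocks:
        return []
    scores = Counter()
    n = len(thinking_blocks)
    for i, block in enumerate(thinking_blocks):
        weight = (i + 1) / n  # most recent block weighs 1.0
        for word in re.findall(r"[a-z][a-z-]{2,}", block.lower()):
            if word not in STOPWORDS:
                scores[word] += weight
    return [word for word, _ in scores.most_common(top_n)]
```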
What Gets Matched
Three knowledge sources:
- context-router.json: 91 routes mapping keywords to rule files
- experience-router.json: Past session learnings with similarity matching
- _memory/projects/*.json: Active project state and goals
The router uses keyword set intersection - if 2+ keywords from a route match what Claude is thinking about, that route's context files get injected.
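Assuming each route lists `keywords` and `files` (my guess at the router schema, not the article's exact format), the intersection test is a few lines:

```python
import json

def match_routes(keywords, router_path="context-router.json", min_overlap=2):
    """Return the context files of every route whose keyword set shares
    at least `min_overlap` keywords with what Claude is thinking about.
    Assumed schema: {"routes": [{"keywords": [...], "files": [...]}]}."""
    with open(router_path) as f:
        router = json.load(f)
    active = set(keywords)
    matched_files = []
    for route in router.get("routes", []):
        if len(active & set(route.get("keywords", []))) >= min_overlap:
            matched_files.extend(route.get("files", []))
    return matched_files
```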
What Gets Injected
Up to 500 tokens, deduplicated across three dimensions:
- Content hashing (exact duplicates)
- Jaccard similarity (paraphrases)
- Topic-shift detection (is this genuinely new information?)
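The first two dimensions can be sketched directly; topic-shift detection is more involved and is left out here. The 0.8 Jaccard threshold is my placeholder, not the article's tuned value:

```python
import hashlib

def jaccard(a, b):
    """Jaccard similarity between two word sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def dedupe(snippets, jaccard_threshold=0.8):
    """Drop exact duplicates via content hashing, then near-duplicates
    (paraphrases, case changes) via Jaccard similarity over word sets."""
    seen_hashes = set()
    kept = []
    for snippet in snippets:
        digest = hashlib.sha256(snippet.encode()).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate
        words = set(snippet.lower().split())
        if any(jaccard(words, set(k.lower().split())) >= jaccard_threshold
               for k in kept):
            continue  # paraphrase of something already kept
        seen_hashes.add(digest)
        kept.append(snippet)
    return kept
```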
The hook passes context via the additionalContext field in the PreToolUse response. Claude sees it as "additional context for this tool call" - natural, non-intrusive.
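Emitting that response is just a JSON document on stdout. The ~4 characters/token budget below is a rough heuristic of mine, not the hook's actual accounting:

```python
import json

def emit_context(snippets, max_tokens=500):
    """Write the PreToolUse hook response to stdout. The token budget
    is approximated as ~4 characters per token (a crude heuristic)."""
    text = "\n\n".join(snippets)[: max_tokens * 4]
    response = {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "additionalContext": text,
        }
    }
    print(json.dumps(response))
```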
Performance Numbers
- Median execution time: 37ms
- 95th percentile: 45ms
- Target was 200ms
- Fail-open: any error exits silently with code 0
The speed comes from aggressive caching. The context router is loaded once at startup and held in memory. File reads are only triggered on cache misses.
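Within a single invocation, `functools.lru_cache` gives you that load-once behaviour for free (note that each hook run is its own process, so this caches per invocation, not across them):

```python
import json
from functools import lru_cache

@lru_cache(maxsize=None)
def load_router(path="context-router.json"):
    """Parse the router once per process; repeat calls for the same
    path are served from memory, never re-reading the file."""
    with open(path) as f:
        return json.load(f)
```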
Real-World Example
Yesterday I was debugging a delegation issue. The session went like this:
- Tool calls 1-7: Setting up test cases, reading config files
- Tool call 8: Hook fires, injects delegation scoring rules
- Tool call 9: Claude references the scoring rules correctly
- Tool calls 10-16: Implementation and testing
- Tool call 17: Hook fires again, injects model selection criteria
- Tool call 18: Claude applies criteria without me prompting
I didn't mention delegation scoring after tool call 3. But at tool call 9, Claude had it available again - not because of its memory, but because the hook detected "delegation" in the thinking blocks and injected the right context.
Lessons Learned
Fail-open is critical. My first version would block the tool call if the hook crashed. Bad idea. Now any error - file not found, JSON parse failure, timeout - results in exit 0. Claude never sees the failure.
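The fail-open wrapper is a small pattern worth copying. The hook body here is a placeholder for the real pipeline:

```python
import sys

def run_failopen(hook_fn):
    """Run the hook body; on ANY failure, exit 0 with no output so the
    tool call proceeds exactly as if the hook weren't installed."""
    try:
        hook_fn()
    except Exception:
        sys.exit(0)  # fail open: never block the tool call

if __name__ == "__main__":
    # Replace the lambda with the real pipeline: read stdin, extract
    # keywords, match routes, emit context.
    run_failopen(lambda: None)
```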
Keyword extraction beats embeddings here. I tried semantic similarity first. Too slow (150ms) and too many false positives. Simple keyword matching with frequency weighting works better for this use case.
Deduplication saved me. Without it, the hook would inject the same 3 rules every time. Jaccard similarity (comparing word sets) catches paraphrases that content hashing misses.
The 8-call threshold was trial and error. Too frequent (every 5 calls) felt noisy. Too rare (every 15 calls) came too late. Eight is the sweet spot for my workflows.
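One wrinkle the threshold implies: each hook invocation is a fresh process, so the call count has to live on disk. A per-session counter file is one way to do it; the directory layout below is illustrative, not the article's actual scheme:

```python
import tempfile
from pathlib import Path

def should_fire(session_id, every=8, state_dir=None):
    """Increment a per-session counter file and return True on every
    `every`-th tool call. State survives across hook processes because
    it lives on disk, not in memory."""
    state_dir = Path(state_dir or tempfile.gettempdir()) / "thinking-recall"
    state_dir.mkdir(parents=True, exist_ok=True)
    counter_file = state_dir / f"{session_id}.count"
    count = int(counter_file.read_text()) if counter_file.exists() else 0
    count += 1
    counter_file.write_text(str(count))
    return count % every == 0
```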
Why This Matters for Claude Code Workflows
Most people treat Claude Code like a traditional IDE extension - a tool that responds to requests. But with hooks, you can build self-correcting systems that adapt mid-session.
This particular hook solves context drift. But the pattern generalizes:
- PreToolUse hooks can validate inputs before risky operations
- PostToolUse hooks can verify outputs and trigger rollbacks
- Hooks can coordinate between multiple agents in parallel workflows
The broader insight: Claude Code workflows don't have to be linear. You can build feedback loops, validation layers, and self-correction mechanisms that run automatically.
If you want to dive deeper into hook architecture, keyword extraction strategies, and the complete scoring algorithms, that's covered in Claude Code Mastery. The course includes the full Python implementation, config examples, and advanced patterns for multi-agent coordination.
Get Started
Want to see how this fits into a complete workflow automation system? I've written about hooks automation patterns and context management strategies that complement this approach.