
Claude Code Context: Knowledge Graphs, Smart Routing, No Waste

Robin · 5 min read
Last updated: February 17, 2026
Tags: context, claude-code, knowledge-graph, optimization

Most People Waste 80% of Their Context Window

Claude Code's context window is finite. Most people waste it. They load everything at session start - every rule, every pattern, every piece of documentation. By the time they hit 70% context, the AI starts degrading. Compaction kicks in. Signal quality drops. The session becomes useless.

I built a different system. One that loads 2,000 tokens at session start and 25,000 only when needed. The result? Sessions that stay sharp from start to finish. No degradation. No compaction. Just consistent performance.

Want the fundamentals first? Grab the free 3-pattern guide - it covers Memory, Delegation, and Knowledge Graph basics. This post explains the context architecture behind it all.

The Context Window Problem

When you start a Claude Code session, it needs context. Who are you? What is your project? What are your conventions? Most people dump everything into .claude/rules/ and call it done.

The problem: Claude Code recursively loads every markdown file under .claude/rules/. I had 44 rule files. Each one 500-2000 words. That is roughly 30,000 tokens loaded before the first message. My context budget was already 15% consumed before I said hello.

The math did not work. If I started at 15% and my average task consumed 20% context, I could only do 3-4 tasks per session. Then I would need /clear and start over.
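The arithmetic above can be sketched directly. The 95% ceiling is my assumption for the point where a session is effectively spent; the other figures come from the paragraph above.

```python
# Back-of-envelope session budget using the article's numbers.
# ceiling_pct = 95 is an assumption, not a figure from the article.
start_pct = 15      # context consumed by always-loaded rules at session start
per_task_pct = 20   # average context consumed per task
ceiling_pct = 95    # assumed practical limit before /clear is unavoidable

tasks = (ceiling_pct - start_pct) // per_task_pct
print(tasks)  # 4 tasks at best, matching the 3-4 figure above
```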

The real solution was not removing rules. It was loading them smarter.

The 3-Layer Architecture

I built a context system with three layers:

  1. Context Router - loads rules based on what you are actually doing
  2. Budget Awareness - tracks token usage and adjusts behavior automatically
  3. Knowledge Graph - maps connections between all system components

Each layer solves a different problem. Together they create a system where context is always available but never wasteful.

[Diagram: context-architecture. Context Router (keyword-based rule loading) → Budget Awareness (token tracking + thresholds) → Knowledge Graph (semantic connections)]

Layer 1: Context Router

The context router maps keywords to rule files. When I mention "delegation", it loads delegation rules. When I mention "debug", it loads debugging patterns. When I mention neither? It loads nothing.

The key insight: rules moved from .claude/rules/ (always loaded) to a separate knowledge directory (loaded on demand). Only 4 core rules stay always-loaded - the session-start ritual, core principles, workflow detection, and the system README.

Session start cost dropped from ~30K tokens to ~2K tokens. A 93% reduction. The rules are still available - just loaded when keywords match instead of loaded always.
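A minimal sketch of what such a router looks like. The keywords and file paths here are illustrative placeholders, not the actual PrimeLine configuration:

```python
# Hypothetical keyword -> rule-file routing table. Keywords and paths
# are invented for illustration; the real config maps many more routes.
ROUTES = {
    "delegation": ["knowledge/delegation.md", "knowledge/traits.md"],
    "debug": ["knowledge/debugging-patterns.md"],
    "security": ["knowledge/security-checks.md"],
}

def rules_for(message: str) -> list[str]:
    """Return the rule files whose trigger keywords appear in the message."""
    matched: list[str] = []
    for keyword, files in ROUTES.items():
        if keyword in message.lower():
            matched.extend(f for f in files if f not in matched)
    return matched
```

A message that mentions none of the keywords loads nothing, which is exactly where the token savings come from.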

Layer 2: Budget Awareness

The context router solves what to load. Budget awareness solves when to stop loading.

The system defines behavior thresholds based on context usage. Under 60%, everything works normally - load full rules when requested. Between 60-80%, load summaries instead of full documentation. Above 80%, stop loading new rules entirely and prepare for a clean handoff.
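The threshold logic is simple enough to sketch in a few lines. The cutoffs mirror the ones above; the mode names are my own labels:

```python
def loading_mode(context_used: float) -> str:
    """Map context usage (0.0-1.0) to a rule-loading behavior.

    Thresholds follow the article: full rules under 60%, summaries
    between 60-80%, nothing new above 80%. Mode names are illustrative.
    """
    if context_used < 0.60:
        return "full"      # load complete rule files on keyword match
    if context_used < 0.80:
        return "summary"   # load condensed versions only
    return "halt"          # stop loading; prepare a clean handoff
```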

This creates emergent behavior. When Claude sees it is at 73% context with 5 tasks remaining, it reasons: "Sequential execution would push context to 95%. Better to delegate 3 tasks to sub-agents, hand off, and handle the rest in a fresh session."

That is not hardcoded logic. It is Claude reasoning about its own constraints using the thresholds as heuristics. The same way a human engineer says "I am tired, let us finish this tomorrow."

Layer 3: Knowledge Graph

The context router answers "what should I load?" Budget awareness answers "when should I load it?" The knowledge graph answers "what else is connected?"

The graph maps relationships between all system components - agents, commands, rules, patterns. When I load delegation rules, the graph surfaces related context: budget awareness (because delegation affects context usage) and the trait system (because delegation uses traits).
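At its core this is an adjacency structure. A toy sketch, using the delegation example from above (node names are hypothetical):

```python
# Hypothetical knowledge-graph edges. Node names mirror the article's
# example: loading delegation rules surfaces budget awareness and the
# trait system. A real graph would cover hundreds of components.
GRAPH = {
    "delegation": {"budget-awareness", "trait-system"},
    "budget-awareness": {"context-router"},
    "trait-system": set(),
    "context-router": set(),
}

def related(node: str) -> set[str]:
    """Return components directly connected to a loaded rule."""
    return GRAPH.get(node, set())
```

When "delegation" loads, `related("delegation")` surfaces the connected context without anyone having to remember the link.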

I do not have to remember what connects to what. The graph remembers for me. And as the system grows, the graph gets smarter.

This matters at scale. When I had 20 rules, I could remember them all. At 44 rules, I needed the router. At 400+ components (agents, commands, patterns, rules), I need the graph to surface connections I did not explicitly design.

The Real-World Impact

Before: Load Everything
  • ~15% context consumed at start
  • 3-4 tasks before degradation
  • Frequent compaction and signal loss
  • Rules: all-or-nothing

After: Smart Loading
  • ~3% context consumed at start
  • 10+ tasks with consistent quality
  • Degradation only at 85%+
  • Rules load on demand

The difference is not just tokens saved. It is session consistency. When you start at 3% instead of 15%, you have headroom for complex tasks. When rules load on demand, you never pay for context you do not use. When budget awareness triggers at 70%, you hand off before degradation starts.

No more "Claude is being weird" at 80% context. No more mysterious compaction eating your history. Just consistent performance.

Why This Matters for You

Even if you only have 5-10 rule files today, the pattern applies. The moment you notice Claude getting confused or forgetful mid-session, you are hitting context limits. The fix is not fewer rules. It is smarter loading.

Start by separating your rules into "always needed" and "sometimes needed." That single step can save you 50%+ of your context budget. Your CLAUDE.md template determines which rules load at startup versus on demand.
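That first tiering pass can be as simple as an allowlist. The core rule filenames below are assumptions based on the four always-loaded rules named earlier (session-start ritual, core principles, workflow detection, system README):

```python
# Hypothetical tiering pass: split rule files into "always" and
# "on-demand" buckets. Filenames are assumed, not the actual setup.
ALWAYS_LOADED = {
    "session-start.md",
    "core-principles.md",
    "workflow-detection.md",
    "README.md",
}

def tier(rule_files: list[str]) -> tuple[list[str], list[str]]:
    """Partition rule files into (always-loaded, on-demand)."""
    always = [f for f in rule_files if f in ALWAYS_LOADED]
    on_demand = [f for f in rule_files if f not in ALWAYS_LOADED]
    return always, on_demand
```

Everything in the on-demand bucket moves out of the startup path and behind the router.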

The free 3-pattern guide covers the knowledge graph pattern at concept level. The course includes the complete router configuration, graph setup, budget thresholds, and the tiered loading system - everything you need to implement this in your own Claude Code setup.

Small systems. Compounding returns. That is the PrimeLine way.

FAQ

Isn't this over-engineered for a solo developer?
For a project with 10 rules, yes. But the moment you hit 20+ rules or notice Claude getting confused mid-session, you need smarter loading. Start simple with tiered rules, then add the router and graph as you scale.

How do I know when a rule should always load vs. load on demand?
If Claude needs it to function every session (memory bootup, core principles), it always loads. If it is only relevant for specific tasks (debugging patterns, security checks), it loads on demand. The heuristic: would you want this loaded when doing unrelated work?

What about the performance cost of keyword matching?
Negligible. Matching keywords against routes takes milliseconds. The token savings - 25K per session in my case - far outweigh any lookup cost.

Does this work with other AI coding tools?
The concepts - tiered rules, keyword routing, budget awareness - are universal. The implementation is Claude Code specific. But any tool with context windows and rule systems could adapt this architecture.
