CLAUDE.md is where most Claude Code users start - one file, always loaded, all instructions in one place. For a while that works fine. Then you hit 300 lines and notice Claude getting sloppy near the end of complex sessions. The file got too big for the context window to absorb cleanly.
The fix isn't a better template. It's a different architecture.
Claude Code's .claude/rules/ directory lets you split instructions across multiple files. Pair that with a second tier of on-demand rules, and you control exactly what gets loaded when. I run 59 rules across two tiers. The result: tighter context, better behavior, and instructions that actually apply to the task at hand.
- CLAUDE.md = project identity (stack, paths, constraints)
- .claude/rules/ = behavioral defaults (auto-loaded, one concern per file)
- knowledge/rules/ = deep guides (loaded on demand)
Split your monolithic CLAUDE.md into this 2-tier system and cut context waste by ~73%.
What's the Difference Between CLAUDE.md and Rules?
CLAUDE.md loads automatically every session, no exceptions. Rules files in .claude/rules/ also auto-load every session - but they're separate files, one concern per file, easier to maintain than one monolith.
The real power is a second tier: rules you store outside .claude/rules/ (I use knowledge/rules/) and load on demand. Only the instructions relevant to what you're doing right now get injected into context.
Why Does CLAUDE.md Get Bloated?
Every session you find something Claude gets wrong, so you add a line. Over weeks, that file becomes a graveyard of edge cases, tool routing decisions, reasoning frameworks, and session rituals - most of which don't apply to any single session.
The thing is, Claude Code keeps your entire CLAUDE.md in context for every message it processes. A bloated CLAUDE.md spends that context on instructions that don't apply, crowding out the code and task context that actually matter.
The pattern I see constantly: people dump everything into CLAUDE.md because it's the only file they know about. Rules solve this.
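If your CLAUDE.md already uses `## ` headings per concern, a first pass at the split can be scripted. This is a minimal sketch, not a tool I'm claiming exists: the function names are made up for illustration, it assumes one concern per `## ` section, and it will misfire if a `## ` line appears inside a code fence. Everything before the first `## ` heading is left alone, on the assumption that it's the project identity that stays in CLAUDE.md.

```python
import re


def slugify(title: str) -> str:
    """Turn a heading such as 'Tool Routing' into 'tool-routing'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def split_by_heading(markdown: str) -> dict[str, str]:
    """Map 'slug.md' -> section text for every '## ' section.

    Lines before the first '## ' heading are skipped on purpose:
    they are assumed to be the core identity that stays in CLAUDE.md.
    """
    sections: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            title = line[3:].strip()
            current = slugify(title) + ".md"
            sections[current] = "# " + title + "\n"
        elif current is not None:
            sections[current] += line + "\n"
    return sections


sample = """# My Project
Stack: Python

## Delegation
Delegate when score >= 3.

## Tool Routing
Single URL -> scrape with JSON schema.
"""

files = split_by_heading(sample)
for name in files:
    print(name)  # candidate file names for .claude/rules/
```

Treat the output as candidates to review by hand. Deciding which files belong in the auto-loaded tier and which move to the on-demand tier is still a judgment call.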
What Goes in .claude/rules/?
Think of .claude/rules/ as behavioral defaults - instructions that apply across most sessions but are specific enough to warrant their own file.
My auto-loaded rules cover:
- delegation.md - when to route tasks to sub-agents and which model to use
- workflow-detection.md - how to detect slash commands from natural language input
- mid-session-kairn.md - when to save learnings to the knowledge graph during a session
- firecrawl-routing.md - tool routing decisions for web scraping tasks
- quick-dsv.md - a 3-question reasoning check before acting on open-ended requests
Each file is focused. delegation.md doesn't know about firecrawl-routing.md. They don't need to. Claude loads all of them, but each file handles one concern cleanly.
What Goes in knowledge/rules/ (On-Demand)?
On-demand rules are the deep stuff. Full framework documentation, edge case handling, extended examples. You reference them explicitly when you need them:
```markdown
# delegation.md
Full framework: `knowledge/rules/delegation/delegation-extended.md`
```
When delegation gets complex, the auto-loaded rule says "load the extended guide." Otherwise that 3,000-token document never enters context.
My on-demand rules include things like:
- Full memory system bootup ritual (only needed at session start)
- Complete agent trait library (only needed when building new agent profiles)
- DSV reasoning deep-dive (only needed for architecture decisions)
The auto-loaded rule gives Claude a summary and a pointer. The on-demand rule gives the full detail when it's actually needed.
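Pointers like this rot quietly when you rename or move a guide. Here's a small sketch of a check for that. It assumes pointers are written as backticked paths under `knowledge/rules/`, as in the delegation.md example; `find_pointers` and `missing_pointers` are hypothetical helper names, not part of Claude Code.

```python
import re
from pathlib import Path

# Assumption: on-demand guides are referenced as backticked paths
# under knowledge/rules/, ending in .md.
POINTER = re.compile(r"`(knowledge/rules/[^`]+\.md)`")


def find_pointers(rule_text: str) -> list[str]:
    """Return every on-demand guide path mentioned in an auto-loaded rule."""
    return POINTER.findall(rule_text)


def missing_pointers(rule_text: str, project_root: str) -> list[str]:
    """Return the referenced guides that do not exist under project_root."""
    root = Path(project_root)
    return [p for p in find_pointers(rule_text) if not (root / p).is_file()]


rule = (
    "# delegation.md\n"
    "Full framework: `knowledge/rules/delegation/delegation-extended.md`\n"
)

print(find_pointers(rule))
```

Running `missing_pointers` over every file in .claude/rules/ gives you a list of dead references before Claude trips over one mid-session.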
Before (one monolithic CLAUDE.md):
- 500+ lines loaded every single session
- Delegation logic mixed with coding style
- Tool routing buried after 200 lines of setup
- Context wasted on irrelevant instructions
- Hard to update one rule without breaking another
After (the 2-tier system):
- CLAUDE.md: ~80 lines of core identity only
- Auto-rules: focused behavioral defaults
- On-demand rules: deep guides, loaded when needed
- ~73% context reduction vs loading everything
- Update one rule without touching the others
How Do I Structure a Rules File?
Keep each rule file under 200 lines. Start with a trigger and priority so Claude knows when to apply it:
```markdown
# firecrawl-routing.md
**Trigger**: Before any firecrawl_* tool call.
**Priority**: HIGH

## Quick Decision
Single URL + specific data -> scrape with JSON schema
Auth-walled sites (X, LinkedIn) -> use claude-in-chrome instead
Complex multi-step -> firecrawl_agent (last resort)

## Common Mistakes
- Using markdown for data extraction (use JSON schema)
- Scraping auth-walled sites (use alternatives above)
```
That's the whole file. Under 30 lines. Claude reads it, internalizes the routing logic, and applies it whenever firecrawl comes up. No fluff.
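The conventions above (under 200 lines, a trigger, a priority) are easy to check mechanically. This is a sketch of a linter under those assumptions: `lint_rule` is a hypothetical helper, and it expects the `**Trigger**:` and `**Priority**:` lines to be written exactly as in the firecrawl-routing.md example.

```python
def lint_rule(text: str, max_lines: int = 200) -> list[str]:
    """Return a list of problems found in one rule file's text."""
    problems = []
    lines = text.splitlines()
    # "Under 200 lines" means 200 itself is already too long.
    if len(lines) >= max_lines:
        problems.append(f"{len(lines)} lines (limit is under {max_lines})")
    if not any(line.startswith("**Trigger**:") for line in lines):
        problems.append("missing **Trigger**: line")
    if not any(line.startswith("**Priority**:") for line in lines):
        problems.append("missing **Priority**: line")
    return problems


good_rule = (
    "# firecrawl-routing.md\n"
    "**Trigger**: Before any firecrawl_* tool call.\n"
    "**Priority**: HIGH\n"
)
bad_rule = "# notes.md\nRemember to be careful.\n"

print(lint_rule(good_rule))  # no problems expected for the good rule
print(lint_rule(bad_rule))
```

Pointing this at each file in .claude/rules/ before a commit catches rules that have grown past the cap or lost their header.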
The correction-to-rule pipeline can automate creating rules like this - every time you correct Claude, a hook generates a candidate rule automatically.
Does This Actually Save Context?
Yes, measurably. My CLAUDE.md is 80 lines. My 20 auto-loaded rules average 60 lines each. That's 1,200 lines of behavioral instructions active per session.
Without the 2-tier system, I'd need all 59 rules loaded every time. At an average of 80 lines each, that's 4,720 lines - roughly 4x the context. The on-demand tier handles the rest only when needed.
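Here's the arithmetic behind those numbers, spelled out. The inputs are the figures quoted above; how CLAUDE.md itself is counted is my assumption (I add its 80 lines to both sides), so treat the exact percentage as approximate.

```python
claude_md_lines = 80

auto_rules, auto_avg_lines = 20, 60      # auto-loaded tier
total_rules, overall_avg_lines = 59, 80  # both tiers combined

two_tier_rules = auto_rules * auto_avg_lines        # 1,200 lines
load_everything = total_rules * overall_avg_lines   # 4,720 lines

# How much bigger the "load everything" setup is, rules only.
ratio = load_everything / two_tier_rules

# Reduction when CLAUDE.md is counted in both setups (assumption).
reduction = 1 - (claude_md_lines + two_tier_rules) / (
    claude_md_lines + load_everything
)

print(two_tier_rules, load_everything)   # 1200 4720
print(f"{ratio:.2f}x")                   # 3.93x
print(f"{reduction:.0%}")                # 73%
```

Line counts are only a proxy for tokens, but the proportions should hold as long as rule files have similar line lengths.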
The context management post goes deeper on why context saturation kills Claude's performance on long tasks. Rules modularity is one of the most direct fixes.
How Do Rules Interact With Hooks?
Rules describe what to do. Hooks enforce it.
My delegation.md rule says "delegate when score >= 3." A delegation-enforcer.py hook fires on every task submission and checks whether delegation should have happened. If Claude tried to do something itself that should have been delegated, the hook flags it.
Rules are instructions. Hooks are verification. Together they create behavior that actually sticks across 4,200+ sessions - not just for one conversation.
Write the rule first: "delegate when score >= 3." Then build the hook that enforces it. The rule teaches Claude the behavior. The hook catches drift. Without enforcement, rules decay after 10-15 tool calls in long sessions.
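To make the rule-plus-hook pairing concrete, here's a stripped-down sketch of what an enforcer could look like. It is not my actual delegation-enforcer.py: the scoring signals, the payload shape (`task`, `delegated`), and the function names are all hypothetical. The return code of 2 follows the convention that a hook signals a blocking problem with exit code 2 and explains it on stderr; check the hooks documentation for the exact contract in your version.

```python
import json
from typing import Optional

THRESHOLD = 3  # from the rule: "delegate when score >= 3"

# Hypothetical signals, one point each. The real scoring lives in delegation.md.
SIGNALS = ("multi_file", "needs_research", "long_running", "parallelizable")


def delegation_score(task: dict) -> int:
    """Count how many delegation signals are set on the task."""
    return sum(1 for name in SIGNALS if task.get(name))


def check_delegation(task: dict, delegated: bool) -> Optional[str]:
    """Return a warning if the task met the threshold but was not delegated."""
    score = delegation_score(task)
    if score >= THRESHOLD and not delegated:
        return f"Delegation score {score} >= {THRESHOLD}: route this to a sub-agent."
    return None


def run_hook(payload_json: str) -> tuple[int, str]:
    """Return (exit_code, message) for a JSON payload.

    A real hook would read the payload from stdin, print the message to
    stderr, and exit with the code. The payload shape here is assumed.
    """
    payload = json.loads(payload_json)
    message = check_delegation(payload.get("task", {}), bool(payload.get("delegated")))
    return (2, message) if message else (0, "")


payload = json.dumps({
    "task": {"multi_file": True, "needs_research": True, "long_running": True},
    "delegated": False,
})
print(run_hook(payload))
```

The point of keeping the threshold in one constant is that the rule and the hook can't silently disagree: change "score >= 3" in delegation.md and there is exactly one number to update in the enforcer.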
What Shouldn't Go in Rules?
Rules work for behavioral instructions. They're the wrong place for:
- Project-specific context: file locations, stack details, command lists - that belongs in CLAUDE.md
- Knowledge: facts, reference material, domain background - use the knowledge graph system instead
- Task memory: what you were doing last session - that's the memory system's job
Rules answer "how should Claude behave?" not "what is this project?" or "what happened last time?"



