Every time Claude Code needed to search, review, or explore something, I had to make a call: handle it myself, or spin up a sub-agent? If sub-agent - which one? Which model? How complex is this task really?
That's five decisions before any actual work happens. Multiplied across a full session, the mental overhead adds up fast. And the decisions aren't free - getting them wrong either burns expensive Opus tokens on tasks Haiku could handle, or accidentally routes critical operations to an underpowered model.
I decided to make this a math problem instead of a judgment call.
Want the foundational patterns first? The free 3-pattern guide covers memory, delegation, and knowledge graphs at concept level.
The Problem: Manual Delegation Doesn't Scale
When I built out my first few agent types, manual routing was manageable. I knew the agents, I could roughly estimate complexity, the overhead was tolerable.
Then I had 50+ agent types.
At that scale, the decision tree becomes impossible to hold in working memory. Most developers hit one of two failure modes. The first: never delegating. Every task stays with the main agent, which means paying Opus rates for searches, explorations, and reviews that Haiku could handle at a fraction of the cost. The second: always delegating. Everything gets routed out, including the tasks where you actually need the main agent's full context and reasoning - critical deployments, sensitive configuration changes, complex architectural decisions that require careful judgment.
Neither extreme is right. What I needed was a system that could reliably distinguish between the two - without my involvement.
The Solution: A Scoring Formula for Claude Code Delegation
The core idea is straightforward: task characteristics map to points, and points determine whether to delegate and which model to use.
Each incoming message gets analyzed for characteristics. Some characteristics add points - the task involves searching across a codebase, or it's clearly independent from the current context, or it's a research question. Some characteristics subtract points - the message contains critical operation keywords, or the user is asking for an explanation rather than execution.
When the score crosses a threshold, delegation happens automatically. When it doesn't, the task stays with the main agent. No judgment call. No five decisions.
The threshold sits at three points. Below three: stay with Opus. At or above three: delegate. The number isn't arbitrary - it's calibrated to let a single strong delegation signal trigger automatically (exploration keywords score high enough alone), while requiring multiple weaker signals to combine before delegation fires.
Safety penalties are the critical design choice here. Certain keywords - deploy, production, payment, password - carry penalties large enough to override almost any combination of positive factors. A task can have every delegation signal firing, but if it touches critical operations, the penalty drives the score deep into negative territory. The formula can't route those tasks out accidentally.
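To make the mechanics concrete, here is a minimal sketch of the scoring idea. The keyword lists and point values are illustrative assumptions, not the actual tables from the hook, and it uses naive substring matching where the real implementation is presumably more careful:

```python
# Illustrative sketch of the delegation scoring formula.
# Point values and keyword lists are assumptions for demonstration only.

DELEGATION_SIGNALS = {
    "search": 3, "explore": 3,   # exploration keywords: strong enough to delegate alone
    "research": 2, "review": 2,  # weaker signals that must combine with others
}

SAFETY_PENALTIES = {
    "deploy": -10, "production": -10,   # critical-operation keywords override
    "payment": -10, "password": -10,    # almost any combination of positives
    "explain": -2,                      # user wants explanation, not execution
}

THRESHOLD = 3  # at or above: delegate; below: stay with the main agent

def score(message: str) -> int:
    """Sum points for every signal keyword found in the message."""
    text = message.lower()
    # Naive substring matching, purely for illustration.
    return sum(points
               for keyword, points in {**DELEGATION_SIGNALS, **SAFETY_PENALTIES}.items()
               if keyword in text)

def should_delegate(message: str) -> bool:
    return score(message) >= THRESHOLD
```

With these values, a single strong signal like "search" scores exactly 3 and fires on its own, while any critical keyword drives the total deep into negative territory regardless of the positives alongside it.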
A Concrete Example
User message: "search the codebase for all authentication patterns."
The hook reads this and scores it. Exploration and search are high-value delegation signals - they score high enough to cross the threshold on their own. The task is clearly independent from whatever else is in the session. The score lands well above three.
Model selection happens next. Complexity falls in the mid range - this is a structured search, not an architectural decision. That maps to Sonnet. Delegation fires, Sonnet handles the search, result comes back.
Now compare that to: "deploy the payment system."
The hook scores this too. There's an independent task signal, which adds points. But "deploy" and "payment" are both critical operation keywords. The penalties are aggressive by design. The total score goes sharply negative. The task stays with the main Opus agent, which has full session context and appropriate caution for irreversible operations.
Same formula, opposite outcomes. The difference isn't a rule I wrote about deployment - it's the penalty system making the math work correctly.
The Result
The before state was five decisions per task. Operator overhead on top of every piece of actual work.
The after state is zero decisions. The formula runs on every message in the background. Tasks that should be delegated get delegated. Tasks that should stay with Opus stay with Opus. Model selection follows from complexity scoring. I never think about it.
The cost impact is real. Haiku costs a fraction of Opus per token. When the system automatically routes simple searches and exploration tasks to Haiku, those tasks are both faster and cheaper - without any quality tradeoff for that class of work. The savings compound across a full session.
There's also something no competing framework does here. CrewAI uses role-based routing - you assign roles to agents manually, and tasks go to whoever has the matching role. LangGraph uses explicit state machines - you build graphs of nodes and edges that define legal transitions. Both approaches require upfront design work and don't adapt based on what the task actually is.
This system uses quantitative scoring. The task itself determines where it goes. No predefined roles. No explicit graph. Just arithmetic on task characteristics.
Before:

- 5 decisions per task (delegate? which agent? which model?)
- Opus tokens burned on simple searches
- Critical tasks accidentally sent to Haiku
- No learning from delegation patterns

After:

- Zero manual decisions - formula handles routing
- Haiku handles searches, Opus handles architecture
- Safety keywords block delegation of critical tasks
- Gap tracking reveals missed delegation opportunities
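Gap tracking deserves a quick sketch of its own, since it's what lets the system surface missed delegation opportunities. The log path, format, and near-miss margin here are all assumptions for illustration:

```python
# Sketch of gap tracking: log near-miss scores (just below the threshold)
# so missed delegation opportunities can be reviewed and recalibrated later.
# File path, margin, and record format are assumptions.

import json
import time

THRESHOLD = 3
NEAR_MISS_MARGIN = 2  # scores within this margin below the threshold get logged

def track_gap(message: str, score: int,
              log_path: str = "delegation_gaps.jsonl") -> None:
    """Append a JSONL record when a task almost, but not quite, delegated."""
    if THRESHOLD - NEAR_MISS_MARGIN <= score < THRESHOLD:
        with open(log_path, "a") as f:
            f.write(json.dumps({
                "ts": time.time(),
                "score": score,
                "message": message[:120],  # truncate to keep the log compact
            }) + "\n")
```

Reviewing this log periodically shows which kinds of tasks keep scoring a point or two short, which is the signal for adjusting the point tables.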
This is different from the broader multi-agent architecture post, which covers how to build and orchestrate a system of agents. That post is about structure. This post is about the decision mechanism - how the system knows, for any given message, whether to delegate at all and where to send it if so. The architecture and the router are complementary layers. It also connects to hook-based automation patterns - both use the same UserPromptSubmit event to intercept and process messages before Claude acts on them.
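For readers unfamiliar with Claude Code hooks, registration looks roughly like the snippet below in `.claude/settings.json` - the script path is hypothetical, and the exact schema should be checked against the current Claude Code hooks documentation:

```json
{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "python3 .claude/hooks/delegation_router.py"
          }
        ]
      }
    ]
  }
}
```

The command receives the submitted prompt before Claude acts on it, which is the interception point both this router and the hook-based automation patterns rely on.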
The full implementation - complete scoring tables with all point values, the 983-line Python hook, model routing configuration, and the gap tracking setup that logs missed delegation opportunities - is part of Claude Code Mastery.
Want the full system blueprint? Get the free 3-pattern guide.