Claude Code Planning Framework: UPF + DSV Reasoning in One System

Listen to this article (10 min)

UPF + DSV Presentation (15 slides)

1 / 15

The Plan Was Fine. The Question Was Wrong.

I asked an AI system: "Does the NHR tax regime apply to freelancers moving to Italy?"

It delivered a structured, confident answer. Eligibility rules. Residency requirements. Tax rates. Clean format, professional tone, completely usable output.

Also completely useless - because NHR was replaced by the IFICI regime in 2024. The premise of my question no longer existed. The AI did not give a wrong answer. It answered the wrong question.

This is the failure mode that I kept running into, and it is not a model intelligence problem. It is an architecture problem. Claude Code plans confidently. It reasons fluently. But without a structured way to question its own framing, it optimizes toward the first interpretation it forms - whether or not that interpretation is correct.

I spent months dealing with variations of this: plans that looked solid but were built on unverified assumptions, reasoning chains where one wrong premise poisoned every step downstream, implementation failures that traced back to a question nobody had challenged.

The fix was not better prompts. It was a reasoning framework that forces the right questions before the work begins.

This post covers two interlocking systems I now use on every non-trivial project: UPF (Universal Planning Framework) and DSV (Decompose-Suspend-Validate). DSV is the theoretical foundation. UPF is the structural implementation. Together they catch problems that neither catches alone.

Both are open source on GitHub.

Why Plans Break: Premature Collapse

Before the solution, the problem. Most Claude Code planning failures trace to one root cause: Premature Collapse.

Think of quantum mechanics. Before you measure a particle, it exists in multiple possible states simultaneously. The moment you measure, all other states collapse and disappear. AI reasoning does something similar - it "measures" (picks an interpretation) before exploring the full space of what a question could mean.

Ask Claude to plan a migration project and it will produce a plan. A plausible, detailed plan. What it will not do without prompting is ask: "Is migration actually the right approach here? Could the underlying problem be solved differently? Are the constraints I assumed actually fixed?"

The first interpretation feels natural. It locks in. Everything downstream builds on it. And if that first interpretation was wrong, confident execution just gets you to the wrong destination faster.

I call outputs where validation reveals the question itself needs to change MUTATED claims. The NHR example is a clean one: after decomposing the reasoning, SUSPEND revealed that the regime name might not be current, and CHECK confirmed the question was structurally outdated. No amount of better reasoning about NHR fixes that. Only stepping back and questioning the frame does.

UPF gives that stepping-back a structure. DSV gives it a foundation.

DSV: The Theoretical Foundation

DSV stands for Decompose-Suspend-Validate. It is a reasoning architecture that prevents Premature Collapse by forcing you to hold multiple interpretations before committing to one.

It works in three moves:

DECOMPOSE - Break the reasoning into its component claims. Every complex answer contains multiple assertions. A plan that "seems reasonable" might have 4 distinct claims underneath it, and usually only one of them is wrong. You cannot fix what you cannot see.

SUSPEND - For each claim, map what else it could mean before validating any of it. This is the step that matters most and gets skipped most often. Do not check whether claim A is correct yet. Ask: what are the alternative interpretations of claim A? Hold them simultaneously. Do not collapse to one reading prematurely.

VALIDATE - Now check. Which claim is weakest? Does doubt about that claim change the conclusion? If validation reveals a MUTATED claim - one where the question itself needs to change - stop and reframe before continuing.

DSV comes in four tiers depending on stakes and complexity:

dsv-tiers

Tier 0: Quick DSV - 30 seconds, 3 questions, zero infrastructure

Tier 1: Abbreviated DSV - skip SUSPEND, ~4x token overhead

Tier 2: Full DSV - all steps with claim dependency graph, 5-12x overhead

Tier 3: Ensemble DSV - multiple independent decompositions, 15-20x overhead

For most situations, Tier 0 is enough. Quick DSV is 30 seconds, requires nothing, and catches the majority of framing failures.

Quick DSV: The 30-Second Standalone Check

When Claude Code gives you output that feels complete but something seems off, run this before acting on it:

1. DECOMPOSE (10 seconds) - What are the 2-3 key claims in this reasoning?

2. SUSPEND (10 seconds) - For each claim: what is an alternative interpretation I have not considered?

3. CHECK (10 seconds) - Which claim am I least sure about? Does doubt about it change the conclusion?

If CHECK reveals doubt, dig deeper. If it confirms, proceed with higher confidence.

This works standalone on any Claude Code output - not just planning. Debugging hypotheses, code review assumptions, content framing, feature request interpretation. Premature Collapse shows up everywhere.

The NHR example in full:

DECOMPOSE: Two claims. (1) NHR is the relevant tax regime. (2) Freelancers qualify under it.
SUSPEND: Could NHR have been replaced? Could "freelancer" mean different things across jurisdictions?
CHECK: Claim 1 is weakest. And indeed, NHR was replaced by IFICI in 2024. MUTATED claim - the question itself was wrong.

Thirty seconds. Caught before I built anything on a false premise.

UPF: DSV Made Structural

Quick DSV is a thinking tool. UPF is what happens when you embed that thinking into a planning process that runs automatically.

The Universal Planning Framework is built on 117 real plans and 195 handoffs. Version 1.0 was a rough draft. V1.1 added adversarial hardening. V1.2 expanded to 8 domains, added 21 anti-patterns, and formalized DSV as the theoretical backbone.

It runs in four stages:

flow

DSV Maps Directly to UPF Stages

This is not a coincidence - I designed UPF around DSV once I understood the reasoning failure pattern:

DECOMPOSE = Stage 0, Checks 0.1-0.6

The first six discovery checks break the project into its component parts: scope, stakeholders, constraints, dependencies, existing work, and feasibility. This is forced decomposition before any planning begins.

SUSPEND = Stage 0, Checks 0.7-0.12

Checks 0.7 through 0.12 explicitly hold alternatives open. Check 0.9 is the AHA Effect - the single most valuable check in the entire framework. It asks: "Is there a fundamentally better approach that makes this plan unnecessary?" This is SUSPEND applied to the project itself.

VALIDATE = Stage 1 Assumption Section

Every plan produced in Stage 1 includes an explicit assumptions list with validation methods and failure conditions. What are you betting on being true? How would you know if you were wrong? What do you do if the assumption fails? This is DSV VALIDATE built into the plan structure.

Stage 0: Discovery - 12 Checks Before You Write Anything

Stage 0 runs before the plan exists. Its job is to surface what you do not know you do not know.

Three checks run on every project regardless of domain:

Existing Work Audit - What already exists that solves 70%+ of this? I have stopped counting how many hours I saved by finding that the codebase already contained what I was about to build from scratch.

Feasibility Check - Is this possible as described? Scope constraints, API limitations, timeline math. Catches impossible plans before you write them.

The AHA Effect - Is there a fundamentally better approach that makes this entire plan unnecessary? This is SUSPEND at the project level. A custom CMS when Strapi exists. Fifty blog posts when five deep pillars would outperform them. Challenging your first framing before it locks in.

The remaining nine checks activate based on domain and complexity. Risk surfaces, stakeholder mapping, constraint cataloging, dependency trees, rollback viability. The framework detects which ones apply.

The 8 domains in v1.2: Software, AI/Agent, Business, Content, Infrastructure, Data and Analytics, Research, and Multi-Domain. Each gets different conditional checks in Stage 0 and different conditional sections in Stage 1.

Stage 1: Plan - 5 Core + 18 Conditional Sections

After discovery, the plan. Five sections are mandatory regardless of domain:

Context and Why - The problem in 3 sentences. Not "improve X" but why X matters and what changes if it is fixed.

Success Criteria - What does DONE look like? Crucially, also: what does FAILED look like? If you cannot write a failure condition, your criteria are not specific enough.

Assumptions - Everything the plan bets on being true, with a validation method and a contingency for each. This is where DSV VALIDATE lives structurally. Every assumption is a potential MUTATED claim waiting to surface.

Phases - Work in 3-4 hour chunks with binary gates. Pass or fail, nothing in between. No phase should produce output that is "mostly done."

Verification - Automated and manual checks. If neither exists, you are shipping blind.

Then 18 conditional sections activate based on domain. An AI/Agent project gets agent interaction protocols, model cost analysis, and delegation strategy. An Infrastructure project gets rollback planning, monitoring requirements, and blast radius estimation. The framework detects the domain and routes to the right template.

Before the plan moves to hardening, it passes through FAILED Conditions - explicit criteria for killing the project entirely. Not every idea deserves implementation. Stage 1 forces a decision point.

Stage 1.5: Harden - Six Perspectives That Exist to Break Your Plan

Stage 1.5 is adversarial stress-testing. The planner agent runs six distinct critical perspectives on the plan in sequence. This is not polish or review. It is structured pressure designed to find what will break before implementation begins.

plan-hardening-pipeline

Outside Observer - goal clarity, ambiguous metrics, unstated assumptions

Pessimistic Risk Assessor - single failure points, missing timeouts, cascade risks

Pedantic Lawyer - vague gates, undefined contracts between components

Skeptical Implementer - first blocker, cold start problem, missing prerequisites

The Manager - scope realism, timeline math, resource constraints

Devil's Advocate - core assumption validity, fundamental viability

Each perspective has its own lens and its own anti-pattern to catch. The Outside Observer checks whether a newcomer could parse the goal in 15 words. The Pessimistic Risk Assessor hunts for single failure points and cascade risks. The Pedantic Lawyer enforces contract rigor - no vague gates, no "looks good" exit criteria. The Skeptical Implementer asks "what blocks me at 9am tomorrow?". The Manager challenges scope realism and timeline math. The Devil's Advocate questions whether the core assumptions even hold.

Each perspective feeds into the next. By the end, the plan has survived six adversarial passes or it has not survived. Either outcome is valuable. A plan that breaks in Stage 1.5 saves you from a worse break at Step 47 of execution.

This is DSV SUSPEND made automated - holding multiple critical perspectives simultaneously before collapsing to "this plan is ready."

Stage 2: Meta - 7 Final Checks and 21 Anti-Patterns

The hardened plan gets seven final validation checks and a scan against 21 cataloged anti-patterns.

The anti-patterns come in three groups:

12 Core Anti-Patterns - Vague success criteria, skipping Stage 0, assumptions treated as facts, no failure conditions, zombie projects with no kill switch, scope that assumes infinite parallelism.

5 AI-Specific Anti-Patterns - Delegation without verification, context window assumptions, model capability mismatch, missing fallback for agent failures, cost math that ignores token accumulation.

4 Quality Anti-Patterns - No rollback path, verification that only checks happy paths, missing dependency declarations, phase gates that cannot be evaluated objectively.

If a plan clears all four stages - 12 discovery checks, 5 core plan sections with assumption validation, 6 adversarial agents, and 7 meta checks plus 21 anti-patterns - it is ready for implementation. Not probably ready. Structurally ready.

Before and After: What Changes

Without UPF + DSV

-Plan from first interpretation, skip alternatives
-Discover critical gaps during execution at step 30+
-Assumptions treated as facts until they break
-Same planning template regardless of project type
-No structured adversarial pressure before building

With UPF + DSV

+Quick DSV on any output before acting on it
+12 discovery checks surface gaps before planning starts
+Assumptions listed with validation methods and contingencies
+8 domains with tailored conditional sections
+6 adversarial agents stress-test every plan before execution

The numbers from my own projects: before UPF, roughly 40% of plans needed significant replanning mid-execution. After: under 10%. The plans that do need changes hit a built-in replanning trigger instead of a mid-project crisis.

The bigger shift was cognitive. Knowing that Stage 0 will challenge my first approach, I stopped treating my initial framing as correct by default. SUSPEND is now a habit, not a deliberate step.

Where to Start

If you take nothing else from this post, take Quick DSV. Three questions, 30 seconds, no setup required. Run it on any Claude Code output where you feel uncertain. The structure forces you to name which claim is weakest, and that alone changes what you do next.

If you want the full system, UPF is open source on GitHub. All stages, agent prompts, domain templates, and configuration files. Install takes about 30 seconds.

This gives you the planning rule, four slash commands (/plan-new, /interview-plan, /plan-review, /plan-refine), and the planner agent. Works with any project type.

UPF integrates naturally with the rest of a Claude Code system. I use it alongside my memory system for tracking plan state across sessions and hooks automation for triggering Stage 0 automatically on new projects. The context management patterns keep planning sessions from ballooning when discovery runs long.

Evolving Lite and the Universal Planning Framework cover the parts that need custom configuration: scoring tables, prompt templates, and config files for custom domain extensions. All free and open source.

This lives in primeline-ai/universal-planning-framework - the planning framework I use. Free, MIT, no build step.

FAQ

What is the Claude Code planning framework UPF and how does DSV fit in?+

UPF (Universal Planning Framework) is a 4-stage planning methodology - Discovery (12 checks), Plan (5 core + 18 conditional sections), Harden (6 adversarial agents), and Meta (7 checks + 21 anti-patterns). DSV (Decompose-Suspend-Validate) is the reasoning architecture that UPF is built on: Stage 0 checks 0.1-0.6 implement DECOMPOSE, checks 0.7-0.12 implement SUSPEND (including the AHA Effect), and Stage 1 assumption validation implements VALIDATE. Together they prevent Premature Collapse - where an AI commits to the first interpretation before exploring alternatives.

How does Quick DSV work and when should I use it?+

Quick DSV is a 30-second reasoning check: DECOMPOSE (what are the 2-3 key claims?), SUSPEND (what alternative interpretations exist for each?), CHECK (which claim am I least sure about - does doubt change the conclusion?). Use it any time Claude Code produces output that feels complete but something seems off. It catches MUTATED claims - cases where the question itself needs to change, not just the answer. No infrastructure required, works on any AI output.

What are the 6 adversarial agents in UPF Stage 1.5 and what do they check?+

Six agents review plans sequentially, each with a distinct critical perspective: Outside Observer (goal clarity, ambiguous metrics), Risk Assessor (single failure points, cascade risks), Pedantic Lawyer (vague gates, undefined component contracts), Skeptical Implementer (first blocker, cold start problem), The Manager (scope realism, timeline math), and Devil's Advocate running on Opus (core assumption validity). Each passes findings to the next, creating a chain of adversarial pressure before any implementation begins.

What is Premature Collapse and how do UPF and DSV prevent it?+

Premature Collapse is when an AI (or a person) locks onto the first interpretation of a question and validates that framing instead of exploring alternatives. Like quantum measurement collapsing a wave function, the system commits before mapping the possibility space. DSV prevents it through the SUSPEND step - explicitly holding multiple interpretations before validating any. UPF structures this into a planning process: Stage 0 forces alternative exploration before planning, Stage 1 requires explicit assumption listing, and Stage 1.5 uses six adversarial agents to challenge what the plan takes for granted.