Step 7: Improve

Part of: Business-First AI Framework

The Problem

Without a structured improvement process, AI workflows follow one of two failure patterns:

Set and forget. The workflow was useful when you built it, but business context has shifted, new tools have launched, and the output quality has drifted. Nobody notices until someone complains — or worse, until flawed output makes it to a client.

Constant tinkering. Someone tweaks the prompt every time the output is not perfect, introducing regressions and making it impossible to tell whether the workflow is actually getting better or just different. The team never trusts the workflow enough to rely on it.

Improve teaches you when to revisit a running workflow, how to evaluate it systematically, and what to do with the findings.

When to Revisit

Not every workflow needs monthly check-ups. Watch for these quality signals — any one of them is reason to run an improvement cycle:

| Signal | What it means |
| --- | --- |
| Increasing manual edits | Users are spending more time fixing output than they used to — quality may be drifting |
| Changed business context | Your products, audience, terminology, processes, or competitive landscape have shifted since the workflow was built |
| New tools available | Your platform has launched new features, MCP servers, or integrations that could make the workflow more capable |
| Steps being skipped | Users bypass certain steps because they are not adding value — the workflow may have unnecessary complexity |
| Complaints or errors | Someone reports that the output was wrong, off-brand, or missed something important |
| Scheduled review cadence | You set a review date during Run (Step 6) — it has arrived |

Set a reminder during Run

When you operationalize a workflow in Step 6, set a calendar reminder for your first review. Monthly is a good default for high-frequency workflows. Quarterly works for workflows you run less often.

Regression Evaluation

Re-run the eval suite from Test (Step 5) using the same test scenarios and scoring dimensions. Then compare results to your recorded baseline.

What to look for

| Finding | What it means |
| --- | --- |
| Scores are stable or improving | The workflow is holding up. No action needed unless you identified other quality signals. |
| Scores dropped on specific dimensions | Something has changed — context may be outdated, platform behavior may have shifted, or recent edits to the prompt introduced a regression. |
| Scores dropped across the board | A systemic issue. Check whether a platform update changed default behavior, a context file was removed, or a tool connection broke. |
| New scenarios produce poor results | The workflow works for the original test cases but not for new situations. The prompt or context may need to be expanded to cover additional cases. |

Record the new scores alongside your baseline. This creates a quality history you can reference in future cycles.
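The baseline comparison can be sketched as a small script. This is a minimal illustration, assuming each eval run is summarized as a dimension-to-score mapping; the dimension names, the 1–5 scale, and the drop tolerance are all assumptions, not part of the framework:

```python
# Compare a new eval run against the recorded baseline and flag
# any dimension whose score dropped by more than a tolerance.
# Dimension names and the 1-5 scale here are illustrative only.

def find_regressions(baseline, current, tolerance=0.5):
    """Return {dimension: (baseline_score, current_score)} for every
    dimension whose score fell by more than `tolerance`."""
    regressions = {}
    for dim, base_score in baseline.items():
        new_score = current.get(dim)
        if new_score is not None and base_score - new_score > tolerance:
            regressions[dim] = (base_score, new_score)
    return regressions

baseline = {"accuracy": 4.5, "tone": 4.0, "completeness": 4.2}
current  = {"accuracy": 4.4, "tone": 3.2, "completeness": 4.3}

# Only "tone" fell by more than the 0.5 tolerance.
print(find_regressions(baseline, current))  # {'tone': (4.0, 3.2)}
```

Flagged dimensions become the targets of the Tune or Redesign decision later in the cycle, rather than a vague sense that "quality dropped."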

Graduation Assessment

Over time, some workflows outgrow their orchestration mechanism. A prompt that started simple may have accumulated so many instructions that it has become unwieldy. A skill-powered prompt may need to make decisions you cannot predict in advance. The right response is not to keep patching — it is to graduate the workflow to a more capable mechanism.

The Orchestration Ladder

| Current mechanism | Graduate to | When to graduate |
| --- | --- | --- |
| Prompt | Skill-Powered Prompt | Steps have become complex enough that you are repeating the same multi-step instructions across runs. Extracting those into reusable skills would make the prompt cleaner and the sub-steps more reliable. |
| Skill-Powered Prompt | Agent | The workflow needs to make sequencing decisions, use tools, or adapt its approach based on intermediate results — things a human following a fixed skill sequence cannot efficiently orchestrate. |
| Agent (single) | Agent (multi-agent) | The agent is handling too many distinct responsibilities. Splitting into specialized agents (researcher, writer, editor) with clear handoffs improves quality and makes each agent easier to maintain. |

Graduation is not always the right answer. If the workflow works well at its current level, leave it. The goal is to match the mechanism to the workflow's actual needs — not to over-engineer.

Graduation means going back to Design

When you graduate a workflow to a new mechanism, you are changing the architecture. Return to Design (Step 3) to reassess the orchestration mechanism, update the Building Block Spec, then proceed through Build and Test with the new architecture.

For Organizations

If the workflow serves a team or business process, the improvement cycle includes an operationalization review:

  • Adoption — Is the team using the workflow? If adoption has dropped, find out why and address it.
  • Training — Are new team members being onboarded to the workflow? Update training materials if the workflow has changed.
  • Governance — Are the right people maintaining the workflow? Have edit permissions stayed appropriate?
  • ROI — Is the workflow still saving time or improving quality compared to the manual alternative? Quantify if possible.

Decision Framework

Every improvement cycle ends with one of four outcomes:

| Outcome | What it means | Next step |
| --- | --- | --- |
| No changes needed | Eval scores are stable, no quality signals, workflow fits its purpose | Record the result and set the next review date |
| Tune | Specific building blocks need adjustment — context is outdated, a prompt needs refinement, a tool connection needs updating | Go to Build (Step 4), fix the identified issues, then Test (Step 5) |
| Redesign | Architecture assumptions have changed — the workflow needs a different orchestration mechanism, new building blocks, or a fundamentally different approach | Go to Design (Step 3) and rework the Building Block Spec |
| Evolve | The workflow should graduate to a more capable orchestration mechanism (see Graduation Assessment) | Go to Design (Step 3) and upgrade the mechanism |
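The four outcomes can be sketched as a simple decision function. The input flags are assumptions about how you might summarize your findings — the framework itself does not name them:

```python
# Map improvement-cycle findings to one of the four outcomes.
# The boolean inputs are illustrative summaries of the review, not
# terms defined by the framework itself.

def decide(scores_stable, quality_signals, architecture_fits, should_graduate):
    """Return the outcome of an improvement cycle.

    scores_stable:     regression eval matched or beat the baseline
    quality_signals:   list of observed signals (manual edits, complaints, ...)
    architecture_fits: the current orchestration mechanism still suits the work
    should_graduate:   the graduation assessment recommended moving up a rung
    """
    if should_graduate:
        return "Evolve: go to Design (Step 3) and upgrade the mechanism"
    if not architecture_fits:
        return "Redesign: go to Design (Step 3) and rework the spec"
    if not scores_stable or quality_signals:
        return "Tune: go to Build (Step 4), then re-Test (Step 5)"
    return "No changes needed: record the result, set the next review date"

print(decide(True, [], True, False))
```

The ordering matters: a graduation recommendation subsumes a redesign, and a redesign subsumes tuning, so the checks run from the largest change to the smallest.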

The Improve step completes the lifecycle loop. Every outcome either confirms the workflow is healthy or sends you back to an earlier step with a specific target — never a vague "make it better."

What This Produces

An Improvement Plan saved to outputs/[workflow-name]-improvement-plan.md that captures:

  • Current eval scores compared to baseline
  • Quality signals that triggered the review
  • Findings from the regression evaluation
  • Graduation assessment (if applicable)
  • Decision outcome and rationale
  • Specific actions to take (which building blocks to fix, what context to update, etc.)
  • Next review date
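A sketch of what the resulting file might look like — every value below (workflow name, scores, findings, date) is illustrative, not prescribed by the framework:

```markdown
# Improvement Plan: [workflow-name]

## Quality signals that triggered this review
- Increasing manual edits reported by the team

## Eval scores vs. baseline
| Dimension | Baseline | Current |
| --- | --- | --- |
| Accuracy | 4.5 | 4.4 |
| Tone | 4.0 | 3.2 |

## Regression findings
- Tone regressed after the brand guidelines changed; the context file is outdated.

## Graduation assessment
- Not applicable; the current mechanism still fits the workflow.

## Decision
- Outcome: Tune
- Rationale: one dimension regressed and the architecture is unchanged.

## Actions
- Update the brand-voice context file
- Re-run the eval suite in Test (Step 5)

## Next review date
- 2025-09-01
```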

How to Use This

This step is facilitated by the improve skill from the Business-First AI Framework. See Get the Skills for installation instructions across all supported platforms.

Start with this prompt:

Evaluate my running workflow and help me decide what to improve.

The skill reads your Building Block Spec and previous test results, guides you through the regression evaluation and graduation assessment, and produces the Improvement Plan.

  • Run — the step before Improve
  • Test — where the eval suite and baseline were established
  • Design — where to go for Redesign or Evolve outcomes
  • Build — where to go for Tune outcomes