Vision Builder

System Design

High-level architecture, Vision Brain Pipeline and delivery model for Vision Builder.

Current version

system-design-v0.8

Last updated

5/4/2026, 11:56:48 PM

Change type

architecture + evaluation + Brain quality

Summary

Improves Brain component responsibilities, Judge Judy failure detection, named trace attribution and output quality rules while integrating compact eval reporting.

Executive Summary & Goals

Vision Builder helps product teams turn rough product thinking into a clear future-state vision, separated from positioning, product description, strategic bet and evidence gaps.

The core asset is the Vision Brain: a versioned product-strategy reasoning system that asks the right next question, avoids irrelevant rabbit holes and improves through repeatable evaluation in Brain Lab.

Workspaces

Workspace	Purpose
Guided Builder	User-facing guided flow
Vision Pinboard	Visual board showing raw strategy cards, summary and final vision
Brain Console	Single-case debug tool for prompts, payloads, raw responses and parsed outputs
Brain Lab	Automated evaluation workspace for testing and improving the Vision Brain
System Design	Architecture, pipeline and delivery view

Vision Brain Components

These names are internal architecture labels for Brain Console, Brain Lab, traces and System Design. They should not over-theme the user-facing Guided Builder.

Formal component	Internal name	Responsibility
Strategy Extractor	Miss Moneypenny	Extracts strategy intelligence from messy product input
Vision Shaper	Supernanny	Sharpens raw or product-like vision candidates into future-state ambitions
Follow-up Question Generator	Louis Theroux	Asks the single most useful next question
Readiness Controller	The Rock	Controls summary/output readiness gates
Mode & Safety Rules	Arnold J. Rimmer	Blocks premature pricing, API, technical and mode-breaking rabbit holes
Output Generator	Homer Simpson	Produces final vision, positioning, product description, strategic bet and evidence
Evaluation Judge	Judge Judy	Scores Brain quality, detects failures and explains what needs improving
Trace Recorder	C-3PO	Records prompts, payloads, responses, adjustments, usage and versions
Case & Report Store	Gollum	Stores golden cases, eval reports, traces and precious examples
System Design Versioning	Doc Brown	Archives current and historical system design versions
Learning Processor	Gordon Ramsay	Filters user feedback and decides what is good enough to become learning data

Unified Architecture Diagram

User -> Frontend Workspaces (Builder, Pinboard, Console, Lab, Design) -> API Routes (Brain and app endpoints) -> Vision Brain Pipeline (triage, extraction, gates, outputs) -> Model Provider (OpenAI) -> Persistence Layer (localStorage, JSON, reports) -> Brain Lab Evaluation (golden cases and judge) -> Reports / Traces / Saved Visions

Vision Brain Pipeline

1. Raw Intake
Capture messy natural-language product thinking.
2. Input Triage
Classify serious, vague, playful, nonsense, malicious or sensitive input.
3. Strategy Extraction
Separate stakeholders, pain, alternatives, future state, strategy and evidence.
4. Guardrails
Apply deterministic readiness, unknown classification and quality rules.
5. Follow-up Question
Ask the single most useful question for a stronger vision.
6. Strategy Summary
Create "Here is what I think you are building" before final outputs.
7. Final Outputs
Separate vision, positioning, product description, strategic bet and evidence needed.
8. Evaluation
Run golden cases in Brain Lab and score regressions.
9. Trace
Store prompt, payload, raw response, parsed response, adjustments, usage and versions.

Data & Learning Flow

User answers -> Brain traces -> User edits / accepts / rejects -> Learning eligibility -> Candidate learning examples -> Reviewed golden examples -> Brain Lab regression testing -> New Brain version

Technology Stack

Frontend	Next.js / React
Styling	Tailwind CSS
Brain API	Next.js API routes / server functions
AI Models	OpenAI models
Prototype Storage	localStorage, local JSON / file reports
Future Storage	PostgreSQL or Supabase
Eval Reports	JSON + Markdown
Deployment	Vercel

CI/CD & Delivery Pipeline

GitHub -> TypeScript checks -> Tests / Brain eval smoke run -> Build -> Preview deploy -> Manual review -> Production deploy

Key Constraints & Quality Attributes

Traceability
Versioning
Safety / input triage
Quality / regression testing
Cost control
Privacy / learning eligibility
Reliability
Extensibility

System Design Version History

Archive a new System Design version whenever the architecture, Brain pipeline, evaluation harness, named components, data schema, reporting/copy behaviour or major UI structure changes.

Word export is not wired up yet; use Export Markdown for now. Google Docs export requires integration setup.

Version	Date	Change type	Summary	Affected areas
system-design-v0.8	5/4/2026, 11:56:48 PM	architecture + evaluation + Brain quality	Improves Brain component responsibilities, Judge Judy failure detection, named trace attribution and output quality rules while integrating compact eval reporting.	Brain Console, Brain Lab, Judge Judy, Homer Simpson, Supernanny, Miss Moneypenny, Arnold J. Rimmer, Gordon Ramsay, Louis Theroux, C-3PO, Gollum, Doc Brown, System Design, Vision Brain Pipeline, Changelog
system-design-v0.7	5/5/2026, 12:30:00 AM	eval	Adds broader Judge Judy eval cases, including real-world benchmark-inspired cases, to test vision quality, stakeholder separation, rabbit-hole avoidance, evidence specificity, input triage, overfitting, reference-direction matching and clean-pass behaviour.	Brain Lab, Judge Judy, Gollum, C-3PO, System Design, Changelog
system-design-v0.6	5/5/2026, 12:00:00 AM	architecture	Adds canonical internal names for Vision Brain components, Judge Judy/C-3PO/Gollum/Doc Brown references, changelog policy and systemDesignVersion metadata.	System Design, Brain Lab, Brain Console, Vision Brain Pipeline, Brain Eval Reports, Changelog
system-design-v0.2	5/4/2026, 10:05:00 PM	architecture	Adds the System Design workspace, file-backed design version history and explicit documentation for the Vision Brain architecture, data flow, learning loop and delivery model.	Guided Builder, Vision Pinboard, Brain Console, Brain Lab, System Design, Vision Brain Pipeline
system-design-v0.1	5/4/2026, 9:57:59 PM	architecture	Establishes the Vision Brain as a versioned product-strategy reasoning system with separate workspaces for user workflow, debug tracing and automated evaluation.	Guided Builder, Vision Pinboard, Brain Console, Brain Lab, Vision Brain Pipeline


Version: system-design-v0.8
Title: System Design v0.8
Created: 2026-05-04T23:56:48.217Z
Change type: architecture + evaluation + Brain quality
Author/source: Codex
Summary: Improves Brain component responsibilities, Judge Judy failure detection, named trace attribution and output quality rules while integrating compact eval reporting.
Affected areas: Brain Console, Brain Lab, Judge Judy, Homer Simpson, Supernanny, Miss Moneypenny, Arnold J. Rimmer, Gordon Ramsay, Louis Theroux, C-3PO, Gollum, Doc Brown, System Design, Vision Brain Pipeline, Changelog

# System Design v0.8

Created: 2026-05-05T00:56:48.217+01:00

Change type: architecture + evaluation + Brain quality

Summary: Improves Brain component responsibilities, Judge Judy failure detection, named trace attribution and output quality rules while integrating compact eval reporting.

## Executive Summary & Goals

Vision Builder helps product teams turn rough product thinking into a clear future-state vision, separated from positioning, product description, strategic bet and evidence gaps.

## Workspaces

| Workspace | Purpose |
| --- | --- |
| Guided Builder | User-facing guided flow |
| Vision Pinboard | Visual board showing raw strategy cards, summary and final vision |
| Brain Console | Single-case debug tool for prompts, payloads, raw responses and parsed outputs |
| Brain Lab | Automated evaluation workspace for testing and improving the Vision Brain |
| System Design | Architecture, pipeline and delivery view |

## Vision Brain Components

These names are internal architecture labels for Brain Console, Brain Lab, traces and System Design. They should not over-theme the user-facing Guided Builder.

| Formal component | Internal name | Responsibility |
| --- | --- | --- |
| Strategy Extractor | Miss Moneypenny | Extracts strategy intelligence from messy product input |
| Vision Shaper | Supernanny | Sharpens raw or product-like vision candidates into future-state ambitions |
| Follow-up Question Generator | Louis Theroux | Asks the single most useful next question |
| Readiness Controller | The Rock | Controls summary/output readiness gates |
| Mode & Safety Rules | Arnold J. Rimmer | Blocks premature pricing, API, technical and mode-breaking rabbit holes |
| Output Generator | Homer Simpson | Produces final vision, positioning, product description, strategic bet and evidence |
| Evaluation Judge | Judge Judy | Scores Brain quality, detects failures and explains what needs improving |
| Trace Recorder | C-3PO | Records prompts, payloads, responses, adjustments, usage and versions |
| Case & Report Store | Gollum | Stores golden cases, eval reports, traces and precious examples |
| System Design Versioning | Doc Brown | Archives current and historical system design versions |
| Learning Processor | Gordon Ramsay | Filters user feedback and decides what is good enough to become learning data |

## Component Attribution Policy

Trace adjustments should be attributed by responsibility, not by whichever stage happened to emit them.

- Miss Moneypenny owns raw strategy extraction: product category, users, problems, alternatives, strategic bet and evidence.
- Supernanny owns vision shaping: `rawVisionCandidate`, `sharpenedVisionCandidate` and promoted `visionCandidate`.
- Louis Theroux owns follow-up wording: `recommendedNextQuestion`, follow-up questions and future-state confirmation prompts.
- The Rock owns gates and stop conditions: `readyForSummary`, `readyForOutputs`, `needsFollowUpBeforeVision`, output readiness and stopping when outputs are ready.
- Arnold J. Rimmer owns mode safety: later-mode unknowns, skipped technical/commercial questions and pricing/API/implementation rabbit-hole blocks.
- Homer Simpson owns generated outputs: final vision, positioning, product description, strategic bet and generated evidence.
- Judge Judy owns scoring, failures, verdicts, score explanations and quality notes.
- C-3PO owns trace recording, copied trace payloads, prompt/payload/raw/parsed records, usage and version metadata.
- Gollum owns golden cases, eval reports, report loading and storage.
- Doc Brown owns `systemDesignVersion`, snapshots and archived design versions.
- Gordon Ramsay owns learning eligibility and candidate learning data.

When a target ingredient is cleared because outputs are already ready, attribute it to The Rock. When it is cleared because the question belongs to pricing, API, implementation or another later mode, attribute it to Arnold J. Rimmer.
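The tie-break rule above can be expressed as a small attribution helper. This is a minimal sketch, assuming hypothetical reason codes (`ClearReason`) and a hypothetical `TraceAdjustment` shape; the real C-3PO trace schema may differ.

```typescript
// Hypothetical reason codes for why a target ingredient was cleared.
type ClearReason = "outputs_ready" | "pricing" | "api" | "implementation" | "later_mode";

type ClearingComponent = "The Rock" | "Arnold J. Rimmer";

// Hypothetical trace adjustment shape (not the actual C-3PO schema).
interface TraceAdjustment {
  field: string;
  reason: ClearReason;
  component: ClearingComponent;
}

// Attribute by responsibility, not by the stage that emitted the change:
// readiness stops belong to The Rock, later-mode blocks to Arnold J. Rimmer.
function attributeClearedIngredient(field: string, reason: ClearReason): TraceAdjustment {
  const component: ClearingComponent =
    reason === "outputs_ready" ? "The Rock" : "Arnold J. Rimmer";
  return { field, reason, component };
}

// A pricing question cleared mid-flow is a mode-safety block.
console.log(attributeClearedIngredient("targetIngredient", "pricing").component); // prints "Arnold J. Rimmer"
```

Keeping the mapping in one function means every stage that clears an ingredient produces the same attribution, whichever stage emitted it.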

## Unified Architecture

User -> Frontend Workspaces -> API Routes -> Vision Brain Pipeline -> Model Provider -> Persistence Layer -> Brain Lab Evaluation -> Reports / Traces / Saved Visions

The frontend workspaces share navigation but remain separate surfaces. Brain Console and Brain Lab expose the reasoning system for debugging and regression testing, while the Guided Builder and Vision Pinboard stay focused on the product workflow.

## Vision Brain Pipeline

1. Raw Intake: Capture messy natural-language product thinking.
2. Input Triage: Arnold J. Rimmer classifies serious, vague, playful, nonsense, malicious or sensitive input before extraction.
3. Strategy Extraction: Miss Moneypenny separates stakeholders, pain, alternatives, future state, strategic bet and evidence needs.
4. Guardrails: Supernanny, The Rock and Arnold J. Rimmer apply deterministic vision shaping, readiness gates and mode rules.
5. Follow-up Question: Louis Theroux asks the single most useful next question for a stronger vision.
6. Strategy Summary: Create "Here is what I think you are building" before final outputs.
7. Final Outputs: Homer Simpson separates vision, positioning, product description, strategic bet and evidence needed.
8. Evaluation: Brain Lab runs golden cases from Gollum's case store, and Judge Judy scores them to detect regressions.
9. Trace: C-3PO stores prompt, payload, raw response, parsed response, system adjustments, usage and version metadata.
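The stage list and owners above can be captured as data so traces and reports share one source of truth. This is a sketch only: the `PipelineStage` shape and `plannedStages` helper are hypothetical, and the assumption that playful, nonsense and malicious input skips Final Outputs comes from the v0.8 rule that non-serious input must not reach serious output generation.

```typescript
// Input triage classes named in the pipeline description.
type InputQuality = "serious" | "vague" | "playful" | "nonsense" | "malicious" | "sensitive";

interface PipelineStage {
  step: number;
  name: string;
  owner: string | null; // null where the design names no owning component
}

const PIPELINE: PipelineStage[] = [
  { step: 1, name: "Raw Intake", owner: null },
  { step: 2, name: "Input Triage", owner: "Arnold J. Rimmer" },
  { step: 3, name: "Strategy Extraction", owner: "Miss Moneypenny" },
  { step: 4, name: "Guardrails", owner: "Supernanny, The Rock, Arnold J. Rimmer" },
  { step: 5, name: "Follow-up Question", owner: "Louis Theroux" },
  { step: 6, name: "Strategy Summary", owner: null },
  { step: 7, name: "Final Outputs", owner: "Homer Simpson" },
  { step: 8, name: "Evaluation", owner: "Judge Judy" },
  { step: 9, name: "Trace", owner: "C-3PO" },
];

// Assumption: these classes are non-serious and must not reach Final Outputs.
const NON_SERIOUS: ReadonlySet<InputQuality> = new Set<InputQuality>(["playful", "nonsense", "malicious"]);

// Hypothetical planner: serious-enough input runs every stage; non-serious input
// skips output generation but is still traced.
function plannedStages(quality: InputQuality): PipelineStage[] {
  if (!NON_SERIOUS.has(quality)) return PIPELINE;
  return PIPELINE.filter((stage) => stage.name !== "Final Outputs");
}
```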

## Data & Learning Flow

User answers -> Brain traces -> user edits/accepts/rejects -> learning eligibility -> candidate learning examples -> reviewed golden examples -> Brain Lab regression testing -> new Brain version

The Brain does not learn directly from every user input. Triage and learning eligibility decide whether an interaction can become a candidate example. Gordon Ramsay is the planned learning processor that filters corrections before they become candidate learning examples, and reviewed examples can be promoted to Gollum's golden dataset.
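The eligibility and promotion steps can be sketched as two small functions. This is a hypothetical model of Gordon Ramsay's filter, assuming that only serious input with an explicit user signal and no blocked prompt injection can become a candidate, and that promotion to Gollum's golden dataset is a reviewer decision; the real rules may be stricter.

```typescript
type InputQuality = "serious" | "vague" | "playful" | "nonsense" | "malicious" | "sensitive";
type UserSignal = "edit" | "accept" | "reject" | "none";
type LearningStage = "ineligible" | "candidate" | "golden";

// Hypothetical shape of an interaction as the learning processor sees it.
interface Interaction {
  id: string;
  inputQuality: InputQuality;
  promptInjectionBlocked: boolean;
  userSignal: UserSignal;
}

// Assumption: only serious input qualifies (so vague and sensitive input are excluded),
// injected input never qualifies, and a user edit/accept/reject is required.
function shouldUseForLearning(i: Interaction): boolean {
  return i.inputQuality === "serious" && !i.promptInjectionBlocked && i.userSignal !== "none";
}

// Eligible interactions become candidates; a reviewer approval promotes them to golden.
function learningStage(i: Interaction, reviewerApproved: boolean): LearningStage {
  if (!shouldUseForLearning(i)) return "ineligible";
  return reviewerApproved ? "golden" : "candidate";
}
```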

## Technology Stack

| Area | Current choice |
| --- | --- |
| Frontend | Next.js / React |
| Styling | Tailwind CSS |
| Brain API | Next.js API routes / server functions |
| AI Models | OpenAI models |
| Prototype Storage | localStorage, local JSON and file reports |
| Future Storage | PostgreSQL or Supabase |
| Eval Reports | JSON + Markdown |
| Deployment | Vercel |

## CI/CD & Delivery Pipeline

GitHub -> TypeScript checks -> Tests / Brain eval smoke run -> Build -> Preview deploy -> Manual review -> Production deploy

## Key Constraints & Quality Attributes

- Traceability: Brain decisions must preserve prompt, payload, raw response, parsed response, system adjustments and C-3PO component metadata.
- Versioning: Brain, prompt, schema, judge and `systemDesignVersion` metadata must be explicit in traces and eval reports.
- Safety / input triage: Prompt injection, nonsense and sensitive input need different treatment before learning or extraction.
- Quality / regression testing: Brain Lab should catch vision, stakeholder, readiness and rabbit-hole regressions.
- Cost control: Cheap models can support coaching, but heavy generation should stay deliberate and traceable.
- Privacy / learning eligibility: User corrections are valuable but should be reviewed before becoming learning examples.
- Reliability: Basic routes, saved sessions, traces and reports should keep working when Brain internals evolve.
- Extensibility: New workspaces and Brain stages should be added without merging unrelated UX surfaces.
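The traceability and versioning constraints imply a trace record that carries every artifact plus explicit version metadata. The shape below is a hypothetical sketch: only `systemDesignVersion` is named in this design, and the other version fields, `usage` layout and `missingVersionMetadata` helper are assumptions.

```typescript
// Hypothetical version fields; only systemDesignVersion is named in this design.
interface TraceVersions {
  brainVersion?: string;
  promptVersion?: string;
  schemaVersion?: string;
  judgeVersion?: string;
  systemDesignVersion?: string;
}

interface SystemAdjustment {
  component: string; // owning component per the attribution policy, e.g. "The Rock"
  field: string;
  adjustedValue: string;
}

interface BrainTrace {
  prompt: string;
  payload: unknown;
  rawResponse: string;
  parsedResponse: unknown;
  systemAdjustments: SystemAdjustment[];
  usage: { inputTokens: number; outputTokens: number };
  versions: TraceVersions;
}

const REQUIRED_VERSIONS: (keyof TraceVersions)[] = [
  "brainVersion",
  "promptVersion",
  "schemaVersion",
  "judgeVersion",
  "systemDesignVersion",
];

// Lists the version fields a trace is missing, so incomplete traces can be flagged in reports.
function missingVersionMetadata(trace: BrainTrace): (keyof TraceVersions)[] {
  return REQUIRED_VERSIONS.filter((key) => !trace.versions[key]);
}
```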

## Brain Lab Eval Coverage

Brain Lab is now expected to run multiple fixture groups through Gollum's case store:

- `golden`: broad industry fixtures that protect core Vision Brain behaviours.
- `judge-judy`: targeted adversarial cases for Judge Judy failure modes such as product-like visions, positioning disguised as vision, weak ambition, poor stakeholder separation, generic evidence, prompt injection and overfitting.
- `real-world-benchmarks`: public-reference inspired fixtures that compare messy product input with the strategic direction of well-known product visions without hard-coding those public examples into production Brain logic.

Judge Judy reports include `passType`, `stageScores`, `correctionsMade`, top failure clusters and failure-stage rollups. A case can pass end to end while still being marked `pass_with_corrections` when Supernanny, The Rock, Arnold J. Rimmer or Homer Simpson had to correct weak upstream material.
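The `pass_with_corrections` rule and the failure-stage rollup can be sketched as follows. Only `pass_with_corrections`, `correctionsMade` and the three fixture group names come from this design; the `pass` and `fail` values, the `CaseResult` shape and the `failureStages` field are assumptions for illustration.

```typescript
type FixtureGroup = "golden" | "judge-judy" | "real-world-benchmarks";
// Only pass_with_corrections is named in the design; "pass" and "fail" are assumed.
type PassType = "pass" | "pass_with_corrections" | "fail";

interface Correction {
  component: string;
  field: string;
}

interface CaseResult {
  caseId: string;
  group: FixtureGroup;
  passedEndToEnd: boolean;
  correctionsMade: Correction[];
  failureStages: string[]; // hypothetical: stages Judge Judy blamed for failures
}

// Components whose corrections of weak upstream material downgrade a clean pass.
const CORRECTING_COMPONENTS = new Set(["Supernanny", "The Rock", "Arnold J. Rimmer", "Homer Simpson"]);

function derivePassType(r: CaseResult): PassType {
  if (!r.passedEndToEnd) return "fail";
  const corrected = r.correctionsMade.some((c) => CORRECTING_COMPONENTS.has(c.component));
  return corrected ? "pass_with_corrections" : "pass";
}

// Failure-stage rollup: count how often each stage is blamed across all cases.
function rollupFailureStages(results: CaseResult[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const r of results) {
    for (const stage of r.failureStages) counts.set(stage, (counts.get(stage) ?? 0) + 1);
  }
  return counts;
}
```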

## Brain Improvement Pass v0.8

The v0.8 Brain pass targets the recurring Brain Lab failures in the 20-case report:

- Homer Simpson now treats evidence as observable proof points rather than abstract nouns, synthesises positioning from target, current alternative, category and differentiated value, rewrites weak strategic bets as market or behaviour beliefs, and runs a post-adjustment output quality guard so repair templates cannot leak into final outputs.
- Supernanny now prevents malformed `. becomes normal` grammar, only uses `X becomes normal` when `X` is a noun phrase, corrects product-like or weak vision candidates, and blocks benchmark overfitting when a benchmark's direction does not fit the case context.
- Miss Moneypenny now separates marketplace demand-side users from supply-side participants when input says a product connects or matches two sides, and separates healthcare users, buyers and beneficiaries.
- Arnold J. Rimmer and Gordon Ramsay now classify prompt-injection, playful and nonsense input as non-serious, block serious output generation, and mark it ineligible for learning.
- Louis Theroux now uses the strongest sharpened future-state candidate in follow-up questions and avoids confirming product-object candidates.
- Judge Judy now detects malformed candidates, weak follow-up candidates, product descriptions that are too vague, output template leaks, overfit benchmark language and prompt-injection learning mistakes.
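Two of the grammar failures above are simple enough to catch with string checks on the final vision. This is a heuristic sketch, assuming regex checks on the final string; the actual post-adjustment guard and Judge Judy detectors may use different logic. A true noun-phrase check for `X` would need real parsing, so this only flags an empty `X`.

```typescript
// Heuristic guard for two failure patterns named in the v0.8 pass.
function outputQualityIssues(finalVision: string): string[] {
  const issues: string[] = [];
  // "X becomes normal" needs a non-empty X: flag the phrase at the start of the
  // string or directly after a full stop (the malformed ". becomes normal" case).
  if (/(^|\.)\s*becomes normal\b/i.test(finalVision)) issues.push("malformed-becomes-normal");
  // A final vision must not open with "For", the classic positioning-statement opener.
  if (/^\s*For\b/.test(finalVision)) issues.push("vision-starts-with-for");
  return issues;
}
```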

Brain Lab should continue to run 20 cases by default. The v0.8 success criteria are:

- `overallScore >= 95`
- `visionQuality >= 90`
- `evidenceSpecificity >= 90`
- `outputSeparation >= 90`
- `readinessGateCorrectness >= 95`
- `avoidsRabbitHoles = 100`
- no high-severity failures
- no malformed `. becomes normal` outputs
- no final vision starting with `For`
- prompt-injection fixtures showing `inputQuality` not serious, `shouldUseForLearning: false` and `promptInjectionBlocked: true`
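The success criteria can be checked mechanically at the end of a run. In this sketch the score fields and prompt-injection flags use the names from the criteria; the `EvalRunSummary` shape and its count fields (`highSeverityFailures`, `malformedBecomesNormalOutputs`, `visionsStartingWithFor`) are hypothetical.

```typescript
interface PromptInjectionFixture {
  caseId: string;
  inputQuality: string;
  shouldUseForLearning: boolean;
  promptInjectionBlocked: boolean;
}

// Hypothetical run summary; score names mirror the criteria, count fields are assumed.
interface EvalRunSummary {
  overallScore: number;
  visionQuality: number;
  evidenceSpecificity: number;
  outputSeparation: number;
  readinessGateCorrectness: number;
  avoidsRabbitHoles: number;
  highSeverityFailures: number;
  malformedBecomesNormalOutputs: number;
  visionsStartingWithFor: number;
  promptInjectionFixtures: PromptInjectionFixture[];
}

// Returns the v0.8 success criteria the run does not meet (empty array = success).
function unmetV08Criteria(s: EvalRunSummary): string[] {
  const unmet: string[] = [];
  if (s.overallScore < 95) unmet.push("overallScore >= 95");
  if (s.visionQuality < 90) unmet.push("visionQuality >= 90");
  if (s.evidenceSpecificity < 90) unmet.push("evidenceSpecificity >= 90");
  if (s.outputSeparation < 90) unmet.push("outputSeparation >= 90");
  if (s.readinessGateCorrectness < 95) unmet.push("readinessGateCorrectness >= 95");
  if (s.avoidsRabbitHoles !== 100) unmet.push("avoidsRabbitHoles = 100");
  if (s.highSeverityFailures > 0) unmet.push("no high-severity failures");
  if (s.malformedBecomesNormalOutputs > 0) unmet.push("no malformed '. becomes normal' outputs");
  if (s.visionsStartingWithFor > 0) unmet.push("no final vision starting with 'For'");
  for (const f of s.promptInjectionFixtures) {
    const blocked = f.inputQuality !== "serious" && !f.shouldUseForLearning && f.promptInjectionBlocked;
    if (!blocked) unmet.push(`prompt-injection fixture ${f.caseId} not fully blocked`);
  }
  return unmet;
}
```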

## Full And Compact Eval Reports

Brain Lab now keeps full reports and produces compact review artifacts for human review and ChatGPT/Codex paste workflows.

- Full reports preserve every case, trace, expected behaviour, adjustment and Judge Judy object.
- Compact JSON reports preserve run metadata, scores, top failures, top failure stages, failed cases or cases scoring below 96, high-severity failures, relevant extracted fields, outputs, selected output adjustments and recommended Brain improvements.
- Compact reports scan all cases for banned output patterns, group them by output field and mark when a pattern was introduced by `outputAdjustments.adjustedValue`.
- Dashboard Markdown reports summarise the same compact data for fast review.
- `npm run brain:eval` produces full JSON, full Markdown, compact JSON and compact dashboard Markdown by default.
- `npm run brain:eval:compact -- <full-report.json>` can regenerate compact artifacts from an existing full report.
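The banned-pattern scan in compact reports can be sketched as a grouping pass over all cases. This is a hypothetical implementation: `outputAdjustments.adjustedValue` comes from this design, but the `CaseOutputs` and `BannedHit` shapes are assumptions, and "introduced by an adjustment" is approximated as the same pattern also matching an adjusted value for that field.

```typescript
interface OutputAdjustment {
  field: string;
  adjustedValue: string;
}

interface CaseOutputs {
  caseId: string;
  outputs: Record<string, string>; // e.g. finalVision, positioning
  outputAdjustments: OutputAdjustment[];
}

interface BannedHit {
  caseId: string;
  patternId: string;
  introducedByAdjustment: boolean;
}

// Scans every case and groups banned-pattern hits by output field. A hit is marked
// introducedByAdjustment when the same pattern also matches an adjustedValue for
// that field. Patterns must not use the /g flag, because RegExp.test() is stateful with it.
function scanBannedPatterns(
  cases: CaseOutputs[],
  patterns: Record<string, RegExp>,
): Record<string, BannedHit[]> {
  const byField: Record<string, BannedHit[]> = {};
  for (const c of cases) {
    for (const [field, value] of Object.entries(c.outputs)) {
      for (const [patternId, re] of Object.entries(patterns)) {
        if (!re.test(value)) continue;
        const introducedByAdjustment = c.outputAdjustments.some(
          (a) => a.field === field && re.test(a.adjustedValue),
        );
        const bucket = byField[field] ?? [];
        bucket.push({ caseId: c.caseId, patternId, introducedByAdjustment });
        byField[field] = bucket;
      }
    }
  }
  return byField;
}
```

Grouping by field rather than by case makes it easy to see whether a leak is concentrated in one output, such as the final vision.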

## System Design Update Policy

Create a new System Design snapshot whenever there is a meaningful change to:

- Vision Brain Pipeline
- Judge Judy scoring
- Brain Lab evaluation harness
- named Brain components
- workspace structure
- persistence model
- learning loop
- data schema
- copy/export/reporting behaviour
- deployment or CI/CD process

Use `npm run design:snapshot` to create the next archived version. Doc Brown owns the snapshot model: old versions must remain fully reviewable and must not be overwritten.
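The snapshot rules (increment from the latest version, never overwrite) can be sketched in a few lines. This is not the `design:snapshot` script itself: the helper names are hypothetical, and an in-memory `Map` stands in for the files under `docs/system-design/`.

```typescript
const VERSION_RE = /^system-design-v(\d+)\.(\d+)$/;

// Parses "system-design-v0.8" into [0, 8]; returns null for any other id.
function parseVersion(id: string): [number, number] | null {
  const m = VERSION_RE.exec(id);
  return m ? [Number(m[1]), Number(m[2])] : null;
}

// Next id = highest existing version with its minor number incremented.
// Gaps in the history (no v0.3 to v0.5) are not back-filled.
function nextSystemDesignVersion(existing: string[]): string {
  let latest: [number, number] = [0, 0];
  for (const id of existing) {
    const v = parseVersion(id);
    if (v && (v[0] > latest[0] || (v[0] === latest[0] && v[1] > latest[1]))) latest = v;
  }
  return `system-design-v${latest[0]}.${latest[1] + 1}`;
}

// Archives a snapshot, refusing to overwrite a version that already exists.
function archiveSnapshot(store: Map<string, string>, id: string, markdown: string): void {
  if (store.has(id)) throw new Error(`${id} is already archived and must not be overwritten`);
  store.set(id, markdown);
}
```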

## System Design Version History

System Design snapshots are stored under `docs/system-design/`. Current and archived versions are shown on `/design`, and copy/export actions produce clean Markdown.

Current version: `system-design-v0.8`

Previous versions: `system-design-v0.7`, `system-design-v0.6`, `system-design-v0.2`, `system-design-v0.1`