Vision Builder
System Design
High-level architecture, Vision Brain Pipeline and delivery model for Vision Builder.
Current version: system-design-v0.8
Last updated: 5/4/2026, 11:56:48 PM
Change type: architecture + evaluation + Brain quality
Summary: Improves Brain component responsibilities, Judge Judy failure detection, named trace attribution and output quality rules while integrating compact eval reporting.
Executive Summary & Goals
Vision Builder helps product teams turn rough product thinking into a clear future-state vision, separated from positioning, product description, strategic bet and evidence gaps.
The core asset is the Vision Brain: a versioned product-strategy reasoning system that asks the right next question, avoids irrelevant rabbit holes and improves through repeatable evaluation in Brain Lab.
Workspaces
| Workspace | Purpose |
|---|---|
| Guided Builder | User-facing guided flow |
| Vision Pinboard | Visual board showing raw strategy cards, summary and final vision |
| Brain Console | Single-case debug tool for prompts, payloads, raw responses and parsed outputs |
| Brain Lab | Automated evaluation workspace for testing and improving the Vision Brain |
| System Design | Architecture, pipeline and delivery view |
Vision Brain Components
These names are internal architecture labels for Brain Console, Brain Lab, traces and System Design. They should not over-theme the user-facing Guided Builder.
| Formal component | Internal name | Responsibility |
|---|---|---|
| Strategy Extractor | Miss Moneypenny | Extracts strategy intelligence from messy product input |
| Vision Shaper | Supernanny | Sharpens raw or product-like vision candidates into future-state ambitions |
| Follow-up Question Generator | Louis Theroux | Asks the single most useful next question |
| Readiness Controller | The Rock | Controls summary/output readiness gates |
| Mode & Safety Rules | Arnold J. Rimmer | Blocks premature pricing, API, technical and mode-breaking rabbit holes |
| Output Generator | Homer Simpson | Produces final vision, positioning, product description, strategic bet and evidence |
| Evaluation Judge | Judge Judy | Scores Brain quality, detects failures and explains what needs improving |
| Trace Recorder | C-3PO | Records prompts, payloads, responses, adjustments, usage and versions |
| Case & Report Store | Gollum | Stores golden cases, eval reports, traces and precious examples |
| System Design Versioning | Doc Brown | Archives current and historical system design versions |
| Learning Processor | Gordon Ramsay | Filters user feedback and decides what is good enough to become learning data |
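The component table can be mirrored in code so that every trace adjustment carries an internal name. The sketch below is hypothetical: the `BRAIN_COMPONENTS` keys and the `ClearReason` values are illustrative assumptions, not identifiers from the Vision Builder codebase. It also encodes the rule from the Component Attribution Policy in the current-version snapshot. A target ingredient cleared because outputs are already ready is attributed to The Rock. One cleared because it belongs to a later mode is attributed to Arnold J. Rimmer.

```typescript
// Hypothetical registry mirroring the Vision Brain Components table.
// Keys are illustrative; values are the documented internal names.
const BRAIN_COMPONENTS = {
  strategyExtractor: "Miss Moneypenny",
  visionShaper: "Supernanny",
  followUpQuestionGenerator: "Louis Theroux",
  readinessController: "The Rock",
  modeAndSafetyRules: "Arnold J. Rimmer",
  outputGenerator: "Homer Simpson",
  evaluationJudge: "Judge Judy",
  traceRecorder: "C-3PO",
  caseAndReportStore: "Gollum",
  systemDesignVersioning: "Doc Brown",
  learningProcessor: "Gordon Ramsay",
} as const;

type ComponentKey = keyof typeof BRAIN_COMPONENTS;
type InternalName = (typeof BRAIN_COMPONENTS)[ComponentKey];

// Assumed reasons for clearing a target ingredient.
type ClearReason = "outputsAlreadyReady" | "laterModeQuestion";

// Attribute by responsibility, not by the stage that emitted the adjustment:
// readiness stops belong to The Rock, while later-mode blocks
// (pricing, API, implementation) belong to Arnold J. Rimmer.
function attributeClearedIngredient(reason: ClearReason): InternalName {
  return reason === "outputsAlreadyReady"
    ? BRAIN_COMPONENTS.readinessController
    : BRAIN_COMPONENTS.modeAndSafetyRules;
}

console.log(attributeClearedIngredient("laterModeQuestion")); // Arnold J. Rimmer
```

Keeping the names in one `as const` object lets the compiler reject a misspelled component name wherever `InternalName` is used.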
Unified Architecture Diagram
User -> Frontend Workspaces (Builder, Pinboard, Console, Lab, Design) -> API Routes (Brain and app endpoints) -> Vision Brain Pipeline (triage, extraction, gates, outputs) -> Model Provider (OpenAI) -> Persistence Layer (localStorage, JSON, reports) -> Brain Lab Evaluation (golden cases and judge) -> Reports / Traces / Saved Visions
Vision Brain Pipeline
1. Raw Intake: Capture messy natural-language product thinking.
2. Input Triage: Classify input as serious, vague, playful, nonsense, malicious or sensitive.
3. Strategy Extraction: Separate stakeholders, pain, alternatives, future state, strategy and evidence.
4. Guardrails: Apply deterministic readiness, unknown-classification and quality rules.
5. Follow-up Question: Ask the single most useful question for a stronger vision.
6. Strategy Summary: Create a "Here is what I think you are building" summary before final outputs.
7. Final Outputs: Separate vision, positioning, product description, strategic bet and evidence needed.
8. Evaluation: Run golden cases in Brain Lab and score regressions.
9. Trace: Store prompt, payload, raw response, parsed response, adjustments, usage and versions.
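For trace attribution, it can help to keep the nine stages and their owning components in one ordered structure. This is a minimal sketch under that assumption. `PipelineStage`, `VISION_BRAIN_PIPELINE` and `stagesOwnedBy` are hypothetical names. The owners come from the stage descriptions in the current-version snapshot, which names no owner for stages 1 and 6.

```typescript
// Hypothetical ordered description of the Vision Brain Pipeline.
interface PipelineStage {
  order: number;
  name: string;
  owners: readonly string[]; // internal component names; empty where none is documented
}

const VISION_BRAIN_PIPELINE: readonly PipelineStage[] = [
  { order: 1, name: "Raw Intake", owners: [] },
  { order: 2, name: "Input Triage", owners: ["Arnold J. Rimmer"] },
  { order: 3, name: "Strategy Extraction", owners: ["Miss Moneypenny"] },
  { order: 4, name: "Guardrails", owners: ["Supernanny", "The Rock", "Arnold J. Rimmer"] },
  { order: 5, name: "Follow-up Question", owners: ["Louis Theroux"] },
  { order: 6, name: "Strategy Summary", owners: [] },
  { order: 7, name: "Final Outputs", owners: ["Homer Simpson"] },
  { order: 8, name: "Evaluation", owners: ["Judge Judy"] },
  { order: 9, name: "Trace", owners: ["C-3PO"] },
];

// Returns the names of the stages a component participates in, in pipeline order.
function stagesOwnedBy(component: string): string[] {
  return VISION_BRAIN_PIPELINE
    .filter((stage) => stage.owners.includes(component))
    .map((stage) => stage.name);
}

console.log(stagesOwnedBy("Arnold J. Rimmer")); // [ 'Input Triage', 'Guardrails' ]
```

Evaluation runs against golden cases in Brain Lab rather than on every user request. A real implementation would therefore likely keep stage 8 out of the request path even though it appears in this ordered list.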
Data & Learning Flow
User answers -> Brain traces -> User edits / accepts / rejects -> Learning eligibility -> Candidate learning examples -> Reviewed golden examples -> Brain Lab regression testing -> New Brain version
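The eligibility step can be read as a filter in front of the learning loop. This is a minimal sketch, assuming a hypothetical `Interaction` shape. `inputQuality`, `shouldUseForLearning` and `promptInjectionBlocked` are names used in the Brain Lab success criteria. The other names are assumptions, as is the rule that only `serious`, non-injected input qualifies. Promotion to the golden set also requires a human review, matching the "Reviewed golden examples" step.

```typescript
// Input classes from the Input Triage stage.
type InputQuality =
  | "serious" | "vague" | "playful" | "nonsense" | "malicious" | "sensitive";

// Hypothetical shape of one user interaction after triage.
interface Interaction {
  id: string;
  inputQuality: InputQuality;
  promptInjectionBlocked: boolean;
  userAction: "edited" | "accepted" | "rejected";
}

interface CandidateExample {
  interactionId: string;
  userAction: Interaction["userAction"];
  reviewedBy?: string; // set once a human has reviewed the candidate
}

// Assumption: only serious input with no blocked prompt injection may be used.
function shouldUseForLearning(i: Interaction): boolean {
  return i.inputQuality === "serious" && !i.promptInjectionBlocked;
}

// Eligible interactions become candidates; everything else is dropped.
function toCandidate(i: Interaction): CandidateExample | null {
  return shouldUseForLearning(i)
    ? { interactionId: i.id, userAction: i.userAction }
    : null;
}

// Only reviewed candidates can be promoted to Gollum's golden dataset.
function canPromoteToGolden(c: CandidateExample): boolean {
  return c.reviewedBy !== undefined && c.reviewedBy.trim() !== "";
}

const injected: Interaction = {
  id: "case-1",
  inputQuality: "malicious",
  promptInjectionBlocked: true,
  userAction: "accepted",
};
console.log(toCandidate(injected)); // null
```
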
Technology Stack
| Area | Current choice |
|---|---|
| Frontend | Next.js / React |
| Styling | Tailwind CSS |
| Brain API | Next.js API routes / server functions |
| AI Models | OpenAI models |
| Prototype Storage | localStorage, local JSON and file reports |
| Future Storage | PostgreSQL or Supabase |
| Eval Reports | JSON + Markdown |
| Deployment | Vercel |
CI/CD & Delivery Pipeline
GitHub -> TypeScript checks -> Tests / Brain eval smoke run -> Build -> Preview deploy -> Manual review -> Production deploy
Key Constraints & Quality Attributes
- Traceability
- Versioning
- Safety / input triage
- Quality / regression testing
- Cost control
- Privacy / learning eligibility
- Reliability
- Extensibility
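The Traceability and Versioning attributes imply a trace record with a fixed set of fields. The hypothetical `BrainTrace` type below lists what the document says C-3PO should preserve: prompt, payload, raw and parsed response, adjustments, usage and version metadata. The exact field names, the token fields inside `usage`, the placeholder version values and the `missingVersions` helper are all assumptions.

```typescript
// Version metadata the document says must be explicit in traces and eval reports.
interface TraceVersions {
  brainVersion: string;
  promptVersion: string;
  schemaVersion: string;
  judgeVersion: string;
  systemDesignVersion: string;
}

// One system adjustment, attributed to an internal component name.
interface TraceAdjustment {
  component: string; // e.g. "The Rock" or "Arnold J. Rimmer"
  field: string;
  adjustedValue: unknown;
  reason: string;
}

interface BrainTrace {
  prompt: string;
  payload: unknown;
  rawResponse: string;
  parsedResponse: unknown;
  adjustments: TraceAdjustment[];
  usage: { inputTokens: number; outputTokens: number }; // assumed token counters
  versions: TraceVersions;
}

const VERSION_KEYS: (keyof TraceVersions)[] = [
  "brainVersion", "promptVersion", "schemaVersion", "judgeVersion", "systemDesignVersion",
];

// Lists blank version fields so a trace can be rejected before it is stored.
function missingVersions(trace: BrainTrace): (keyof TraceVersions)[] {
  return VERSION_KEYS.filter((key) => trace.versions[key].trim() === "");
}

// Hypothetical example trace with an empty judge version.
const trace: BrainTrace = {
  prompt: "example prompt",
  payload: {},
  rawResponse: "{}",
  parsedResponse: {},
  adjustments: [],
  usage: { inputTokens: 0, outputTokens: 0 },
  versions: {
    brainVersion: "brain-example",
    promptVersion: "prompt-example",
    schemaVersion: "schema-example",
    judgeVersion: "",
    systemDesignVersion: "system-design-v0.8",
  },
};
console.log(missingVersions(trace)); // [ 'judgeVersion' ]
```
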
System Design Version History
Archive a new System Design version whenever the architecture, Brain pipeline, evaluation harness, named components, data schema, reporting/copy behaviour or major UI structure changes.
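The archiving rule combines a trigger check with an append-only store. The sketch below is hypothetical and is not the implementation behind `npm run design:snapshot`. The area labels come from the paragraph above and from the System Design Update Policy. `requiresSnapshot`, `archiveSnapshot`, the in-memory `Map` and the `system-design-v0.9` id are illustrative stand-ins for whatever Doc Brown uses on disk under `docs/system-design/`. Refusing to overwrite an existing id reflects the rule that old versions must remain reviewable.

```typescript
// Change areas that trigger a new System Design snapshot (lower-cased for matching).
const SNAPSHOT_TRIGGERS: ReadonlySet<string> = new Set([
  "architecture",
  "vision brain pipeline",
  "judge judy scoring",
  "brain lab evaluation harness",
  "named brain components",
  "workspace structure",
  "major ui structure",
  "persistence model",
  "learning loop",
  "data schema",
  "copy/export/reporting behaviour",
  "deployment or ci/cd process",
]);

function requiresSnapshot(changedAreas: string[]): boolean {
  return changedAreas.some((area) => SNAPSHOT_TRIGGERS.has(area.trim().toLowerCase()));
}

// Append-only archive keyed by version id; existing versions are never replaced.
function archiveSnapshot(archive: Map<string, string>, versionId: string, markdown: string): void {
  if (archive.has(versionId)) {
    throw new Error(`${versionId} already archived; old versions must not be overwritten`);
  }
  archive.set(versionId, markdown);
}

const archive = new Map<string, string>();
if (requiresSnapshot(["Data schema"])) {
  archiveSnapshot(archive, "system-design-v0.9", "# System Design v0.9");
}
console.log(archive.size); // 1
```
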
Word export is not wired up yet; use Export Markdown for now. Google Docs export requires integration setup.
| Version | Date | Change type | Summary | Affected areas |
|---|---|---|---|---|
| system-design-v0.8 | 5/4/2026, 11:56:48 PM | architecture + evaluation + Brain quality | Improves Brain component responsibilities, Judge Judy failure detection, named trace attribution and output quality rules while integrating compact eval reporting. | Brain Console, Brain Lab, Judge Judy, Homer Simpson, Supernanny, Miss Moneypenny, Arnold J. Rimmer, Gordon Ramsay, Louis Theroux, C-3PO, Gollum, Doc Brown, System Design, Vision Brain Pipeline, Changelog |
| system-design-v0.7 | 5/5/2026, 12:30:00 AM | eval | Adds broader Judge Judy eval cases, including real-world benchmark-inspired cases, to test vision quality, stakeholder separation, rabbit-hole avoidance, evidence specificity, input triage, overfitting, reference-direction matching and clean-pass behaviour. | Brain Lab, Judge Judy, Gollum, C-3PO, System Design, Changelog |
| system-design-v0.6 | 5/5/2026, 12:00:00 AM | architecture | Adds canonical internal names for Vision Brain components, Judge Judy/C-3PO/Gollum/Doc Brown references, changelog policy and systemDesignVersion metadata. | System Design, Brain Lab, Brain Console, Vision Brain Pipeline, Brain Eval Reports, Changelog |
| system-design-v0.2 | 5/4/2026, 10:05:00 PM | architecture | Adds the System Design workspace, file-backed design version history and explicit documentation for the Vision Brain architecture, data flow, learning loop and delivery model. | Guided Builder, Vision Pinboard, Brain Console, Brain Lab, System Design, Vision Brain Pipeline |
| system-design-v0.1 | 5/4/2026, 9:57:59 PM | architecture | Establishes the Vision Brain as a versioned product-strategy reasoning system with separate workspaces for user workflow, debug tracing and automated evaluation. | Guided Builder, Vision Pinboard, Brain Console, Brain Lab, Vision Brain Pipeline |
Current version: System Design v0.8
- Version: system-design-v0.8
- Title: System Design v0.8
- Created: 2026-05-04T23:56:48.217Z (2026-05-05T00:56:48.217+01:00)
- Change type: architecture + evaluation + Brain quality
- Author/source: Codex
- Summary: Improves Brain component responsibilities, Judge Judy failure detection, named trace attribution and output quality rules while integrating compact eval reporting.
- Affected areas: Brain Console, Brain Lab, Judge Judy, Homer Simpson, Supernanny, Miss Moneypenny, Arnold J. Rimmer, Gordon Ramsay, Louis Theroux, C-3PO, Gollum, Doc Brown, System Design, Vision Brain Pipeline, Changelog

## Executive Summary & Goals

Vision Builder helps product teams turn rough product thinking into a clear future-state vision, separated from positioning, product description, strategic bet and evidence gaps. The core asset is the Vision Brain: a versioned product-strategy reasoning system that asks the right next question, avoids irrelevant rabbit holes and improves through repeatable evaluation in Brain Lab. It is not a single prompt, database, fine-tuned model or generic AI wrapper.

## Workspaces

| Workspace | Purpose |
| --- | --- |
| Guided Builder | User-facing guided flow |
| Vision Pinboard | Visual board showing raw strategy cards, summary and final vision |
| Brain Console | Single-case debug tool for prompts, payloads, raw responses and parsed outputs |
| Brain Lab | Automated evaluation workspace for testing and improving the Vision Brain |
| System Design | Architecture, pipeline and delivery view |

## Vision Brain Components

These names are internal architecture labels for Brain Console, Brain Lab, traces and System Design. They should not over-theme the user-facing Guided Builder.

| Formal component | Internal name | Responsibility |
| --- | --- | --- |
| Strategy Extractor | Miss Moneypenny | Extracts strategy intelligence from messy product input |
| Vision Shaper | Supernanny | Sharpens raw or product-like vision candidates into future-state ambitions |
| Follow-up Question Generator | Louis Theroux | Asks the single most useful next question |
| Readiness Controller | The Rock | Controls summary/output readiness gates |
| Mode & Safety Rules | Arnold J. Rimmer | Blocks premature pricing, API, technical and mode-breaking rabbit holes |
| Output Generator | Homer Simpson | Produces final vision, positioning, product description, strategic bet and evidence |
| Evaluation Judge | Judge Judy | Scores Brain quality, detects failures and explains what needs improving |
| Trace Recorder | C-3PO | Records prompts, payloads, responses, adjustments, usage and versions |
| Case & Report Store | Gollum | Stores golden cases, eval reports, traces and precious examples |
| System Design Versioning | Doc Brown | Archives current and historical system design versions |
| Learning Processor | Gordon Ramsay | Filters user feedback and decides what is good enough to become learning data |

## Component Attribution Policy

Trace adjustments should be attributed by responsibility, not by whichever stage happened to emit them.

- Miss Moneypenny owns raw strategy extraction: product category, users, problems, alternatives, strategic bet and evidence.
- Supernanny owns vision shaping: `rawVisionCandidate`, `sharpenedVisionCandidate` and the promoted `visionCandidate`.
- Louis Theroux owns follow-up wording: `recommendedNextQuestion`, follow-up questions and future-state confirmation prompts.
- The Rock owns gates and stop conditions: `readyForSummary`, `readyForOutputs`, `needsFollowUpBeforeVision`, output readiness and stopping when outputs are ready.
- Arnold J. Rimmer owns mode safety: later-mode unknowns, skipped technical/commercial questions and pricing/API/implementation rabbit-hole blocks.
- Homer Simpson owns generated outputs: final vision, positioning, product description, strategic bet and generated evidence.
- Judge Judy owns scoring, failures, verdicts, score explanations and quality notes.
- C-3PO owns trace recording, copied trace payloads, prompt/payload/raw/parsed records, usage and version metadata.
- Gollum owns golden cases, eval reports, report loading and storage.
- Doc Brown owns `systemDesignVersion`, snapshots and archived design versions.
- Gordon Ramsay owns learning eligibility and candidate learning data.

When a target ingredient is cleared because outputs are already ready, attribute it to The Rock. When it is cleared because the question belongs to pricing, API, implementation or another later mode, attribute it to Arnold J. Rimmer.

## Unified Architecture

User -> Frontend Workspaces -> API Routes -> Vision Brain Pipeline -> Model Provider -> Persistence Layer -> Brain Lab Evaluation -> Reports / Traces / Saved Visions

The frontend workspaces share navigation but remain separate surfaces. Brain Console and Brain Lab expose the reasoning system for debugging and regression testing, while the Guided Builder and Vision Pinboard stay oriented around product workflow.

## Vision Brain Pipeline

1. Raw Intake: Capture messy natural-language product thinking.
2. Input Triage: Arnold J. Rimmer classifies input as serious, vague, playful, nonsense, malicious or sensitive before extraction.
3. Strategy Extraction: Miss Moneypenny separates stakeholders, pain, alternatives, future state, strategic bet and evidence needs.
4. Guardrails: Supernanny, The Rock and Arnold J. Rimmer apply deterministic vision shaping, readiness gates and mode rules.
5. Follow-up Question: Louis Theroux asks the single most useful next question for a stronger vision.
6. Strategy Summary: Create a "Here's what I think you are building" summary before final outputs.
7. Final Outputs: Homer Simpson separates vision, positioning, product description, strategic bet and evidence needed.
8. Evaluation: Judge Judy runs golden cases in Brain Lab and scores regressions.
9. Trace: C-3PO stores prompt, payload, raw response, parsed response, system adjustments, usage and version metadata.

## Data & Learning Flow

User answers -> Brain traces -> user edits/accepts/rejects -> learning eligibility -> candidate learning examples -> reviewed golden examples -> Brain Lab regression testing -> new Brain version

The Brain does not learn directly from every user input. Triage and learning eligibility decide whether an interaction can become a candidate example. Gordon Ramsay is the planned learning processor that filters corrections before they become candidate learning examples, and reviewed examples can be promoted to Gollum's golden dataset.

## Technology Stack

| Area | Current choice |
| --- | --- |
| Frontend | Next.js / React |
| Styling | Tailwind CSS |
| Brain API | Next.js API routes / server functions |
| AI Models | OpenAI models |
| Prototype Storage | localStorage, local JSON and file reports |
| Future Storage | PostgreSQL or Supabase |
| Eval Reports | JSON + Markdown |
| Deployment | Vercel |

## CI/CD & Delivery Pipeline

GitHub -> TypeScript checks -> Tests / Brain eval smoke run -> Build -> Preview deploy -> Manual review -> Production deploy

## Key Constraints & Quality Attributes

- Traceability: Brain decisions must preserve prompt, payload, raw response, parsed response, system adjustments and C-3PO component metadata.
- Versioning: Brain, prompt, schema, judge and `systemDesignVersion` metadata must be explicit in traces and eval reports.
- Safety / input triage: Prompt injection, nonsense and sensitive input need different treatment before learning or extraction.
- Quality / regression testing: Brain Lab should catch vision, stakeholder, readiness and rabbit-hole regressions.
- Cost control: Cheap models can support coaching, but heavy generation should stay deliberate and traceable.
- Privacy / learning eligibility: User corrections are valuable but should be reviewed before becoming learning examples.
- Reliability: Basic routes, saved sessions, traces and reports should keep working when Brain internals evolve.
- Extensibility: It should be possible to add new workspaces and Brain stages without merging unrelated UX surfaces.

## Brain Lab Eval Coverage

Brain Lab is now expected to run multiple fixture groups through Gollum's case store:

- `golden`: broad industry fixtures that protect core Vision Brain behaviours.
- `judge-judy`: targeted adversarial cases for Judge Judy failure modes such as product-like visions, positioning disguised as vision, weak ambition, poor stakeholder separation, generic evidence, prompt injection and overfitting.
- `real-world-benchmarks`: fixtures inspired by public references that compare messy product input with the strategic direction of well-known product visions, without hard-coding those public examples into production Brain logic.

Judge Judy reports include `passType`, `stageScores`, `correctionsMade`, top failure clusters and failure-stage rollups. A case can pass end to end while still being marked `pass_with_corrections` when Supernanny, The Rock, Arnold J. Rimmer or Homer Simpson had to correct weak upstream material.

## Brain Improvement Pass v0.8

The v0.8 Brain pass targets the recurring Brain Lab failures in the 20-case report:

- Homer Simpson now treats evidence as observable proof points rather than abstract nouns; synthesises positioning from target, current alternative, category and differentiated value; rewrites weak strategic bets as market or behaviour beliefs; and runs a post-adjustment output quality guard so repair templates cannot leak into final outputs.
- Supernanny now prevents malformed `. becomes normal` grammar; only uses `X becomes normal` when `X` is a noun phrase; corrects product-like or weak vision candidates; and blocks benchmark overfitting where the case context does not fit.
- Miss Moneypenny now separates marketplace demand-side users from supply-side participants when the input says a product connects or matches two sides, and separates healthcare users, buyers and beneficiaries.
- Arnold J. Rimmer and Gordon Ramsay now classify prompt-injection, playful and nonsense input as non-serious, block serious output generation for it, and mark it ineligible for learning.
- Louis Theroux now uses the strongest sharpened future-state candidate in follow-up questions and avoids confirming product-object candidates.
- Judge Judy now detects malformed candidates, weak follow-up candidates, product descriptions that are too vague, output template leaks, overfit benchmark language and prompt-injection learning mistakes.

Brain Lab should continue to run 20 cases by default. The v0.8 success criteria are:

- `overallScore >= 95`
- `visionQuality >= 90`
- `evidenceSpecificity >= 90`
- `outputSeparation >= 90`
- `readinessGateCorrectness >= 95`
- `avoidsRabbitHoles = 100`
- no high-severity failures
- no malformed `. becomes normal` outputs
- no final vision starting with `For`
- prompt-injection fixtures showing `inputQuality` not serious, `shouldUseForLearning: false` and `promptInjectionBlocked: true`

## Full and Compact Eval Reports

Brain Lab now keeps full reports and produces compact review artifacts for human review and ChatGPT/Codex paste workflows.

- Full reports preserve every case, trace, expected behaviour, adjustment and Judge Judy object.
- Compact JSON reports preserve run metadata, scores, top failures, top failure stages, failed or sub-96 cases, high-severity failures, relevant extracted fields, outputs, selected output adjustments and recommended Brain improvements.
- Compact reports scan all cases for banned output patterns, group them by output field and mark when a pattern was introduced by `outputAdjustments.adjustedValue`.
- Dashboard Markdown reports summarise the same compact data for fast review.
- `npm run brain:eval` produces full JSON, full Markdown, compact JSON and compact dashboard Markdown by default.
- `npm run brain:eval:compact -- <full-report.json>` can regenerate compact artifacts from an existing full report.

## System Design Update Policy

Create a new System Design snapshot whenever there is a meaningful change to:

- Vision Brain Pipeline
- Judge Judy scoring
- Brain Lab evaluation harness
- named Brain components
- workspace structure
- persistence model
- learning loop
- data schema
- copy/export/reporting behaviour
- deployment or CI/CD process

Use `npm run design:snapshot` to create the next archived version. Doc Brown owns the snapshot model: old versions must remain fully reviewable and must not be overwritten.

## System Design Version History

System Design snapshots are stored under `docs/system-design/`. Current and archived versions are shown on `/design`, and copy/export actions produce clean Markdown.

Current version: `system-design-v0.8`

Previous versions: `system-design-v0.7`, `system-design-v0.6`, `system-design-v0.2`, `system-design-v0.1`
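The v0.8 success criteria listed under Brain Improvement Pass v0.8 are mechanical enough to check directly against a report. This is a hypothetical sketch. The score names and thresholds come from those criteria. The `EvalReport` and `CaseResult` shapes are assumptions, not the Brain Lab report schema; this includes the `outputs` record keyed by output field and its `vision` key.

```typescript
interface EvalScores {
  overallScore: number;
  visionQuality: number;
  evidenceSpecificity: number;
  outputSeparation: number;
  readinessGateCorrectness: number;
  avoidsRabbitHoles: number;
}

// Hypothetical per-case shape; outputs are keyed by output field (e.g. "vision").
interface CaseResult {
  id: string;
  outputs: Record<string, string>;
  highSeverityFailures: number;
  isPromptInjectionFixture: boolean;
  inputQuality: string;
  shouldUseForLearning: boolean;
  promptInjectionBlocked: boolean;
}

interface EvalReport {
  scores: EvalScores;
  cases: CaseResult[];
}

// Minimum scores from the v0.8 criteria; with scores assumed to be 0-100,
// a minimum of 100 means avoidsRabbitHoles must equal 100.
const MIN_SCORES: EvalScores = {
  overallScore: 95,
  visionQuality: 90,
  evidenceSpecificity: 90,
  outputSeparation: 90,
  readinessGateCorrectness: 95,
  avoidsRabbitHoles: 100,
};

// Returns readable criterion failures; an empty array means the v0.8 criteria pass.
function v08CriteriaFailures(report: EvalReport): string[] {
  const failures: string[] = [];
  for (const key of Object.keys(MIN_SCORES) as (keyof EvalScores)[]) {
    if (report.scores[key] < MIN_SCORES[key]) {
      failures.push(`${key} ${report.scores[key]} < ${MIN_SCORES[key]}`);
    }
  }
  for (const c of report.cases) {
    if (c.highSeverityFailures > 0) failures.push(`${c.id}: high-severity failure`);
    for (const [field, text] of Object.entries(c.outputs)) {
      if (text.includes(". becomes normal")) failures.push(`${c.id}.${field}: malformed ". becomes normal"`);
    }
    // Word boundary: "For teams..." is flagged, "Forward planning..." is not.
    if (/^For\b/.test((c.outputs["vision"] ?? "").trimStart())) {
      failures.push(`${c.id}.vision: starts with "For"`);
    }
    if (
      c.isPromptInjectionFixture &&
      (c.inputQuality === "serious" || c.shouldUseForLearning || !c.promptInjectionBlocked)
    ) {
      failures.push(`${c.id}: prompt-injection fixture not fully blocked`);
    }
  }
  return failures;
}

const report: EvalReport = {
  scores: {
    overallScore: 96, visionQuality: 92, evidenceSpecificity: 88,
    outputSeparation: 91, readinessGateCorrectness: 97, avoidsRabbitHoles: 100,
  },
  cases: [{
    id: "case-1",
    outputs: { vision: "For teams who plan launches..." },
    highSeverityFailures: 0,
    isPromptInjectionFixture: false,
    inputQuality: "serious",
    shouldUseForLearning: true,
    promptInjectionBlocked: false,
  }],
};
console.log(v08CriteriaFailures(report).length); // 2
```

The two failures reported here are the `evidenceSpecificity` shortfall and the vision that opens with `For`.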