By January 2026, I had built a 125,000-line SaaS platform without writing a single line of code by hand. The AI wrote all of it. But the AI didn't design any of it.
The design lived in specifications — 74 documents totaling about 51,000 lines. These specs weren't documentation written after the fact. They were the blueprints that made AI-assisted development actually work.
I've come to believe that writing specs is the core skill of building with AI. The prompts matter less than you'd think. The specs matter more than anyone talks about.
The Problem With Prompting
Early in the project, I tried the obvious approach: describe what I wanted, let the AI build it.
"Build me a payment system."
The AI would generate something. It would handle the happy path. It would miss edge cases. It would make architectural decisions I hadn't considered. Each iteration revealed new problems. After a dozen rounds of "actually, it also needs to handle X," I'd have a working feature built on a foundation of accumulated patches.
This worked for simple features. It fell apart for complex ones.
The payment system couldn't be built this way. What happens when five players click "Book" for the last spot simultaneously? What if the organizer's Stripe balance is empty when a refund is needed? What if a player cancels after the match is locked but before it starts?
These aren't implementation details. They're design decisions. And they need to be made before the AI writes any code.
The Spec Structure
I developed a structure that works:
## Feature Overview
What this feature does, in plain language.
## Database Schema
Exact tables, columns, types, constraints.
## API Contracts
Endpoints, request formats, response formats.
## State Machines
Valid states and transitions.
## Edge Cases
What happens when things go wrong.
## Invariants
Things that must ALWAYS be true.

The payment system spec was 6,700 lines. That sounds excessive until you realize it covered 12 implementation phases, 26 API endpoints, 10 database tables, and dozens of edge cases. The spec was the product design, the technical architecture, and the test plan combined.
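To make one of those sections concrete: a "State Machines" section pins down every legal transition so the AI never has to guess. Here's a minimal sketch in TypeScript, using hypothetical states that mirror the payment rule described later (pending can move to completed or refunded, never backwards); the actual spec's table may differ.

```typescript
// Hypothetical payment state machine of the kind a spec's
// "State Machines" section would enumerate exhaustively.
type PaymentStatus = "pending" | "completed" | "refunded";

// Every legal transition is listed; anything absent is forbidden.
const TRANSITIONS: Record<PaymentStatus, readonly PaymentStatus[]> = {
  pending: ["completed", "refunded"],
  completed: ["refunded"],
  refunded: [], // terminal: status never moves backwards
};

function assertTransition(from: PaymentStatus, to: PaymentStatus): void {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`Illegal payment status transition: ${from} -> ${to}`);
  }
}
```

The point isn't this particular table. The point is that the table exists in the spec before any code does.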
How Specs Change AI Behavior
Without a spec:
"Build me a booking system with payments."
The AI guesses at requirements. It makes reasonable assumptions that may not match your needs. You discover gaps through testing. Each fix potentially introduces new inconsistencies.
With a spec:
"Implement SPEC_Payments.md Section 7: Voluntary Refunds."
The AI has context. It knows the database schema. It knows the state machine. It knows what invariants must be preserved. The implementation matches the design because the design is explicit.
The difference isn't subtle. The payment system, built with a comprehensive spec, had 11 bugs discovered during testing. The genetic algorithm for team balancing, built through iterative prompting, took 15+ iterations before it worked correctly.
The Three-Layer System
Over time, I developed a hierarchy:
Layer 1: Coding Standards (~800 tokens, always loaded)
These are the rules that apply everywhere. Naming conventions. Security patterns. Import structures. The AI sees these on every request.
Layer 2: Feature Specifications (~300-500 tokens each, loaded selectively)
Deep technical specs for specific features. Database schemas, API contracts, state machines. Loaded when working on that feature.
Layer 3: History Document (~2,000 tokens, loaded for architecture decisions)
Why decisions were made. What was tried and abandoned. Institutional memory that prevents revisiting solved problems.
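The loading mechanics are simple in principle. Here's a rough sketch of how the three layers might be stitched together per request; the file names and the exact mechanism are my assumptions, not details from the project.

```typescript
import { readFileSync } from "node:fs";

// Hypothetical assembly of the three-layer context for one AI request.
// Layer 1 is always loaded; layers 2 and 3 are pulled in on demand.
function buildContext(opts: {
  featureSpec?: string;     // e.g. "SPEC_Payments.md" (Layer 2)
  includeHistory?: boolean; // only for architecture-level work (Layer 3)
}): string {
  const parts = [readFileSync("CODING_STANDARDS.md", "utf8")]; // Layer 1, ~800 tokens

  if (opts.featureSpec) {
    parts.push(readFileSync(opts.featureSpec, "utf8"));
  }
  if (opts.includeHistory) {
    parts.push(readFileSync("HISTORY.md", "utf8"));
  }
  return parts.join("\n\n---\n\n");
}

// Working on refunds: coding standards plus the payment spec, no history needed.
const context = buildContext({ featureSpec: "SPEC_Payments.md" });
```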
The payoff compounds. By Phase 23 of development, the AI had so much context that new features shipped three times faster than early phases. The specs weren't overhead — they were accelerant.
Specs as Living Documents
Specs evolve through three phases:
Planning: The spec describes what will be built. It's a blueprint.
Implementation: The AI references the spec while building. Deviations are noted.
As-Built: The spec is updated to reflect what was actually built. It becomes documentation.
This lifecycle matters. A spec that's never updated becomes misleading. A spec that's updated after every implementation becomes the authoritative source of truth.
When a new AI session starts, it can read the spec and understand the system. It doesn't need to reverse-engineer intent from code. The intent is documented.
What Goes in a Spec
The payment system spec included:
- Race condition handling: "5 players click Book for last spot. First webhook wins. Others get automatic refunds."
- Financial integrity rules: "No Stripe API calls inside database transactions."
- State transitions: "Payment status can move from pending to completed or refunded, never backwards."
- Invariants: "A player with status IN always has either a completed payment or a valid pass."
These aren't implementation details. They're design decisions that the AI needs to respect. Without them in the spec, the AI might make different (and wrong) choices.
The spec also included a 29-point checklist verified before each phase was marked complete. The AI could generate test scenarios by reading the spec.
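To show how two of those rules interact, here is a hedged sketch of a booking webhook handler. The schema, client types, and names are all hypothetical, but the ordering is what the spec dictates: claim the spot atomically in the database, and only touch Stripe afterwards, outside any transaction.

```typescript
// Minimal hypothetical types; a real project would use pg and stripe clients.
interface Db {
  query(sql: string, params?: unknown[]): Promise<{ rowCount: number }>;
}
interface StripeClient {
  refunds: { create(args: { payment_intent: string }): Promise<unknown> };
}
interface BookingEvent {
  matchId: string;
  playerId: string;
  paymentIntentId: string;
}

async function handleBookingWebhook(
  db: Db,
  stripe: StripeClient,
  event: BookingEvent
): Promise<void> {
  // "First webhook wins": a single conditional UPDATE is atomic, so when
  // five webhooks race for the last spot, exactly one sees rowCount === 1.
  const claimed = await db.query(
    `UPDATE matches
        SET spots_taken = spots_taken + 1
      WHERE id = $1 AND spots_taken < spots_total`,
    [event.matchId]
  );

  if (claimed.rowCount === 1) {
    // Winner: record the player as IN with a completed payment, preserving
    // the invariant that IN always implies a payment or a valid pass.
    await db.query(
      `UPDATE match_players SET status = 'IN'
        WHERE match_id = $1 AND player_id = $2`,
      [event.matchId, event.playerId]
    );
    return;
  }

  // Losers get automatic refunds, issued OUTSIDE any database
  // transaction, per the "no Stripe calls inside transactions" rule.
  await stripe.refunds.create({ payment_intent: event.paymentIntentId });
}
```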
The Methodology
I talk more about the overall approach in How I Actually Vibe Code. The spec-driven workflow looks like this:
1. Identify the problem in plain language
2. Write the spec (most time spent here)
3. Prompt the AI with spec context
4. Review generated code
5. Test edge cases
6. Update spec to reflect reality
Step 2 takes the most time. It's also where the real engineering happens. The AI can implement a well-specified system reliably. It cannot design a system from vague requirements.
What AI Does Well (and Poorly)
AI excels at:
- Boilerplate: CRUD operations, API routes, React components
- Pattern application: "Make this like that other file"
- Refactoring: "Update all 70 API routes to use new pattern"
- Generating code from specs
AI struggles with:
- Architecture decisions: It will build whatever you ask, even if it's wrong
- Security: It defaults to "make it work" not "make it secure"
- Financial logic: Money requires explicit, audited correctness
- Judgment calls: "Should we build this feature?" — AI can't answer
The spec is where humans add value. The AI handles translation to code.
The ROI
The multi-tenancy retrofit touched 33 tables, 70+ API routes, and 13 SQL functions. The spec for that work was 2,508 lines. Without it, the AI would have made inconsistent decisions across hundreds of files.
The RLS architecture decision is documented in the multi-tenancy spec. When a security audit questioned why RLS was disabled, the spec explained the reasoning. Future developers (or AI instances) can understand why without re-discovering the connection pooling issues.
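The workaround itself isn't spelled out here, but the standard pattern when RLS is off (pooled connections can't reliably carry per-session tenant settings) is to scope tenancy at the application layer. A sketch under that assumption, with illustrative names:

```typescript
// Hypothetical app-layer tenant scoping, the common substitute when
// Postgres RLS is disabled. All names are illustrative.
interface Db {
  query(sql: string, params: unknown[]): Promise<{ rows: unknown[] }>;
}

// Every data-access helper takes tenantId explicitly, so no query
// can be written without deciding which tenant it belongs to.
async function listMatches(db: Db, tenantId: string) {
  return db.query(
    `SELECT * FROM matches WHERE tenant_id = $1`,
    [tenantId]
  );
}
```

Whatever the actual mechanism, the value is that the spec records which approach was chosen and why.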
Phase 1 of the project (basic stats tracking) took two weeks. Phase 23 (complete payment system with race conditions, refund queues, dispute tracking) also took two weeks. Same time, ten times the complexity. That's the compound interest of good specs.
The Insight
You're not a coder when you build with AI. You're an architect and product manager. Your job is deciding what to build and why. The AI's job is figuring out how and writing the implementation.
The specs are the artifact of that thinking. They're not documentation — they're the product. The code is just the spec translated into something a computer can execute.
About 51,000 lines of specifications for 125,000 lines of code. That ratio isn't overhead. It's the reason the code works.