Five players click "Book" at the same moment. There's one spot left. Who gets it?
This isn't a theoretical problem. It happened in production. The match had 17 spots. 16 were filled. Five people saw "1 spot remaining" and clicked simultaneously. Without proper handling, all five would have been charged, all five would have expected to play, and there would be an overbooked match and four angry refund requests.
Building the booking system for Capo forced me to understand race conditions in a way I never had before. The payment system couldn't just work most of the time. It had to work every time, even when multiple users were competing for the same resource.
The Problem
A naive booking flow looks like this:
- Check if spots are available
- If yes, create a checkout session
- User pays
- Confirm the booking
The race condition lives between steps 1 and 4. Five users all pass step 1 (spots available). All five create checkout sessions. All five pay. Now you have five confirmed payments for one spot.
And it isn't rare. Thursday night football draws a regular crowd; when a popular match opens for booking, multiple people respond within seconds. The "last spot" scenario plays out again and again.
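Here's that naive flow as code, a minimal sketch where getAvailableSpots and createCheckoutSession are hypothetical stand-ins for the real data layer and Stripe integration:

```ts
// Hypothetical stand-ins for the real data layer and Stripe integration.
declare function getAvailableSpots(matchId: string): Promise<number>;
declare function createCheckoutSession(
  matchId: string,
  playerId: string
): Promise<{ url: string }>;

// Naive flow: the check (step 1) and the confirmation (step 4) are
// separated by a payment round-trip, with no lock in between.
async function bookNaive(matchId: string, playerId: string): Promise<string> {
  const spots = await getAvailableSpots(matchId); // step 1: all five pass
  if (spots <= 0) throw new Error('Match is full');

  // Five concurrent requests can all reach this line, each having read
  // "1 spot available" before any of them confirmed a booking.
  const session = await createCheckoutSession(matchId, playerId); // step 2
  return session.url; // steps 3-4 (payment, confirmation) happen later
}
```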
The Architecture
The solution has three parts: database locks, a single allocator function, and automatic refunds for race condition losers.
Database Locks
When checking availability and confirming a booking, the system uses PostgreSQL's serializable isolation with row-level locks:
```ts
await prisma.$transaction(async (tx) => {
  // Lock the match row
  await tx.$executeRaw`
    SELECT * FROM upcoming_matches
    WHERE id = ${matchId}
    FOR UPDATE
  `;

  // Now safely check and update
  const available = await checkAvailability(tx, matchId);
  if (available > 0) {
    await confirmBooking(tx, matchId, playerId);
  }
}, { isolationLevel: 'Serializable' });
```

The FOR UPDATE lock ensures only one transaction can modify the match at a time. Other transactions wait. This prevents the race where multiple users see "1 spot available" simultaneously.
Single Allocator
All booking confirmations go through one function: confirmSeat(). This function is the only place in the codebase that can set a player's status to "IN" for a paid match.
Having a single allocator means race condition handling exists in exactly one place. Every booking path — direct booking, waitlist claims, admin additions — routes through this function. Get it right once, and it's right everywhere.
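A sketch of what that boundary can look like follows. The model and field names are illustrative, not Capo's actual schema; the point is that the allocator takes the transaction handle, re-checks capacity under the lock, and is the only code that flips a player to IN:

```ts
import { Prisma } from '@prisma/client';

// Hypothetical schema: upcomingMatch carries capacity/confirmedCount, and
// matchPlayer has a compound unique key on (matchId, playerId).
async function confirmSeat(
  tx: Prisma.TransactionClient,
  matchId: string,
  playerId: string
): Promise<{ confirmed: boolean }> {
  const match = await tx.upcomingMatch.findUniqueOrThrow({
    where: { id: matchId },
    select: { capacity: true, confirmedCount: true },
  });

  // Re-check under the row lock: this is the authoritative capacity check.
  if (match.confirmedCount >= match.capacity) {
    return { confirmed: false }; // race loser; the caller queues a refund
  }

  await tx.matchPlayer.update({
    where: { matchId_playerId: { matchId, playerId } },
    data: { status: 'IN' },
  });
  await tx.upcomingMatch.update({
    where: { id: matchId },
    data: { confirmedCount: { increment: 1 } },
  });
  return { confirmed: true };
}
```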
Automatic Refunds
When five players race for the last spot, one wins and four lose. The losers have already paid. They need automatic refunds.
The flow:
- All five create Stripe checkout sessions
- All five complete payment (Stripe doesn't know about the platform's capacity limits)
- Webhooks arrive for all five payments
- First webhook to reach confirmSeat() wins
- Other four get immediate, automatic refunds
The player sees: "Sorry, the match filled up while you were checking out. Your payment has been refunded."
This happens automatically. No admin intervention. No support tickets. The system handles the race condition gracefully.
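In code, the winner/loser branch might look something like this. The names here (handleCheckoutCompleted, confirmSeatTx, queueRefund, notifyPlayer) are illustrative: the webhook handler tries to claim the seat, and a losing claim turns directly into a queued refund.

```ts
// Illustrative names: confirmSeatTx wraps confirmSeat() in its own
// serializable transaction; queueRefund writes a refund row (no Stripe call).
declare function confirmSeatTx(
  matchId: string,
  playerId: string
): Promise<{ confirmed: boolean }>;
declare function queueRefund(
  paymentIntentId: string,
  reason: string
): Promise<void>;
declare function notifyPlayer(playerId: string, message: string): Promise<void>;

// Runs once per checkout.session.completed webhook, after deduplication.
async function handleCheckoutCompleted(
  matchId: string,
  playerId: string,
  paymentIntentId: string
): Promise<void> {
  const result = await confirmSeatTx(matchId, playerId);
  if (!result.confirmed) {
    // Race loser: the seat was taken between checkout and webhook delivery.
    await queueRefund(paymentIntentId, 'match_filled_during_checkout');
    await notifyPlayer(
      playerId,
      'Sorry, the match filled up while you were checking out. Your payment has been refunded.'
    );
  }
}
```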
The Webhook Challenge
Stripe webhooks are the source of truth for payment completion. But webhooks can arrive out of order, be duplicated, or fail and retry.
Idempotency
Every webhook includes a unique event ID. Processed event IDs are stored in a table with a unique constraint:
```sql
CREATE TABLE stripe_webhook_events (
  stripe_event_id TEXT PRIMARY KEY,
  status          TEXT NOT NULL, -- 'received' or 'processed'
  created_at      TIMESTAMP
);
```

If a webhook arrives twice, the second insert fails on the unique constraint. The handler catches this and returns success without reprocessing.
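The handler side of that dedupe is small. A sketch, assuming a Prisma model mapped to that table (P2002 is Prisma's error code for a unique-constraint violation): try the insert first, and treat a duplicate as already handled.

```ts
import { Prisma, PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

// Returns true if this event is new, false if it was already received.
// Assumes a stripeWebhookEvent model mapped to the table above.
async function claimEvent(stripeEventId: string): Promise<boolean> {
  try {
    await prisma.stripeWebhookEvent.create({
      data: { stripeEventId, status: 'received' },
    });
    return true;
  } catch (err) {
    // P2002: unique-constraint violation, i.e. a duplicate delivery.
    if (
      err instanceof Prisma.PrismaClientKnownRequestError &&
      err.code === 'P2002'
    ) {
      return false; // already seen; return 200 to Stripe without reprocessing
    }
    throw err;
  }
}
```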
Two-Phase Processing
Webhooks are processed in two phases:
- Receive: Insert the event with status received. This happens immediately.
- Process: Update the status to processed after handling. This happens in a transaction with the booking logic.
If processing fails partway through, the event stays in received status. A background job retries unprocessed events.
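A sketch of phase two, with applyBookingLogic standing in for the actual booking handler: the status flip to processed rides in the same transaction, so a crash mid-processing leaves the row in received for the retry job.

```ts
import { Prisma, PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

// Hypothetical: the actual booking handler, run against the same transaction.
declare function applyBookingLogic(
  tx: Prisma.TransactionClient,
  stripeEventId: string
): Promise<void>;

// Phase 2: process the event and mark it 'processed' atomically. If this
// throws, the row stays 'received' and the retry job picks it up later.
async function processEvent(stripeEventId: string): Promise<void> {
  await prisma.$transaction(async (tx) => {
    await applyBookingLogic(tx, stripeEventId);
    await tx.stripeWebhookEvent.update({
      where: { stripeEventId },
      data: { status: 'processed' },
    });
  });
}

// Background job: anything still 'received' gets another attempt.
async function retryUnprocessed(): Promise<void> {
  const stale = await prisma.stripeWebhookEvent.findMany({
    where: { status: 'received' },
  });
  for (const event of stale) {
    await processEvent(event.stripeEventId).catch(() => {
      // Still failing; it will be retried on the next run.
    });
  }
}
```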
No Stripe Inside Transactions
A critical architectural rule: Stripe API calls never happen inside database transactions.
Stripe API calls take 100-500ms. Holding a database lock during that time causes contention and potential deadlocks. The refund queue pattern separates the database transaction (fast) from the Stripe call (slow).
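The rule translates into a two-step shape, sketched here with a hypothetical refund model: the transaction only writes a refund row, and a separate worker makes the slow Stripe call with no database lock held. Failure handling is deliberately omitted; that's the next section.

```ts
import Stripe from 'stripe';
import { Prisma, PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

// Inside the booking transaction: fast, lock-holding, no network calls.
async function queueRefundTx(
  tx: Prisma.TransactionClient,
  paymentIntentId: string
): Promise<void> {
  await tx.refund.create({ data: { paymentIntentId, status: 'pending' } });
}

// Outside any transaction: the slow Stripe call, with no lock held.
async function processRefundQueue(): Promise<void> {
  const pending = await prisma.refund.findMany({
    where: { status: 'pending' },
  });
  for (const refund of pending) {
    await stripe.refunds.create({ payment_intent: refund.paymentIntentId });
    await prisma.refund.update({
      where: { id: refund.id },
      data: { status: 'refunded' },
    });
  }
}
```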
The Self-Healing Refund Queue
What if the organizer's Stripe balance is empty when a refund is needed?
This happens. An organizer collects payments, Stripe pays out to their bank account, then a player cancels. The refund fails because there's no money in the Stripe account.
The solution: a self-healing refund queue.
- Player cancels booking
- Spot is freed immediately (player isn't blocked)
- Refund is queued with status pending_funds
- New booking arrives, money enters the account
- Background job retries pending refunds
- Player eventually gets their money
The player's spot is released instantly. They can book another match. The refund happens when funds are available. If too many refunds queue up (3+), the system auto-pauses payments for that tenant until an admin investigates.
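A sketch of that self-healing loop, with hypothetical model fields (tenantId, stripeAccountId, paymentsPaused) and deliberately schematic error handling, since the exact Stripe error inspection is elided:

```ts
import Stripe from 'stripe';
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

const PAUSE_THRESHOLD = 3; // 3+ queued refunds pauses the tenant's payments

async function retryPendingRefunds(tenantId: string): Promise<void> {
  const pending = await prisma.refund.findMany({
    where: { tenantId, status: 'pending_funds' },
  });

  for (const refund of pending) {
    try {
      await stripe.refunds.create(
        { payment_intent: refund.paymentIntentId },
        { stripeAccount: refund.stripeAccountId } // the organizer's account
      );
      await prisma.refund.update({
        where: { id: refund.id },
        data: { status: 'refunded' },
      });
    } catch {
      // Schematic: a real handler would inspect the error code. If the
      // balance is still empty, the row stays pending_funds for next run.
    }
  }

  const stillQueued = await prisma.refund.count({
    where: { tenantId, status: 'pending_funds' },
  });
  if (stillQueued >= PAUSE_THRESHOLD) {
    // Auto-pause payments for this tenant until an admin investigates.
    await prisma.tenant.update({
      where: { id: tenantId },
      data: { paymentsPaused: true },
    });
  }
}
```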
The Spec
The payment system spec was 6,700 lines and included a 29-point invariant checklist. A few of those invariants:
- A player with status IN always has a completed payment or valid pass
- Lock order: config FOR UPDATE, then match FOR SHARE (see the sketch after this list)
- Financial records never cascade-delete
- Rate limiting: 5 attempts per user per match per 15 minutes
- Notifications sent AFTER transaction commits
Each invariant was verified before marking a phase complete. The spec wasn't documentation — it was the test plan.
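That lock-order invariant deserves a closer look. A sketch, reusing the upcoming_matches table from the earlier excerpt and assuming a tenant_config table: every transaction that needs both rows acquires the locks in the same global order, which rules out deadlocks from opposite-order acquisition.

```ts
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

// Invariant: config row first (FOR UPDATE), then match row (FOR SHARE).
// A fixed global order means two transactions can never deadlock by
// taking the same locks in opposite orders.
async function withBookingLocks(tenantId: string, matchId: string) {
  return prisma.$transaction(async (tx) => {
    await tx.$executeRaw`
      SELECT 1 FROM tenant_config WHERE tenant_id = ${tenantId} FOR UPDATE
    `;
    await tx.$executeRaw`
      SELECT 1 FROM upcoming_matches WHERE id = ${matchId} FOR SHARE
    `;
    // ... booking logic runs here with both locks held ...
  });
}
```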
What the AI Got Wrong
The AI initially put Stripe refund calls inside database transactions. This would cause deadlocks under load. The architectural rule ("No Stripe API calls inside transactions") had to be explicit and enforced through code structure — the Stripe gateway file has no Prisma imports.
The AI also defaulted to optimistic concurrency patterns that would cause overselling. Explaining serializable isolation and lock ordering took multiple iterations. The spec eventually included explicit lock ordering rules that the AI could follow.
A forensic audit after implementation found that the waitlist claim endpoint bypassed payment checks entirely. The AI didn't generalize "all booking paths must check payment mode." I created a central bookingDecision service that all paths must use.
The Broader Pattern
I talk more about the overall approach in How I Actually Vibe Code. The booking system illustrates why specs matter for complex features.
The multi-tenancy architecture provided the foundation — tenant isolation was already solved. The worker system handles the background processing for refund queues and webhook retries.
Race conditions can't be solved through iteration. You can't test your way to correctness when the bug only appears under specific timing conditions. The design has to be right. The spec forces that design thinking before code exists.
The Outcome
The booking system has processed thousands of payments. The race condition for the last spot has occurred dozens of times. Every time, one player gets the spot, the others get automatic refunds, and no admin intervention is required.
The worst-case cost is predictable: if five players race for one spot, four create checkout sessions that result in refunds. Stripe charges about $0.20 per failed transaction, so the platform absorbs $0.80 in fees. Players aren't charged for the platform's race condition handling.
Five players, one spot, zero overbookings, zero support tickets. That's what correct concurrency handling looks like.