
Race Conditions in Booking Systems (and How I Fixed Them)

By Ian Strang, February 16, 2026

Five players click "Book" at the same moment. There's one spot left. Who gets it?

This isn't a theoretical problem. It happened in production. The match had 17 spots. 16 were filled. Five people saw "1 spot remaining" and clicked simultaneously. Without proper handling, all five would have been charged, all five would have expected to play, and there would be an overbooked match and four angry refund requests.

Building the booking system for Capo forced me to understand race conditions in a way I never had before. The payment system couldn't just work most of the time. It had to work every time, even when multiple users were competing for the same resource.

The Problem

A naive booking flow looks like this:

  1. Check if spots are available
  2. If yes, create a checkout session
  3. User pays
  4. Confirm the booking

The race condition lives between steps 1 and 4. Five users all pass step 1 (spots available). All five create checkout sessions. All five pay. Now you have five confirmed payments for one spot.
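To make the failure concrete, here is a small simulation of that interleaving. It is only a sketch: the NaiveMatch shape and the helper names are made up for illustration and are not Capo's code. The "race" is modelled by running every availability check before any confirmation, which is what happens when five requests overlap.

```typescript
// Hypothetical in-memory stand-in for the match row (not the real schema).
interface NaiveMatch {
  capacity: number;
  confirmedPlayers: string[];
}

// Step 1 of the naive flow: a plain read with no lock held.
function spotsLeft(match: NaiveMatch): number {
  return match.capacity - match.confirmedPlayers.length;
}

// Step 4 of the naive flow: confirms without re-checking capacity.
function confirmNaively(match: NaiveMatch, playerId: string): void {
  match.confirmedPlayers.push(playerId);
}

function simulateNaiveRace(): NaiveMatch {
  const match: NaiveMatch = { capacity: 17, confirmedPlayers: [] };
  for (let i = 1; i <= 16; i++) {
    match.confirmedPlayers.push("regular-" + i);
  }

  const racers = ["p1", "p2", "p3", "p4", "p5"];

  // All five run step 1 before any of them reaches step 4.
  const passedCheck = racers.filter(() => spotsLeft(match) > 0);

  // Steps 2 and 3 (checkout, payment) would happen here; then each confirms.
  passedCheck.forEach((playerId) => confirmNaively(match, playerId));

  return match;
}

const naiveResult = simulateNaiveRace();
const overbookedBy = naiveResult.confirmedPlayers.length - naiveResult.capacity;
```

Every racer passes the check because nobody has written anything yet, so the 17-spot match ends up with 21 confirmed players.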

Nor is the scenario rare. Thursday night football has a regular crowd, and when a popular match opens for booking, several people respond within seconds. The "last spot" situation comes up regularly.

The Architecture

The solution has three parts: database locks, a single allocator function, and automatic refunds for race condition losers.

Database Locks

When checking availability and confirming a booking, the system uses PostgreSQL's serializable isolation with row-level locks:

await prisma.$transaction(async (tx) => {
  // Lock the match row; concurrent bookings for this match block here
  // ($queryRaw is Prisma's raw-query method for statements that return rows)
  await tx.$queryRaw`
    SELECT id FROM upcoming_matches
    WHERE id = ${matchId}
    FOR UPDATE
  `;

  // Check and update while the lock is held
  const available = await checkAvailability(tx, matchId);
  if (available > 0) {
    await confirmBooking(tx, matchId, playerId);
  }
}, { isolationLevel: 'Serializable' });

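The FOR UPDATE lock ensures only one transaction can hold the match row at a time; any other transaction that tries to lock the same row waits until the first one commits or rolls back. This prevents the race where multiple users see "1 spot available" simultaneously. Under Serializable isolation, PostgreSQL may also abort a waiting transaction with a serialization failure, so the caller has to be ready to retry it.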

Single Allocator

All booking confirmations go through one function: confirmSeat(). This function is the only place in the codebase that can set a player's status to "IN" for a paid match.

Having a single allocator means race condition handling exists in exactly one place. Every booking path — direct booking, waitlist claims, admin additions — routes through this function. Get it right once, and it's right everywhere.

Automatic Refunds

When five players race for the last spot, one wins and four lose. The losers have already paid. They need automatic refunds.

The flow:

  1. All five create Stripe checkout sessions
  2. All five complete payment (Stripe doesn't know about the platform's capacity limits)
  3. Webhooks arrive for all five payments
  4. First webhook to reach confirmSeat() wins
  5. Other four get immediate, automatic refunds

The player sees: "Sorry, the match filled up while you were checking out. Your payment has been refunded."

This happens automatically. No admin intervention. No support tickets. The system handles the race condition gracefully.
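The "first webhook wins" step can be sketched with an in-memory allocator. This is an illustration under assumptions, not the real confirmSeat(): the PaidMatch and SeatOutcome types and the name confirmSeatSketch are invented here, and the serialization that the database lock provides is assumed rather than implemented.

```typescript
type SeatOutcome =
  | { kind: "confirmed"; playerId: string }
  | { kind: "refund"; playerId: string; reason: "match_full" };

// Hypothetical stand-in for a paid match.
interface PaidMatch {
  capacity: number;
  playersIn: string[];
}

// Stand-in for the single allocator. The sketch assumes calls arrive one at a
// time, which is what the row lock is there to guarantee in a real database.
function confirmSeatSketch(match: PaidMatch, playerId: string): SeatOutcome {
  if (match.playersIn.length < match.capacity) {
    match.playersIn.push(playerId);
    return { kind: "confirmed", playerId };
  }
  // The player has already paid, so a full match means a refund is owed.
  return { kind: "refund", playerId, reason: "match_full" };
}

const lastSpotMatch: PaidMatch = { capacity: 17, playersIn: [] };
for (let i = 1; i <= 16; i++) {
  lastSpotMatch.playersIn.push("regular-" + i);
}

// Webhooks for five completed payments, in the order they happen to arrive.
const webhookArrivals = ["p3", "p1", "p5", "p2", "p4"];
const seatOutcomes = webhookArrivals.map((id) => confirmSeatSketch(lastSpotMatch, id));

const seatWinners = seatOutcomes.filter((o) => o.kind === "confirmed").map((o) => o.playerId);
const refundsOwed = seatOutcomes.filter((o) => o.kind === "refund").map((o) => o.playerId);
```

Returning an explicit "refund" outcome, rather than throwing, keeps the loser path a normal result that the caller can hand to the refund machinery.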

The Webhook Challenge

Stripe webhooks are the source of truth for payment completion. But webhooks can arrive out of order, be duplicated, or fail and retry.

Idempotency

Every webhook includes a unique event ID. Processed event IDs are stored in a table with a unique constraint:

CREATE TABLE stripe_webhook_events (
  stripe_event_id TEXT PRIMARY KEY,
  status TEXT NOT NULL,  -- 'received' or 'processed'
  created_at TIMESTAMP
);

If a webhook arrives twice, the second insert fails on the unique constraint. The handler catches this and returns success without reprocessing.
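A minimal sketch of that dedup logic, with a plain object standing in for the stripe_webhook_events table. The function names and the bookingsHandled counter are hypothetical; in a real handler the duplicate would surface as a unique-violation error from the INSERT rather than a boolean.

```typescript
// In-memory stand-in for stripe_webhook_events. The object key plays the role
// of the primary key on stripe_event_id.
const seenStripeEvents: Record<string, "received" | "processed"> = {};

// Mimics an INSERT that fails on the unique constraint: false means duplicate.
function tryInsertEvent(stripeEventId: string): boolean {
  if (Object.prototype.hasOwnProperty.call(seenStripeEvents, stripeEventId)) {
    return false;
  }
  seenStripeEvents[stripeEventId] = "received";
  return true;
}

let bookingsHandled = 0;

// Returns the HTTP status the handler would send back to Stripe.
function handleWebhookOnce(stripeEventId: string): number {
  if (!tryInsertEvent(stripeEventId)) {
    return 200; // duplicate delivery: acknowledge it and do nothing else
  }
  bookingsHandled += 1; // placeholder for the real booking logic
  seenStripeEvents[stripeEventId] = "processed";
  return 200;
}
```

Answering 200 for a duplicate matters: a non-2xx response would tell Stripe the delivery failed and trigger yet another retry.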

Two-Phase Processing

Webhooks are processed in two phases:

  1. Receive: Insert event with status received. This happens immediately.
  2. Process: Update status to processed after handling. This happens in a transaction with the booking logic.

If processing fails partway through, the event stays in received status. A background job retries unprocessed events.
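The two phases and the retry job might look roughly like this. It is a sketch with an array in place of the table; receiveWebhook, processWebhook, and retryUnprocessed are illustrative names, and the try/catch only imitates what a rolled-back transaction would do.

```typescript
type PhaseStatus = "received" | "processed";

interface StoredWebhook {
  stripeEventId: string;
  status: PhaseStatus;
}

const webhookStore: StoredWebhook[] = [];

// Phase 1: record the event immediately, before any booking logic runs.
function receiveWebhook(stripeEventId: string): StoredWebhook {
  const row: StoredWebhook = { stripeEventId, status: "received" };
  webhookStore.push(row);
  return row;
}

// Phase 2: run the booking logic and flip the status together. A real system
// would do both in one transaction; here a thrown error leaves the row as is.
function processWebhook(
  row: StoredWebhook,
  bookingLogic: (stripeEventId: string) => void
): boolean {
  try {
    bookingLogic(row.stripeEventId);
    row.status = "processed";
    return true;
  } catch (_err) {
    return false; // row stays 'received' for the retry job
  }
}

// Background job: retry everything still in 'received'. Returns how many
// events were processed on this pass.
function retryUnprocessed(bookingLogic: (stripeEventId: string) => void): number {
  const pending = webhookStore.filter((r) => r.status === "received");
  return pending.filter((r) => processWebhook(r, bookingLogic)).length;
}
```

Because the status only changes after the booking logic succeeds, a crash at any point leaves a row the retry job can find.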

No Stripe Inside Transactions

A critical architectural rule: Stripe API calls never happen inside database transactions.

Stripe API calls take 100-500ms. Holding a database lock during that time causes contention and potential deadlocks. The refund queue pattern separates the database transaction (fast) from the Stripe call (slow).
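One way to picture the separation is an outbox: the transaction only writes a row describing the refund, and a worker makes the slow call afterwards. The sketch below assumes that shape. RefundIntent, releaseSeatInTransaction, drainRefundOutbox, and the sendRefund callback are all hypothetical, and the "tx:begin"/"tx:commit" log entries merely mark where a real transaction would start and end.

```typescript
interface RefundIntent {
  paymentId: string;
  amountCents: number;
  sent: boolean;
}

const refundOutbox: RefundIntent[] = [];
const callLog: string[] = [];

// Fast path: runs inside the database transaction. It only writes rows and
// never talks to the payment provider.
function releaseSeatInTransaction(paymentId: string, amountCents: number): void {
  callLog.push("tx:begin");
  refundOutbox.push({ paymentId, amountCents, sent: false });
  callLog.push("tx:commit");
}

// Slow path: runs later in a worker, with no transaction or lock held.
// `sendRefund` is a hypothetical wrapper around the payment provider's API.
function drainRefundOutbox(sendRefund: (intent: RefundIntent) => void): number {
  let sentCount = 0;
  refundOutbox.forEach((intent) => {
    if (intent.sent) return;
    callLog.push("gateway:refund:" + intent.paymentId);
    sendRefund(intent);
    intent.sent = true;
    sentCount += 1;
  });
  return sentCount;
}
```

The call log makes the ordering visible: the gateway entry always appears after "tx:commit", never between the begin and commit markers.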

The Self-Healing Refund Queue

What if the organizer's Stripe balance is empty when a refund is needed?

This happens. An organizer collects payments, Stripe pays out to their bank account, then a player cancels. The refund fails because there's no money in the Stripe account.

The solution: a self-healing refund queue.

  1. Player cancels booking
  2. Spot is freed immediately (player isn't blocked)
  3. Refund is queued with status pending_funds
  4. New booking arrives, money enters the account
  5. Background job retries pending refunds
  6. Player eventually gets their money

The player's spot is released instantly. They can book another match. The refund happens when funds are available. If too many refunds queue up (3+), the system auto-pauses payments for that tenant until an admin investigates.
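Here is a rough model of that queue. The TenantLedger type, the balanceCents field standing in for the organizer's Stripe balance, and the function names are assumptions for illustration; only the pending_funds status and the threshold of 3 come from the description above.

```typescript
type QueuedRefundStatus = "pending_funds" | "refunded";

interface QueuedRefund {
  playerId: string;
  amountCents: number;
  status: QueuedRefundStatus;
}

interface TenantLedger {
  balanceCents: number; // stand-in for the organizer's Stripe balance
  paymentsPaused: boolean;
  refundQueue: QueuedRefund[];
}

const PAUSE_THRESHOLD = 3; // the "3+" from the text

function pendingCount(t: TenantLedger): number {
  return t.refundQueue.filter((r) => r.status === "pending_funds").length;
}

// Cancellation: the spot is freed elsewhere; this only queues the refund and
// pauses the tenant if too many refunds are waiting.
function queueRefund(t: TenantLedger, playerId: string, amountCents: number): void {
  t.refundQueue.push({ playerId, amountCents, status: "pending_funds" });
  if (pendingCount(t) >= PAUSE_THRESHOLD) {
    t.paymentsPaused = true;
  }
}

// Background job: walk the queue in order and pay every refund the current
// balance can cover. Returns how many were paid on this pass.
function retryPendingRefunds(t: TenantLedger): number {
  let paid = 0;
  t.refundQueue.forEach((r) => {
    if (r.status === "pending_funds" && t.balanceCents >= r.amountCents) {
      t.balanceCents -= r.amountCents;
      r.status = "refunded";
      paid += 1;
    }
  });
  return paid;
}
```

Nothing in the sketch un-pauses the tenant automatically, matching the rule that payments stay paused until an admin investigates.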

The Spec

The payment system spec was 6,700 lines. It included a 29-point invariant checklist:

  • A player with status IN always has a completed payment or valid pass
  • Lock order: config FOR UPDATE, then match FOR SHARE
  • Financial records never cascade-delete
  • Rate limiting: 5 attempts per user per match per 15 minutes
  • Notifications sent AFTER transaction commits

Each invariant was verified before marking a phase complete. The spec wasn't documentation — it was the test plan.
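Of the invariants listed, the rate limit is the easiest to show in isolation. Below is a sliding-window sketch of "5 attempts per user per match per 15 minutes". The storage (a plain object), the key format, and the function name are assumptions; a multi-instance deployment would need shared storage instead.

```typescript
const MAX_ATTEMPTS = 5;
const WINDOW_MS = 15 * 60 * 1000;

// Attempt timestamps keyed by "userId:matchId". In-memory only.
const attemptLog: Record<string, number[]> = {};

// Returns true and records the attempt if the user is under the limit for
// this match; returns false without recording otherwise.
function allowBookingAttempt(userId: string, matchId: string, nowMs: number): boolean {
  const key = userId + ":" + matchId;
  // Keep only attempts that are still inside the 15-minute window.
  const recent = (attemptLog[key] || []).filter((t) => nowMs - t < WINDOW_MS);
  if (recent.length >= MAX_ATTEMPTS) {
    attemptLog[key] = recent;
    return false;
  }
  recent.push(nowMs);
  attemptLog[key] = recent;
  return true;
}
```

Passing the clock in as nowMs keeps the function deterministic, which makes an invariant like this straightforward to verify in a test.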

What the AI Got Wrong

The AI initially put Stripe refund calls inside database transactions. This would cause deadlocks under load. The architectural rule ("No Stripe API calls inside transactions") had to be explicit and enforced through code structure — the Stripe gateway file has no Prisma imports.

The AI also defaulted to optimistic concurrency patterns that would cause overselling. Explaining serializable isolation and lock ordering took multiple iterations. The spec eventually included explicit lock ordering rules that the AI could follow.

A forensic audit after implementation found that the waitlist claim endpoint bypassed payment checks entirely. The AI didn't generalize "all booking paths must check payment mode." I created a central bookingDecision service that all paths must use.

The Broader Pattern

I talk more about the overall approach in "How I Actually Vibe Code". The booking system illustrates why specs matter for complex features.

The multi-tenancy architecture provided the foundation — tenant isolation was already solved. The worker system handles the background processing for refund queues and webhook retries.

Race conditions can't be solved through iteration. You can't test your way to correctness when the bug only appears under specific timing conditions. The design has to be right. The spec forces that design thinking before code exists.

The Outcome

The booking system has processed thousands of payments. The race condition for the last spot has occurred dozens of times. Every time, one player gets the spot, the others get automatic refunds, and no admin intervention is required.

The worst-case cost is predictable: if five players race for one spot, four of them complete payments that are then refunded. Stripe charges about $0.20 per refunded transaction, so the platform absorbs $0.80 in fees (4 × $0.20). Players aren't charged for the platform's race condition handling.

Five players, one spot, zero overbookings, zero support tickets. That's what correct concurrency handling looks like.
