The bug report was alarming: "All match reports are gone."
Not some match reports. All of them. Every tenant's match reports had vanished. The aggregated tables that powered the match report feature were empty.
It was January 2026, and Capo had grown into a multi-tenant SaaS serving several football clubs. A stats update job had run. And it had wiped every club's data.
The Bug
The offending code was a single SQL statement:
DELETE FROM aggregated_match_report WHERE TRUE;
This was supposed to clear the cache before rebuilding it. The problem: it cleared the cache for every tenant, not just the one being processed.
The fix was obvious:
DELETE FROM aggregated_match_report WHERE tenant_id = target_tenant_id;
But finding this bug revealed a pattern. Similar issues had accumulated throughout the codebase:
- Helper functions that queried the wrong tenant's data
- Configuration lookups that returned the original tenant's values
- Team names hardcoded to "Orange" and "Green" instead of reading from tenant config
- Year dropdowns hardcoded to 2011-2025, useless for new clubs
Each bug had the same root cause: code that worked fine for a single tenant but failed silently in a multi-tenant environment.
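To make the difference between the two DELETE statements concrete, here is a minimal in-memory sketch. The `ReportRow` shape, the function names, and the tenant IDs are assumptions for illustration only; the real table lives in Postgres.

```typescript
// Hypothetical in-memory stand-in for the aggregated_match_report table.
interface ReportRow {
  tenant_id: string;
  match_id: number;
}

// Unscoped clear: the equivalent of DELETE ... WHERE TRUE.
function clearAllReports(_rows: ReportRow[]): ReportRow[] {
  return [];
}

// Scoped clear: the equivalent of DELETE ... WHERE tenant_id = target_tenant_id.
function clearReportsForTenant(rows: ReportRow[], targetTenantId: string): ReportRow[] {
  if (!targetTenantId) throw new Error('targetTenantId is required');
  return rows.filter((row) => row.tenant_id !== targetTenantId);
}

// Made-up sample data: two clubs sharing one table.
const reports: ReportRow[] = [
  { tenant_id: 'club-a', match_id: 1 },
  { tenant_id: 'club-a', match_id: 2 },
  { tenant_id: 'club-b', match_id: 3 },
];

const afterUnscoped = clearAllReports(reports);               // club-b's row is gone too
const afterScoped = clearReportsForTenant(reports, 'club-a'); // club-b's row survives
```

Both versions "clear the cache" for the tenant being processed. Only the scoped one leaves every other tenant's rows alone.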
The Pattern
The multi-tenant architecture was designed correctly. Every table had a tenant_id column. Every API route used the withTenantContext wrapper. The retrofit had been thorough.
But implementation had drifted. SQL functions accumulated over months without tenant parameters. Schema defaults favored the original tenant. Frontend components assumed data always existed.
A function like get_config_value('heavy_win_threshold') would return the configuration value. But whose configuration? Without a tenant parameter, it defaulted to the original tenant's config. Every club got Berko TNF's settings.
The bugs were invisible because they produced plausible results. The wrong data looked like correct data. Only when a new tenant noticed their team names were wrong, or their year dropdown showed irrelevant years, did the problems surface.
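A rough TypeScript analogue shows why nothing looked broken. The tenant IDs, the threshold values, and the `getConfigValueLegacy` name are all made up for this sketch; it mirrors the shape of the problem, not the actual SQL.

```typescript
// Hypothetical tenant IDs and config values, for illustration only.
const ORIGINAL_TENANT = 'original-club';

const configByTenant: Record<string, Record<string, string>> = {
  'original-club': { heavy_win_threshold: '4' },
  'new-club': { heavy_win_threshold: '6' },
};

// Legacy shape: tenantId is optional and silently falls back to the original tenant.
function getConfigValueLegacy(
  key: string,
  defaultValue: string,
  tenantId: string = ORIGINAL_TENANT,
): string {
  return configByTenant[tenantId]?.[key] ?? defaultValue;
}

// A caller that passes the tenant gets that tenant's value.
const correct = getConfigValueLegacy('heavy_win_threshold', '3', 'new-club');

// A caller that forgets the tenant still gets a perfectly plausible value,
// but it belongs to the original tenant.
const forgotten = getConfigValueLegacy('heavy_win_threshold', '3');
```

Neither call throws and both return a sensible-looking threshold, so there is nothing to alert anyone that the second caller is reading another club's settings.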
The Solution: Fail Fast
I adopted a principle: functions that need tenant context must crash if they don't receive it.
CREATE OR REPLACE FUNCTION get_config_value(
    key TEXT,
    default_value TEXT,
    target_tenant_id UUID -- Now required
) RETURNS TEXT AS $$
BEGIN
    -- Fail fast: refuse to run without tenant context
    IF target_tenant_id IS NULL THEN
        RAISE EXCEPTION 'tenant_id is required for get_config_value';
    END IF;
    -- ... rest of function
END;
$$ LANGUAGE plpgsql;
The old version had a default tenant ID. Callers could omit it and get "reasonable" results. The new version has no default. Omit the tenant ID and the function throws an exception.
This breaks callers that forgot the parameter. That's the point. A broken caller is a bug that can be found and fixed. A caller silently using the wrong tenant's data is a bug that might never surface.
The Philosophy
Silent defaults are dangerous in multi-tenant systems. They hide bugs behind plausible-looking data.
Consider the alternatives:
Silent default: Function returns data from tenant A when called without context. The caller doesn't know anything is wrong. Users see incorrect data. The bug might persist for months.
Loud failure: Function crashes when called without context. The error is immediate and obvious. The developer fixes the caller. The bug is eliminated before it affects users.
Loud failures are better than silent wrong answers.
This applies beyond SQL functions. The TypeScript helper for tenant filtering follows the same principle:
export function withTenantFilter(tenantId: string | null, where?: any) {
  // Fail fast: never build a query without tenant context
  if (!tenantId) throw new Error('Tenant ID required');
  // Spread the caller's filter first so it can never override tenant_id
  return { ...where, tenant_id: tenantId };
}
Forget the tenant ID? Runtime error. In practice, TypeScript catches most of these at compile time, but the runtime check is a safety net.
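One way to lean harder on the compile-time side is a branded type. This is a sketch of the general technique, not how Capo does it; `TenantId`, `toTenantId`, and `tenantWhere` are hypothetical names.

```typescript
// Hypothetical branded type: a plain string is not assignable to TenantId.
type TenantId = string & { readonly __brand: 'TenantId' };

// The only place a raw value becomes a TenantId, so validation happens once,
// at the boundary (for example when reading the session).
function toTenantId(raw: string | null | undefined): TenantId {
  if (!raw) throw new Error('Tenant ID required');
  return raw as TenantId;
}

// Builds a filter object that always carries the tenant.
function tenantWhere(
  tenantId: TenantId,
  where: Record<string, unknown> = {},
): Record<string, unknown> {
  // tenant_id is written last so a caller-supplied value cannot override it.
  return { ...where, tenant_id: tenantId };
}

const currentTenant = toTenantId('club-a');
const clubFilter = tenantWhere(currentTenant, { season: 2026 });

// tenantWhere('club-a') does not compile: a bare string is not a TenantId.
```

The runtime check still lives in `toTenantId`, so values arriving from untyped sources are covered as well.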
The Audit
Finding all the problematic code required systematic searching. I grepped for every call to helper functions, checking whether tenant context was passed. The AI helped identify patterns, but human review was necessary to understand whether each call site was correct.
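A search like this can be partly automated. The sketch below is not the tooling I used; it is a rough illustration that flags calls to a helper passing fewer arguments than expected. The names `findShortCalls` and `countArgs` are hypothetical. It ignores parentheses inside string literals and will also match function definitions, so every hit still needs a human look.

```typescript
interface Finding {
  line: number;     // 1-based line number where the call starts
  call: string;     // text of the call as found
  argCount: number; // number of top-level arguments detected
}

// Counts top-level, comma-separated arguments in the text between a call's parentheses.
function countArgs(argText: string): number {
  if (argText.trim() === '') return 0;
  let depth = 0;
  let quote = '';
  let count = 1;
  for (let i = 0; i < argText.length; i++) {
    const ch = argText.charAt(i);
    if (quote !== '') {
      if (ch === quote) quote = '';
    } else if (ch === "'" || ch === '"') {
      quote = ch;
    } else if (ch === '(') {
      depth++;
    } else if (ch === ')') {
      depth--;
    } else if (ch === ',' && depth === 0) {
      count++;
    }
  }
  return count;
}

// Flags every call to fnName that passes fewer than requiredArgs arguments.
function findShortCalls(source: string, fnName: string, requiredArgs: number): Finding[] {
  const findings: Finding[] = [];
  const needle = fnName + '(';
  let from = 0;
  while (true) {
    const start = source.indexOf(needle, from);
    if (start === -1) break;
    // Walk forward to the parenthesis that closes this call.
    let depth = 0;
    let end = -1;
    for (let i = start + fnName.length; i < source.length; i++) {
      const ch = source.charAt(i);
      if (ch === '(') depth++;
      if (ch === ')') {
        depth--;
        if (depth === 0) { end = i; break; }
      }
    }
    if (end === -1) break; // unbalanced call, stop scanning
    const argCount = countArgs(source.slice(start + needle.length, end));
    if (argCount < requiredArgs) {
      findings.push({
        line: source.slice(0, start).split('\n').length,
        call: source.slice(start, end + 1),
        argCount,
      });
    }
    from = end + 1;
  }
  return findings;
}

// Made-up sample input: the first and third calls omit the tenant argument.
const sampleSql = [
  "SELECT get_config_value('heavy_win_threshold', '4');",
  "SELECT get_config_value('heavy_win_threshold', '4', target_tenant_id);",
  "SELECT get_config_value('label', COALESCE(x, 'a,b'));",
].join('\n');

const findings = findShortCalls(sampleSql, 'get_config_value', 3);
```

A scan like this only reports that an argument is missing. Whether the tenant that is passed is the right one remains a judgment call.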
The audit found seven critical bugs:
- DELETE WHERE TRUE wiping all tenants' match reports
- Date helpers querying wrong tenant's seasons
- Config values returning original tenant's settings
- Team names hardcoded instead of tenant-configured
- Year dropdown hardcoded to irrelevant range
- Fallback titles showing misleading date ranges
- Hall of Fame queries crossing tenant boundaries
Each fix followed the same pattern: add explicit tenant parameter, remove default, let failures surface remaining callers.
The Broader Lesson
I talk more about the overall build process in How I Actually Vibe Code. The fail-fast pattern emerged from painful experience.
The RLS Wars taught us that implicit security (database-level RLS) can fail silently. Explicit security (application-level filtering) makes failures visible.
The multi-tenancy retrofit established the patterns. But patterns drift over time. New code gets added. Shortcuts get taken. Without enforcement, tenant isolation erodes.
Fail-fast is enforcement. It makes the correct path the only path that works.
Schema Defaults Are Dangerous
One subtle issue: Prisma schema defaults.
model matches {
  team_a_name String @default("Orange")
  team_b_name String @default("Green")
}
This means every new match, for every tenant, gets "Orange" and "Green" as team names unless explicitly set otherwise. For the original tenant, that's correct. For new tenants who configured different team names, it's wrong.
The fix: remove schema defaults for tenant-specific values. Force explicit setting at creation time. If the code forgets to set team names, the insert fails rather than silently using wrong defaults.
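On the application side, "force explicit setting at creation time" could look something like the sketch below. The `TenantConfig` shape, the `buildNewMatch` helper, and the sample team names are assumptions; the point is only that a missing name becomes an error instead of "Orange".

```typescript
// Hypothetical shape of the per-tenant config; the field names are assumptions.
interface TenantConfig {
  team_a_name?: string;
  team_b_name?: string;
}

interface NewMatch {
  tenant_id: string;
  team_a_name: string;
  team_b_name: string;
}

// Builds the data for a new match. Nothing is defaulted: if the tenant has no
// configured team names, creation fails instead of falling back to another club's.
function buildNewMatch(tenantId: string, config: TenantConfig): NewMatch {
  if (!tenantId) throw new Error('Tenant ID required');
  if (!config.team_a_name || !config.team_b_name) {
    throw new Error(`Team names are not configured for tenant ${tenantId}`);
  }
  return {
    tenant_id: tenantId,
    team_a_name: config.team_a_name,
    team_b_name: config.team_b_name,
  };
}

// Made-up tenant and team names.
const newMatch = buildNewMatch('club-b', { team_a_name: 'Reds', team_b_name: 'Blues' });

// buildNewMatch('club-c', {}) throws rather than producing "Orange" vs "Green".
```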
The Outcome
After the audit and fixes:
- All SQL functions require explicit tenant context
- Schema defaults removed for tenant-specific values
- Test tenant stats populate correctly
- Team names pulled from tenant config at match creation
- Year dropdowns dynamically populated from tenant's seasons
The DELETE WHERE TRUE bug could never happen again. The function now requires a tenant ID. Omitting it crashes the job. The crash is logged. The bug is found. The data is safe.
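The dynamic year dropdown from the list above can be sketched the same way. The `Season` fields and the `yearOptionsForTenant` name are assumptions about the data shape, not the real schema.

```typescript
// Hypothetical season record; the field names are assumptions.
interface Season {
  tenant_id: string;
  start_date: string; // ISO date, e.g. '2025-08-01'
  end_date: string;   // ISO date
}

// Derives the dropdown years from one tenant's seasons, newest first.
function yearOptionsForTenant(seasons: Season[], tenantId: string): number[] {
  if (!tenantId) throw new Error('Tenant ID required');
  const years: number[] = [];
  for (const season of seasons) {
    if (season.tenant_id !== tenantId) continue;
    const first = Number(season.start_date.slice(0, 4));
    const last = Number(season.end_date.slice(0, 4));
    // A season can span two calendar years, so include every year it touches.
    for (let year = first; year <= last; year++) {
      if (years.indexOf(year) === -1) years.push(year);
    }
  }
  return years.sort((a, b) => b - a);
}

// Made-up sample data: one long-running club and one new club.
const seasons: Season[] = [
  { tenant_id: 'original-club', start_date: '2011-01-01', end_date: '2011-12-31' },
  { tenant_id: 'new-club', start_date: '2025-08-01', end_date: '2026-05-31' },
];

const newClubYears = yearOptionsForTenant(seasons, 'new-club'); // [2026, 2025]
```

A club with no seasons yet gets an empty list, which the frontend has to handle explicitly instead of assuming data exists.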
For AI-Assisted Building
AI is good at applying patterns consistently. It's less good at noticing when patterns have drifted. The audit that found these bugs required understanding intent, not just syntax.
The AI could update all callers once I identified the pattern. But identifying which functions needed tenant parameters, and which callers were passing incorrect context, required human judgment.
The fail-fast principle helps here too. When the AI generates new code, it must satisfy the explicit requirements. A function that crashes without tenant context forces the AI to provide that context. Silent defaults let incorrect code slip through.
Explicit is better than implicit. Loud is better than silent. Crashes are better than wrong data.
The match reports are back. Every tenant sees their own data. And the architecture now enforces what the design always intended.