
Why I Turned Off PostgreSQL Row-Level Security in Production

By Ian Strang · February 16, 2026

The dashboard showed zero players. Not one. I refreshed the page. Still nothing. I checked the database directly — 47 players, all present and correct. I refreshed again. This time, all 47 appeared.

This was August 2025, and I was building Capo, a football team management platform. I had just finished retrofitting multi-tenancy into the codebase, and PostgreSQL's Row-Level Security seemed like the obvious choice for tenant isolation. Every tutorial said so. Every Supabase guide recommended it. RLS was the "right" way to prevent one club from seeing another club's data.

Except it wasn't working. Or rather, it was working — sometimes.

What RLS Promised

Row-Level Security is an elegant concept. You define policies at the database level that automatically filter rows based on the current user's context. Instead of remembering to add WHERE tenant_id = X to every query, the database handles it for you. Forget to filter? No problem — RLS has your back.

For a multi-tenant SaaS, this sounds perfect. Define a policy once, and every query automatically respects tenant boundaries. The security layer lives in the database, not scattered across hundreds of API routes.
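
For concreteness, a minimal policy looks something like this (an illustrative sketch, not Capo's actual schema; the setting name app.current_tenant is an assumption):

-- Filter every query on `players` to the tenant named in the session setting.
ALTER TABLE players ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON players
  USING (tenant_id = current_setting('app.current_tenant')::uuid);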

I wrote the policies. They looked correct. The AI that helped me build the system confirmed they looked correct. I deployed.

The Symptoms

The bugs were maddening because they were intermittent:

  • Queries returning zero rows despite data existing
  • Tenant switching requiring a hard page refresh
  • The same action working on one attempt and failing on the next

There was no pattern. No reproducible sequence. Sometimes the dashboard loaded instantly with all data. Sometimes it showed "No players found" for a club with 47 active members.

I spent days debugging. The AI suggested checking the RLS policies — they were fine. It suggested caching issues — I disabled all caching. It suggested tenant context problems — the context was being set correctly.

On day four, I asked the AI to generate a test script that logged which database connection handled each query. That's when I found it.
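
The script boiled down to something like this sketch (assuming a standard Prisma setup; pg_backend_pid() is Postgres's identifier for the physical connection serving a statement):

import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

// If the logged PIDs vary between calls, the pooler is switching
// physical connections out from under consecutive queries.
for (let i = 0; i < 10; i++) {
  const [{ pid }] = await prisma.$queryRaw<{ pid: number }[]>`SELECT pg_backend_pid() AS pid`;
  console.log(`query ${i} handled by backend PID ${pid}`);
}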

The Root Cause

Modern database deployments use connection pooling. Instead of opening a new database connection for every request (slow and resource-intensive), you maintain a pool of connections that get reused. Supabase uses PgBouncer or Supavisor for this. It's standard practice. It's also incompatible with how I was using RLS.

Here's what was happening:

  1. A request arrives. Middleware sets the RLS context (current tenant) on Connection A.
  2. The actual query executes on Connection B (pulled from the pool).
  3. Connection B has no RLS context set.
  4. RLS policy checks the context, finds nothing, blocks all rows.
  5. Query returns zero results.

The connection that received the context wasn't the connection that ran the query. In a pooled environment, you can't assume session state persists between operations.
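
In code, the broken pattern looks roughly like this (a sketch; app.current_tenant stands in for whatever setting the policies read):

// Statement 1: set the tenant context. Lands on one pooled connection.
await prisma.$queryRaw`SELECT set_config('app.current_tenant', ${tenantId}, false)`;

// Statement 2: the query itself. The pooler may serve it from a different
// connection where the setting was never applied, so RLS blocks every row.
const players = await prisma.players.findMany();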

This isn't a bug in PostgreSQL or Supabase. It's a fundamental architectural mismatch. RLS assumes session-level context. Connection pooling assumes connections are stateless and interchangeable. These assumptions contradict each other.

The Decision

Once I understood the problem, there were three options:

  1. Disable connection pooling. This would fix RLS but destroy performance and scalability.
  2. Set RLS context on every query (sketched after this list). Possible, but error-prone and adds latency.
  3. Disable RLS and enforce tenant isolation in application code.
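
For reference, option two means pinning the context and the query to a single connection by wrapping both in one transaction, roughly like this sketch (passing true to set_config scopes the setting to that transaction):

// An interactive transaction runs on one connection, even behind
// a transaction-mode pooler, so the context reliably reaches the query.
const players = await prisma.$transaction(async (tx) => {
  await tx.$queryRaw`SELECT set_config('app.current_tenant', ${tenantId}, true)`;
  return tx.players.findMany();
});

Doing that correctly in every route, forever, is the error-prone part.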

I chose option three. RLS remains the right choice for many systems, but connection pooling made session-based context unreliable in mine, so explicit filtering plus locked-down database roles won out.

It felt counterintuitive: database-level security sounds more robust than application-level filtering. But the analysis changed my mind.

RLS enforcement happens at runtime. If you forget to set the context, queries silently return empty results. Debugging requires understanding session state across pooled connections. The failure mode is invisible data, not visible errors.

Application-level filtering can be enforced at compile time. I created a helper function:

// Merge the mandatory tenant filter into a Prisma `where` clause.
// Fails loudly if the tenant ID is missing instead of returning empty results.
export function withTenantFilter<T extends object>(tenantId: string | null, where?: T) {
  if (!tenantId) throw new Error('Tenant ID required');
  return { tenant_id: tenantId, ...where };
}

Every database query uses this helper. Omit the tenant ID argument and TypeScript flags the call at compile time. Not a silent empty result. A red squiggly line in your editor before you even run the code. And if a null tenant ID slips through at runtime, the helper throws a loud error instead of quietly returning nothing.

The Implementation

I disabled RLS on all operational tables — players, matches, seasons, stats. RLS stayed enabled on three security-critical tables: the auth system, tenant configuration, and admin profiles. These tables are accessed through controlled paths where connection pooling isn't an issue.

Then I updated every API route (over 70 of them) to use the withTenantFilter pattern:

const players = await prisma.players.findMany({
  where: withTenantFilter(tenantId, { is_retired: false })
});

The AI handled the bulk refactoring. I reviewed every change. The pattern was consistent enough that mistakes were obvious.

The Validation

Six months later, a Supabase security audit flagged "RLS disabled" as a potential issue. This prompted a deeper review.

The audit was right to flag it: disabling RLS without compensating controls would be dangerous. And the review found a real gap. The application code was secure, but the database roles used by browser clients (anon and authenticated) still had direct table access. Someone could theoretically bypass the API entirely using the public Supabase key exposed in the browser.
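
To make the threat concrete, the bypass would look something like this from any browser console (illustrative; the URL and anon key are placeholders for the public values every Supabase frontend ships):

import { createClient } from '@supabase/supabase-js';

// The anon key is public by design. With direct table grants in place,
// nothing stands between it and the data: no API route, no tenant filter.
const supabase = createClient('https://YOUR-PROJECT.supabase.co', 'public-anon-key');
const { data } = await supabase.from('players').select('*'); // every tenant's players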

A simple SQL fix revoked these privileges:

REVOKE ALL ON ALL TABLES IN SCHEMA public FROM anon, authenticated;

Now the system has defense in depth: RLS off, application-level filtering enforced by TypeScript, and browser-accessible roles locked out of direct table access (the API's own database connection uses a separate role and keeps its privileges). The architecture is more secure than RLS alone would have provided.

What This Means for AI-Assisted Building

I talk more about my overall approach in How I Actually Vibe Code. The RLS incident illustrates a pattern I've seen repeatedly: AI can implement solutions, but it can't always predict how those solutions interact with infrastructure you didn't discuss.

The AI knew how to write RLS policies. It didn't know the deployment used connection pooling in a way that would break those policies. That knowledge gap isn't a failure of AI — it's a reminder that production systems have context that doesn't fit in a prompt.

The debugging process also showed AI's value. Once I asked the right question ("generate a test script that logs which connection handles each query"), the AI produced exactly what I needed. The breakthrough came from combining AI's implementation speed with human intuition about where to look.

The Broader Lesson

Row-Level Security is a good feature. It works well in many contexts. But "best practice" recommendations often assume deployment conditions that don't match your reality.

If you're building a multi-tenant application with connection pooling (which describes most modern deployments), test RLS behavior under realistic conditions before committing to it. The elegant solution that works in documentation may fail silently in production.
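
One quick check is a smoke test along these lines (a sketch; run it many times, since pooled connections make the failure intermittent):

// Set the context, then read it back in a separate statement. If the two
// statements land on different pooled connections, the setting is gone.
await prisma.$queryRaw`SELECT set_config('app.current_tenant', ${tenantId}, false)`;
const [row] = await prisma.$queryRaw<{ current: string | null }[]>`
  SELECT current_setting('app.current_tenant', true) AS current
`;
console.log(row.current === tenantId ? 'context survived' : 'context lost: RLS will misfire');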

The multi-tenancy retrofit that preceded this incident took four weeks. The RLS debugging took another week. The application-level filtering solution took two days to implement and has been rock-solid since.

Sometimes the "wrong" approach is the right one. The key is understanding why.

I later applied similar thinking to other parts of the system. The fail-fast tenant architecture I adopted makes bugs visible immediately rather than letting them hide. That philosophy — explicit over implicit, loud failures over silent ones — came directly from the RLS experience.

The intermittent bugs are gone. The dashboard loads every time. And I learned that database-level security isn't always more secure than application-level security — it depends entirely on how your database is deployed.
