Staff Prep 01: Postgres Schema Design Under Constraints
ArchitectureStaff

Staff Prep 01: Postgres Schema Design Under Constraints

April 4, 20269 min readPART 01 / 19

Day 1 of the 28-day Staff Engineer prep series. We start with Postgres schema design, not because it's glamorous, but because every system design interview, every architectural review, and every production incident I've sat in eventually traces back to the data model. Get the schema wrong and everything built on top of it is fighting uphill forever.

The staff engineer mindset on schema design

Junior engineers ask: "Will this store the data?" Senior engineers ask: "Will this query be fast?" Staff engineers ask: "What will this look like in 18 months when requirements change, and can we migrate it without a maintenance window?" That third question changes everything about how you design.

The core tension in schema design sits between normalisation (avoid duplication, enforce consistency) and denormalisation (accept duplication, gain query speed). Neither extreme is correct. The right schema is the one that makes your actual queries fast while staying maintainable.

Normalisation in practice

Third Normal Form (3NF) is the practical target for transactional systems. Every non-key column depends on the primary key, the whole primary key, and nothing but the primary key. What this means concretely:

sql
-- Bad: city_name depends on zip_code, not on user_id
CREATE TABLE users (
    id          BIGSERIAL PRIMARY KEY,
    name        TEXT NOT NULL,
    zip_code    TEXT,
    city_name   TEXT   -- violation: transitive dependency on zip_code
);

-- Good: separate the zip-to-city mapping
CREATE TABLE zip_codes (
    zip_code    TEXT PRIMARY KEY,
    city_name   TEXT NOT NULL,
    state_code  CHAR(2) NOT NULL
);

CREATE TABLE users (
    id          BIGSERIAL PRIMARY KEY,
    name        TEXT NOT NULL,
    zip_code    TEXT REFERENCES zip_codes(zip_code)
);

The fixed version means updating a city name requires touching one row in zip_codes, not scanning the entire users table. That is the practical payoff of normalisation.

When to denormalise intentionally

Denormalisation is a deliberate trade-off. You accept write complexity (keeping duplicated data in sync) in exchange for read performance (no joins at query time). The cases where it's justified:

  • Reporting and analytics. A total_orders counter on a user row avoids a COUNT(*) over the orders table on every profile load.
  • Denormalised audit fields. Storing created_by_name alongside created_by_id means the audit log survives user deletion without orphaned lookups.
  • Read-optimised materialised views. Pre-aggregated summaries refreshed asynchronously so OLAP queries don't land on the OLTP tables.
sql
-- Denormalised counter with trigger maintenance
ALTER TABLE users ADD COLUMN order_count INT NOT NULL DEFAULT 0;

CREATE OR REPLACE FUNCTION increment_user_order_count()
RETURNS TRIGGER AS $$
BEGIN
    UPDATE users SET order_count = order_count + 1
    WHERE id = NEW.user_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_order_count
AFTER INSERT ON orders
FOR EACH ROW EXECUTE FUNCTION increment_user_order_count();

The trigger keeps order_count consistent automatically. Trade-off: every INSERT into orders now takes a row lock on the matching user row. At high insert rates against a single user, this becomes your bottleneck. I've been burned by this on a SaaS tenant with one power user. Know your write patterns before denormalising counters.

Data types: precision matters

Wrong data types are silent performance killers. The ones I see most often at review time:

sql
-- Wrong types (common mistakes)
id          VARCHAR(36)    -- UUIDs as varchar: slower comparisons, bloated indexes
price       FLOAT          -- floating point for money: rounding errors will hurt you
status      VARCHAR(20)    -- free-text status: no enforcement, inconsistent values
created_at  VARCHAR(30)    -- timestamps as strings: no timezone, no range queries

-- Correct types
id          UUID DEFAULT gen_random_uuid()   -- native UUID type, compact storage
price       NUMERIC(12, 2)                   -- exact decimal arithmetic
status      TEXT CHECK (status IN ('active', 'inactive', 'pending'))  -- or use an ENUM
created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()  -- timezone-aware timestamp

NUMERIC(12, 2) for money is non-negotiable for me. A FLOAT can't exactly represent 0.1 in binary. Multiply that across millions of transactions and you get rounding errors that are nearly impossible to debug in production.

Constraints as documentation and guards

Database constraints are your last line of defence. Application code has bugs. Migrations get run incorrectly. Someone will run direct SQL during an incident. Constraints guarantee invariants hold no matter how the data got in.

sql
CREATE TABLE orders (
    id          BIGSERIAL PRIMARY KEY,
    user_id     BIGINT NOT NULL REFERENCES users(id) ON DELETE RESTRICT,
    amount      NUMERIC(12, 2) NOT NULL CHECK (amount > 0),
    status      TEXT NOT NULL DEFAULT 'pending'
                    CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled')),
    created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    shipped_at  TIMESTAMPTZ,

    -- Cross-column constraint: shipped_at only set when status is shipped
    CONSTRAINT chk_shipped_at CHECK (
        (status = 'shipped' AND shipped_at IS NOT NULL) OR
        (status != 'shipped')
    )
);

The cross-column constraint catches the invariant "a shipped order must have a ship date" at the database level. No amount of application-layer validation is as reliable.

Multi-tenancy schema patterns

Multi-tenant SaaS is a common Staff-level design challenge. The three patterns, with trade-offs:

1. Row-level tenancy. Add tenant_id to every table, enforce with Row Level Security (RLS). Simplest operationally: one database, one schema, all tenants co-mingled. Risk: a missing WHERE clause leaks cross-tenant data. Mitigation: RLS policies enforce it at the Postgres level.

sql
-- Row Level Security for multi-tenancy
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.tenant_id')::BIGINT);

-- Set at connection/session time in your app:
-- SET app.tenant_id = '42';

2. Schema-per-tenant. Each tenant gets their own Postgres schema (tenant_42.orders). Strong isolation. Migrations must run N times, once per tenant. Works up to ~100 tenants and gets painful at thousands.

3. Database-per-tenant. Maximum isolation, full operational cost. Reserved for enterprise customers with compliance requirements.

Designing for future migrations

The best schema change is the one you do not need. Every column you add costs a migration. Design with flexibility in mind:

  • JSONB for extensible attributes. A metadata JSONB column lets you attach arbitrary key-value data without schema migrations. Index it with GIN when you need to query it.
  • Soft deletes via deleted_at TIMESTAMPTZ instead of hard DELETE. Preserves referential integrity and makes recovery tractable.
  • Avoid NOT NULL on new columns in large tables. Adding a NOT NULL to a large table forces a rewrite unless you provide a DEFAULT. Use nullable first, backfill, then add the constraint.
sql
-- Extensible attributes without schema changes
CREATE TABLE products (
    id          BIGSERIAL PRIMARY KEY,
    name        TEXT NOT NULL,
    price       NUMERIC(12, 2) NOT NULL,
    attributes  JSONB NOT NULL DEFAULT '{}'
);

-- GIN index for JSONB attribute queries
CREATE INDEX idx_products_attributes ON products USING GIN (attributes);

-- Query: find all products that are "featured"
SELECT * FROM products WHERE attributes @> '{"featured": true}'::jsonb;

-- Query: find all products in a specific category
SELECT * FROM products WHERE attributes->>'category' = 'electronics';

Quiz: test your understanding

Before moving on, answer these in your head (or out loud):

  1. What is a transitive dependency and why does it violate 3NF? Give a concrete example.
  2. You have a comments table with a user_id FK. A user gets deleted. What happens with ON DELETE RESTRICT vs ON DELETE CASCADE vs ON DELETE SET NULL? Which would you choose for comments?
  3. Why should you never use FLOAT for monetary values in a database? What type would you use instead?
  4. You need to add a verified BOOLEAN NOT NULL DEFAULT false column to a 500M-row table in production. Walk through exactly how you would do this without downtime.
  5. Your SaaS app has 2,000 tenants. Compare row-level tenancy with RLS vs schema-per-tenant. What is the operational break-even point?

Next: Part 02: EXPLAIN ANALYZE. We go inside the query planner to understand why your queries are slow.

← All Architecture Posts