Staff Prep 06: Transactions & Concurrency — Isolation, Deadlocks & Race Conditions

April 4, 2026 · 10 min read · Part 10 / 19

Back to Part 05: Scaling Writes. Concurrency is where systems fail silently. Two users click "buy" simultaneously. A counter that should decrement once gets decremented twice. A reservation gets double-booked. These are not edge cases — they are production bugs waiting to happen in any concurrent system. Here is how Postgres isolation levels and locking prevent them.

The four isolation levels and what they prevent

The SQL standard defines four isolation levels, distinguished by which of the following concurrency anomalies they allow:

  • Dirty Read: Reading uncommitted data from another transaction. Postgres prevents this even at Read Uncommitted (Postgres has no dirty reads).
  • Non-repeatable Read: Reading the same row twice in a transaction and getting different values because another transaction committed between reads.
  • Phantom Read: Running the same range query twice and getting different rows because another transaction inserted/deleted rows in that range.
  • Write Skew: Two transactions each read overlapping data, make decisions based on it, and write to non-overlapping rows — leaving an invariant violated. It is not in the SQL standard's anomaly list, but Postgres's Serializable level prevents it.
sql
-- Set isolation level for a transaction
BEGIN ISOLATION LEVEL READ COMMITTED;     -- Postgres default
BEGIN ISOLATION LEVEL REPEATABLE READ;    -- Prevents non-repeatable reads and phantoms
BEGIN ISOLATION LEVEL SERIALIZABLE;       -- Prevents write skew; slowest

-- Or at session level
SET default_transaction_isolation = 'repeatable read';

Write skew: the subtle bug

Write skew is the hardest concurrency bug to spot. Classic example: an on-call rota where at least one doctor must always be on duty.

sql
-- Doctor A and Doctor B are both on call
-- Both want to go off-call simultaneously

-- Transaction A (Doctor A goes off-call):
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM doctors WHERE on_call = true;  -- returns 2
-- Thinks: "2 doctors on call, safe to remove myself"
UPDATE doctors SET on_call = false WHERE id = 1;
COMMIT;

-- Transaction B (Doctor B goes off-call), concurrent with A:
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM doctors WHERE on_call = true;  -- also returns 2 (snapshot)
-- Thinks: "2 doctors on call, safe to remove myself"
UPDATE doctors SET on_call = false WHERE id = 2;
COMMIT;

-- Result: 0 doctors on call. Both transactions "succeeded". Invariant violated.

-- Fix: use SERIALIZABLE isolation (detects the conflict, one transaction will retry)
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM doctors WHERE on_call = true;
-- Postgres tracks each transaction's read set; if a concurrent serializable
-- transaction invalidates it, one transaction fails with:
--   ERROR: could not serialize access due to read/write dependencies among transactions
-- and the application must retry it
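Because serialization failures are expected under SERIALIZABLE, the application needs a retry loop. Here is a minimal Python sketch of one, assuming the driver surfaces SQLSTATE 40001 as an exception — the `SerializationFailure` class and `run_serializable` wrapper below are illustrative stand-ins (in psycopg the real exception is `psycopg.errors.SerializationFailure`):

```python
import random
import time

class SerializationFailure(Exception):
    """Stand-in for the driver's SQLSTATE 40001 error."""

def run_serializable(txn_fn, max_retries=5):
    """Run txn_fn, retrying with jittered backoff when Postgres aborts it
    with a serialization failure. txn_fn must be safe to re-execute."""
    for attempt in range(max_retries):
        try:
            return txn_fn()
        except SerializationFailure:
            if attempt == max_retries - 1:
                raise                     # out of retries, surface the error
            time.sleep(random.uniform(0, 0.01 * 2 ** attempt))  # jittered backoff

# Usage sketch: a transaction that conflicts twice, then commits.
calls = {"n": 0}
def flaky_txn():
    calls["n"] += 1
    if calls["n"] < 3:
        raise SerializationFailure()
    return "committed"

print(run_serializable(flaky_txn))  # prints "committed" after two retries
```

The jitter matters: if all retries back off on the same schedule, the conflicting transactions just collide again.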

FOR UPDATE: explicit row locking

When you need to read a row and then update it, the gap between SELECT and UPDATE is a race condition window. Another transaction can modify the row between your read and your write. FOR UPDATE locks the row at read time.

sql
-- Race condition without locking: read, compute in the app, write back
BEGIN;
SELECT balance FROM accounts WHERE id = 42;  -- reads 100; app computes 100 - 30 = 70
-- Meanwhile another transaction commits: UPDATE accounts SET balance = 50 WHERE id = 42
UPDATE accounts SET balance = 70 WHERE id = 42;  -- clobbers that write; should be 20
COMMIT;

-- Safe: lock the row at read time
BEGIN;
SELECT balance FROM accounts WHERE id = 42 FOR UPDATE;  -- acquires lock, reads 100
-- Other transaction's UPDATE will BLOCK here until we COMMIT
UPDATE accounts SET balance = balance - 30 WHERE id = 42;  -- correctly sets to 70
COMMIT;

-- FOR UPDATE SKIP LOCKED: skip rows already locked by other transactions
-- Useful for job queues where workers pick up work without blocking each other
SELECT * FROM jobs WHERE status = 'pending' ORDER BY created_at LIMIT 1
FOR UPDATE SKIP LOCKED;
-- Each worker gets a different job row, no lock contention
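To see why SKIP LOCKED fits job queues, here is a pure-Python sketch (no database) of the semantics: each pending job row carries a try-lock, and a worker claims the first row whose lock it can take without blocking. The `jobs` dict and `claim_next_job` helper are illustrative, not any library's API:

```python
import threading

# One lock per pending job row; holding the lock = "another worker has claimed it".
jobs = {job_id: threading.Lock() for job_id in [1, 2, 3]}

def claim_next_job():
    """Return the first pending job whose row lock is free, skipping locked rows
    instead of waiting on them -- the essence of FOR UPDATE SKIP LOCKED."""
    for job_id, row_lock in sorted(jobs.items()):
        if row_lock.acquire(blocking=False):  # non-blocking: skip if already held
            return job_id
    return None  # every pending job is claimed

a = claim_next_job()  # first worker claims job 1
b = claim_next_job()  # second worker skips locked job 1, claims job 2
print(a, b)  # 1 2
```

Without SKIP LOCKED, the second worker's `FOR UPDATE` would block on job 1 until the first worker committed — serializing the whole queue through one row.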

Deadlocks: how they happen and how to prevent them

A deadlock occurs when two transactions each hold a lock the other needs. Postgres detects deadlocks and terminates one transaction with an error. The terminated transaction must retry.

sql
-- Deadlock scenario
-- Transaction A locks account 1, then tries to lock account 2
-- Transaction B locks account 2, then tries to lock account 1

-- Transaction A:
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;  -- locks row 1
-- ... some work ...
UPDATE accounts SET balance = balance + 100 WHERE id = 2;  -- waits for B to release lock 2

-- Transaction B (concurrent):
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 2;   -- locks row 2
-- ... some work ...
UPDATE accounts SET balance = balance + 50 WHERE id = 1;   -- waits for A to release lock 1
-- DEADLOCK DETECTED -- Postgres kills one of these transactions

-- Prevention: always acquire locks in a consistent order.
-- If every transaction locks account rows in ascending id order,
-- a waits-for cycle (and therefore a deadlock) cannot form.
BEGIN;
SELECT * FROM accounts WHERE id IN (1, 2) ORDER BY id FOR UPDATE;  -- lock both rows, low id first
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
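The same ordering discipline applies to any lock, not just database rows. A small Python sketch using `threading.Lock` as a stand-in for row locks: every transfer acquires both account locks in ascending id order, so two opposite transfers can never wait on each other in a cycle (the `transfer` and `account_locks` names are illustrative):

```python
import threading

account_locks = {1: threading.Lock(), 2: threading.Lock()}
balances = {1: 500, 2: 500}

def transfer(src, dst, amount):
    """Move money between accounts, always locking the lower id first.
    Because every caller uses the same lock order, no waits-for cycle can form."""
    first, second = sorted((src, dst))
    with account_locks[first]:
        with account_locks[second]:
            balances[src] -= amount
            balances[dst] += amount

# Two opposite transfers running concurrently -- the classic deadlock shape.
t1 = threading.Thread(target=transfer, args=(1, 2, 100))
t2 = threading.Thread(target=transfer, args=(2, 1, 50))
t1.start(); t2.start()
t1.join(); t2.join()
print(balances)  # {1: 450, 2: 550}
```

If `transfer` instead locked `src` first, these two threads could each grab one lock and wait forever on the other — exactly the SQL scenario above.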

Advisory locks: application-level coordination

Advisory locks let your application use Postgres as a distributed mutex without locking actual rows. Useful for: ensuring only one background worker runs a specific job, preventing concurrent report generation, coordinating distributed cron jobs.

sql
-- Session-level advisory lock (held until released or session ends)
SELECT pg_try_advisory_lock(12345);  -- returns true if acquired, false if already held

-- Safe pattern: try to acquire, do work, release
DO $$
BEGIN
    IF pg_try_advisory_lock(42) THEN
        -- Do exclusive work here
        RAISE NOTICE 'Lock acquired, running exclusive job';
        PERFORM pg_advisory_unlock(42);
    ELSE
        RAISE NOTICE 'Another worker has the lock, skipping';
    END IF;
END;
$$;

-- Transaction-level advisory lock (released automatically at COMMIT/ROLLBACK)
SELECT pg_try_advisory_xact_lock(42);
-- Safer: no need to remember to release
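Advisory lock functions take a bigint key, so a common application-side pattern is to hash a human-readable lock name down to a signed 64-bit integer. One way to sketch that in Python — the `advisory_key` helper is an assumption of this post, not a Postgres API:

```python
import hashlib

def advisory_key(name: str) -> int:
    """Map a human-readable lock name onto the signed 64-bit key space
    that pg_advisory_lock / pg_try_advisory_lock expect."""
    digest = hashlib.sha256(name.encode()).digest()
    # Take 8 bytes and interpret them as a signed big-endian int64.
    return int.from_bytes(digest[:8], "big", signed=True)

key = advisory_key("nightly-report")
# Same name always yields the same key, so every worker contends on one lock:
# SELECT pg_try_advisory_xact_lock(<key>);
```

The hash is deterministic, which is the whole point: every worker computing `advisory_key("nightly-report")` contends on the same lock. Collisions between different names are possible but astronomically unlikely in a 64-bit space.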

Optimistic vs pessimistic concurrency

Pessimistic (FOR UPDATE): Lock the row immediately. Other transactions block. Best for: high contention scenarios where conflicts are likely and the critical section is short.

Optimistic (version check): Read without locking. At write time, check that the row has not changed. If it has, retry. Best for: low-contention scenarios where conflicts are rare and the critical section is long.

sql
-- Optimistic concurrency with a version column
CREATE TABLE documents (
    id          BIGSERIAL PRIMARY KEY,
    content     TEXT,
    version     INT NOT NULL DEFAULT 1,
    updated_at  TIMESTAMPTZ DEFAULT NOW()
);

-- Read the document and its version
SELECT id, content, version FROM documents WHERE id = 5;
-- User edits content for 30 seconds...

-- Write: only update if version has not changed
UPDATE documents
SET content = 'new content', version = version + 1
WHERE id = 5 AND version = 1;  -- optimistic check

-- If 0 rows updated, someone else modified it: retry or surface the conflict.
-- GET DIAGNOSTICS only works inside PL/pgSQL, so check the row count there:
DO $$
DECLARE
    rows_updated INT;
BEGIN
    UPDATE documents
    SET content = 'new content', version = version + 1
    WHERE id = 5 AND version = 1;
    GET DIAGNOSTICS rows_updated = ROW_COUNT;
    IF rows_updated = 0 THEN
        RAISE EXCEPTION 'Conflict: document was modified by another user';
    END IF;
END;
$$;
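The same check-and-retry loop, sketched against an in-memory dict so the control flow is visible without a database. `compare_and_update` plays the role of the conditional UPDATE and the names are illustrative:

```python
# One "row" of the documents table.
doc = {"id": 5, "content": "old", "version": 1}

def compare_and_update(new_content, expected_version):
    """Mimics UPDATE ... SET version = version + 1
    WHERE id = 5 AND version = expected_version.
    Returns the number of rows updated (0 or 1)."""
    if doc["version"] != expected_version:
        return 0                       # someone else won the race
    doc["content"] = new_content
    doc["version"] += 1
    return 1

def save_with_retry(new_content, max_retries=3):
    """Re-read the version and retry on conflict instead of failing outright."""
    for _ in range(max_retries):
        seen = doc["version"]          # the version we base our edit on
        if compare_and_update(new_content, seen) == 1:
            return doc["version"]      # write committed
    raise RuntimeError("Conflict: gave up after retries")

print(save_with_retry("new content"))  # prints 2 (version bumped from 1)
```

Whether to retry silently or surface the conflict to the user depends on the edit: incrementing a counter can retry; overwriting a 30-second edit probably needs a merge UI.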

Quiz: test your understanding

Before moving on, answer these in your head (or out loud):

  1. What is write skew and why does Repeatable Read not prevent it? What isolation level does?
  2. You have a ticket reservation system. Two users try to buy the last ticket simultaneously. Write the safe SQL that prevents double-booking.
  3. Walk through exactly how a deadlock occurs with two transactions each updating two accounts. What is the prevention strategy?
  4. When would you choose optimistic concurrency over FOR UPDATE? When would you choose pessimistic?
  5. What does FOR UPDATE SKIP LOCKED do and why is it useful for job queue implementations?

Next up — Part 07: Zero-Downtime Migrations. The expand-contract pattern, concurrent index builds, and exactly how to add a NOT NULL column to a 500M-row table without locking it.
