FastAPI 105: Caching Strategies & Redis Patterns — Cache-Aside, Invalidation & Thundering Herd

April 3, 2026 · 13 min read · PART 05 / 18

In Part 4 we built rate limiting that actually holds across workers — token buckets in Redis, sliding windows, and the atomic pipeline that prevents race conditions. Now we're adding the layer that makes your API fast: caching. Redis between your app and your database, absorbing read load so Postgres doesn't have to. But caching has traps. Stale data served for hours. Memory that grows without bound. And the thundering herd — 500 requests all hitting the DB simultaneously when a single TTL expires. This is Part 5.

Why caching is not optional at scale

A Postgres query scanning an indexed table might take 5–20ms. Under moderate load, that's fine. But at 500 req/s on the same endpoint, you're doing 500 × 20ms = 10 seconds of query work per second, on a database that can realistically handle 300–500 concurrent connections before it starts queueing. You either cache, or you scale horizontally and spend money instead of milliseconds.
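The arithmetic is worth making concrete. A tiny back-of-the-envelope helper (illustrative only, not part of the series' codebase):

```python
def db_seconds_per_second(req_per_s: int, query_ms: float) -> float:
    """Aggregate query time the database must absorb per wall-clock second."""
    return req_per_s * query_ms / 1000.0

# 500 req/s at 20 ms per query = 10 s of query work every second:
# survivable only through parallelism, and only until connections queue.
print(db_seconds_per_second(500, 20))  # 10.0
```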

Redis reads from RAM. A simple GET takes ~0.1ms — 50–200× faster than an indexed Postgres query. The economics are obvious. The implementation is where engineers get it wrong.

The three caching patterns

1. Cache-aside (lazy loading)

The app checks the cache. On a miss, it queries the database, writes the result to cache, and returns. The cache never talks to the database directly — your application owns that logic.

Request
  │
  ├─ redis.get(key) → HIT → return cached value
  │
  └─ MISS
       │
       ├─ query Postgres
       ├─ redis.setex(key, TTL, value)
       └─ return value

import json
import redis.asyncio as redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

async def get_user_profile(user_id: int) -> dict | None:
    key = f"user:{user_id}:profile"

    # 1. Check cache
    cached = await r.get(key)
    if cached:
        return json.loads(cached)

    # 2. Cache miss — query DB
    user = await db.fetch_one(
        "SELECT id, name, email, role FROM users WHERE id = $1",
        user_id
    )
    if not user:
        return None

    # 3. Write to cache with TTL (300 seconds = 5 minutes)
    await r.setex(key, 300, json.dumps(dict(user)))
    return dict(user)

When to use it: Read-heavy endpoints where data changes infrequently. User profiles, product details, configuration values. The majority of caching you'll build is this pattern.

Trade-off: There's a window of staleness up to the TTL. A user updates their name; for the next 5 minutes, the cache serves the old one. For most use cases this is acceptable. For anything user-visible and immediately important (like "my own profile just after I updated it"), you need explicit invalidation on write.

2. Write-through

Every write goes to the cache and the database, synchronously. The cache is always warm. There's no stale-read problem — writes keep it current.

async def update_user_profile(user_id: int, data: dict):
    # Write to DB
    await db.execute(
        "UPDATE users SET name = $1, email = $2 WHERE id = $3",
        data["name"], data["email"], user_id
    )

    # Write to cache immediately (keep it warm)
    key = f"user:{user_id}:profile"
    profile = await db.fetch_one("SELECT * FROM users WHERE id = $1", user_id)
    await r.setex(key, 300, json.dumps(dict(profile)))

Cost: Every write now does two hops — DB and cache. Write latency increases. If writes are frequent and reads are sparse, you're paying cache write cost for data that may never be read before it expires. This pattern shines when reads heavily outnumber writes.

3. Write-behind (write-back)

Write to cache. Return success. Flush to the database asynchronously, in batches. The fastest write latency of the three — you're just writing to RAM.

Write → cache ✓ → return OK to client
              ↓
         (async worker)
              ↓
         flush to DB

The risk: If your Redis instance crashes between the write and the flush, that data is gone. Permanently. The client got a 200 OK. The database never saw it. This is not a theoretical risk — Redis is in-memory by default. Unless you've configured AOF persistence with appendfsync always, you can lose recent writes on crash.

Write-behind is rare in transactional web APIs. It appears in analytics pipelines and write-heavy batch systems where losing a few events is acceptable. For order creation, user data, financial records — avoid it.
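To make the flush mechanics concrete, here's a toy sketch of the async worker. An asyncio.Queue stands in for the Redis buffer and a plain list stands in for the database; names, batch size, and interval are all illustrative:

```python
import asyncio

# Toy write-behind flusher: writes land in `buffer` instantly; a background
# worker drains it to "the database" (here just a list) in batches.
buffer: asyncio.Queue = asyncio.Queue()
flushed: list[list[dict]] = []  # what the fake database received, per batch

async def flush_to_db(batch: list[dict]) -> None:
    flushed.append(batch)  # real code: one multi-row INSERT here

async def flush_worker(batch_size: int = 10, interval: float = 0.05) -> None:
    while True:
        await asyncio.sleep(interval)
        batch: list[dict] = []
        while not buffer.empty() and len(batch) < batch_size:
            batch.append(buffer.get_nowait())
        if batch:
            await flush_to_db(batch)

async def demo() -> int:
    worker = asyncio.create_task(flush_worker())
    for i in range(25):
        buffer.put_nowait({"event": i})   # the "write": returns immediately
    await asyncio.sleep(0.5)              # let the worker drain the queue
    worker.cancel()
    return sum(len(b) for b in flushed)   # events that reached the DB
```

Note what the sketch makes visible: anything still sitting in `buffer` when the process dies never reaches `flushed`. That gap is exactly the data-loss window described above.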

Cache invalidation — the genuinely hard part

Phil Karlton's quote ("there are only two hard things in computer science: cache invalidation and naming things") isn't a joke. Invalidation is where stale data, cascading deletes, and subtle race conditions live. You have three tools:

TTL-only

Set an expiry and nothing else: SETEX on write, or EXPIRE on an existing key. Accept that data can be stale for up to that window. Simple. Works for data where eventual consistency is fine: public product catalogs, aggregate counts, leaderboards.

await r.setex(key, 300, value)  # expires in 5 minutes, no matter what

Write-invalidate (explicit delete)

On any mutation, delete the cache key. The next read rebuilds it. Precise — staleness window is near-zero. But you must remember to invalidate everywhere writes happen. Forget one code path and you have a silent staleness bug.

async def update_user(user_id: int, data: dict):
    await db.execute("UPDATE users SET name = $1 WHERE id = $2", data["name"], user_id)
    await r.delete(f"user:{user_id}:profile")  # cache gone, next read rebuilds

Versioned keys

Include a version number in the key. On update, increment the version. Old keys expire naturally via TTL. No explicit delete needed — old keys become unreachable.

# Key: user:42:v3:profile
# After update: user:42:v4:profile
# v3 key orphans and expires on its own TTL

async def get_user_versioned(user_id: int):
    version = await r.get(f"user:{user_id}:version") or "1"
    key = f"user:{user_id}:v{version}:profile"
    return await r.get(key)

async def update_user_versioned(user_id: int, data: dict):
    await db.execute("UPDATE users ...", ...)
    await r.incr(f"user:{user_id}:version")  # bump version, old key becomes stale

The thundering herd problem

A popular cache key expires at 2 AM. You have 12 Uvicorn workers and 200 requests/second hitting that endpoint. In the first 50ms after expiry, 200 req/s × 0.05s = 10 requests get a cache MISS simultaneously; with heavier traffic or a slow rebuild, it's easily all 200. Every one of them hits Postgres. Your database goes from idle to 100% CPU in under a second.

T=0: key expires
T=0.001: 200 simultaneous requests → MISS
T=0.001: 200 simultaneous DB queries fired
T=0.3:   DB CPU: 100%, connections queuing
T=1.0:   DB starts timing out → 500 errors

An in-process mutex doesn't work — each worker has its own mutex. 12 workers, 12 simultaneous cache misses, 12 simultaneous DB queries. The mutex only protects within a single process.

The fix is a Redis-level distributed lock:

import asyncio

LOCK_TTL = 5  # seconds

async def get_with_stampede_protection(key: str, rebuild_fn):
    # 1. Try cache first
    value = await r.get(key)
    if value:
        return json.loads(value)

    lock_key = f"lock:{key}"

    # 2. Try to acquire distributed lock (NX = only set if not exists)
    acquired = await r.set(lock_key, "1", nx=True, ex=LOCK_TTL)

    if acquired:
        # We won the lock — rebuild the cache
        try:
            value = await rebuild_fn()
            await r.setex(key, 300, json.dumps(value))
            return value
        finally:
            # NB: if rebuild_fn outlived LOCK_TTL, the lock may now belong to
            # another worker; a unique token + compare-and-delete avoids
            # deleting someone else's lock (omitted here for brevity)
            await r.delete(lock_key)
    else:
        # Someone else is rebuilding — wait briefly and retry
        await asyncio.sleep(0.05)
        value = await r.get(key)
        return json.loads(value) if value else await rebuild_fn()

Only one worker across all processes rebuilds the cache. The rest wait 50ms and read the now-warm key. Postgres sees one query instead of 200.
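You can verify that claim without a Redis at hand. In this toy simulation a dict stands in for Redis, set_nx mimics SET NX, and we count how many of 200 concurrent misses actually run the expensive rebuild (all names are illustrative):

```python
import asyncio

fake_redis: dict[str, str] = {}
rebuilds = 0

def set_nx(key: str) -> bool:
    """Mimics Redis SET ... NX: set only if the key doesn't exist."""
    if key in fake_redis:
        return False
    fake_redis[key] = "1"
    return True

async def rebuild() -> str:
    global rebuilds
    rebuilds += 1
    await asyncio.sleep(0.01)     # stand-in for the slow DB query
    return "fresh-value"

async def get_protected(key: str) -> str:
    if key in fake_redis:
        return fake_redis[key]
    if set_nx(f"lock:{key}"):              # we won the lock
        fake_redis[key] = await rebuild()
        del fake_redis[f"lock:{key}"]
        return fake_redis[key]
    await asyncio.sleep(0.05)              # lost it: wait, then re-read
    return fake_redis.get(key) or await rebuild()

async def herd() -> int:
    await asyncio.gather(*(get_protected("hot") for _ in range(200)))
    return rebuilds

# asyncio.run(herd()) → 1: one coroutine rebuilds, 199 read the warm key
```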

Handling Redis downtime gracefully

Cache is an optimisation. Your API must work without it. If Redis goes down, you fall through to the database — slower, but correct. The common mistake is letting a Redis exception propagate as a 500 to the client.

async def get_user_safe(user_id: int) -> dict | None:
    key = f"user:{user_id}:profile"

    try:
        cached = await r.get(key)
        if cached:
            return json.loads(cached)
    except redis.RedisError:
        # Redis is down — fall through to DB
        pass

    # Cache miss or Redis down — query DB directly
    user = await db.fetch_one("SELECT * FROM users WHERE id = $1", user_id)
    if not user:
        return None

    try:
        await r.setex(key, 300, json.dumps(dict(user)))
    except redis.RedisError:
        pass  # Can't write to cache — that's fine, just serve from DB

    return dict(user)

Common mistakes engineers make

  • No TTL on cache writes. Keys accumulate forever. Redis hits maxmemory and, depending on maxmemory-policy, either rejects new writes (noeviction, the default) or evicts keys, and you get intermittent failures that look like random bugs. Always use setex, never a bare set without expiry.
  • Caching entire DB rows when only 2 fields are needed. You invalidate on any field change, not just the fields you care about. Cache exactly what the endpoint returns — no more.
  • Caching mutable aggregates without write-invalidate. total_orders = 42 cached for 5 minutes. A new order arrives. For 5 minutes you're serving wrong counts. Either short TTL or explicit invalidation on write.
  • In-process mutex for stampede protection. Works only within one process. 12 Uvicorn workers = 12 simultaneous DB queries anyway. Use Redis-level distributed locks.
  • Not wrapping Redis calls in try/except. Redis is a network call. It can fail. If it does and you don't catch it, your whole endpoint returns 500. Cache failures should be transparent to the user.
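Two of these rules (always a TTL, never leak cache errors) combine naturally into one helper. A hypothetical sketch — DownCache is a stub simulating a dead Redis, and the names are illustrative:

```python
import asyncio
import json

class DownCache:
    """Stub cache backend where every call fails, as if Redis were down."""
    async def get(self, key: str) -> str:
        raise ConnectionError("redis down")

    async def setex(self, key: str, ttl: int, value: str) -> None:
        raise ConnectionError("redis down")

async def cached_json(backend, key: str, ttl: int, loader) -> dict:
    try:
        raw = await backend.get(key)
        if raw:
            return json.loads(raw)
    except Exception:
        pass                      # cache read failed: fall through to loader
    value = await loader()        # source of truth still works
    try:
        await backend.setex(key, ttl, json.dumps(value))  # always with a TTL
    except Exception:
        pass                      # cache write failed: still serve the value
    return value

async def load_user() -> dict:
    return {"id": 1, "name": "alice"}

async def demo() -> dict:
    return await cached_json(DownCache(), "user:1:profile", 300, load_user)

# asyncio.run(demo()) → {'id': 1, 'name': 'alice'} even with the cache dead
```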

Part 5 done. Next up — Part 6: Testing & Reliability. What to actually test, contract tests, chaos patterns, and how to write tests that catch production bugs — not just green CI.
