Staff Prep 26: Load Balancing Strategies — L4 vs L7, Health Checks & Consistent Hashing

April 4, 2026 · 9 min read · Part 04 / 06

Continuing from Part 25 (Message Queues). Load balancers distribute traffic across multiple backend instances, but the layer they operate at, TCP (L4) or HTTP (L7), fundamentally changes which routing decisions are possible. And when backends scale up or down, consistent hashing keeps the disruption to in-flight requests and cached state to a minimum.

L4 vs L7 load balancing

Layer 4 (transport): The balancer sees TCP packets but not their content. It routes based on IP address and port. Fast (no HTTP parsing), but blind to application-level information. Cannot route by URL path, HTTP headers, or request method.

Layer 7 (application): The balancer terminates the HTTP connection and reads the full request. It can route by URL path, headers, cookies, request body, and more. Slower than L4 (HTTP parsing overhead), but enables content-based routing.

text
L4 Load Balancer (AWS NLB):
  - Routes by: IP, TCP port
  - Can do: round-robin, least connections, hash by source IP
  - Cannot do: route by URL path, inspect headers, inject headers
  - Use case: TCP load balancing (databases, game servers, raw TCP services)

L7 Load Balancer (AWS ALB, nginx, HAProxy):
  - Routes by: URL path, headers, cookies, hostname
  - Can do: SSL termination, path-based routing, header injection, health checks
  - Use case: HTTP/HTTPS services, API gateways, microservices routing

Example L7 routing rules (ALB):
/api/v1/orders → order-service target group
/api/v1/users  → user-service target group
/static/*      → S3 bucket (redirect rule)
Host: admin.myapp.com → admin-service target group
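The routing table above can be sketched as plain code. This is an illustrative model of the per-request decision an L7 balancer makes, not a real AWS API; the rule order (host rule first, then path prefixes, first match wins) and the target-group names are assumptions mirroring the example:

```python
# Hypothetical model of L7 content-based routing; first match wins.

def route(host: str, path: str) -> str:
    """Return the target group for an incoming request."""
    if host == "admin.myapp.com":
        return "admin-service"
    if path.startswith("/api/v1/orders"):
        return "order-service"
    if path.startswith("/api/v1/users"):
        return "user-service"
    if path.startswith("/static/"):
        return "s3-redirect"
    return "default-service"

# An L4 balancer could not make any of these decisions: it never sees
# the Host header or the URL path, only IP addresses and ports.
```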

Load balancing algorithms

python
import hashlib

# Round-robin: simplest, equal distribution
class RoundRobinBalancer:
    def __init__(self, backends: list[str]):
        self.backends = backends
        self.index = 0

    def next_backend(self) -> str:
        backend = self.backends[self.index % len(self.backends)]
        self.index += 1
        return backend

# Least connections: routes to backend with fewest active requests
class LeastConnectionsBalancer:
    def __init__(self, backends: list[str]):
        self.backends = backends
        self.connections = {b: 0 for b in backends}

    def next_backend(self) -> str:
        # Pick the least-loaded backend and count the new request;
        # callers must call release() when the request finishes,
        # otherwise the counts only ever grow.
        backend = min(self.connections, key=self.connections.get)
        self.connections[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        self.connections[backend] -= 1

# IP hash: same client always routes to same backend (poor man's sticky session)
class IPHashBalancer:
    def __init__(self, backends: list[str]):
        self.backends = backends

    def next_backend(self, client_ip: str) -> str:
        hash_val = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
        return self.backends[hash_val % len(self.backends)]
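A quick self-contained check of how the IP-hash policy behaves. The hash logic duplicates IPHashBalancer above so the snippet runs on its own; the IPs and backend addresses are made up:

```python
import hashlib
from collections import Counter

backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def ip_hash(client_ip: str) -> str:
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return backends[h % len(backends)]

# Stickiness: the same client always lands on the same backend.
assert ip_hash("203.0.113.7") == ip_hash("203.0.113.7")

# Distribution: many distinct clients spread over all backends.
counts = Counter(ip_hash(f"10.1.{i}.{j}") for i in range(16) for j in range(64))
assert set(counts) == set(backends)
```

The flip side of this stickiness is the modulo problem covered next: change len(backends) and most clients move.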

Consistent hashing: minimal disruption on scaling

Simple modulo hashing (hash(key) % N) breaks when N changes: adding or removing a backend remaps roughly (N-1)/N of all keys to different backends. For caches, this causes a mass cache-miss event. Consistent hashing minimises this: when a node is added, only ~1/N of keys are remapped (exactly the keys the new node takes over).
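The (N-1)/N figure is easy to verify empirically. A small sketch (the key names are arbitrary):

```python
import hashlib

def bucket(key: str, n: int) -> int:
    # Map a key to one of n backends by modulo hashing.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"user:{i}:profile" for i in range(10_000)]

# One of four cache servers is removed: N goes from 4 to 3.
moved = sum(1 for k in keys if bucket(k, 4) != bucket(k, 3))
fraction = moved / len(keys)
assert 0.70 < fraction < 0.80   # ~ (N-1)/N = 75% of keys remap
```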

python
import bisect
import hashlib

class ConsistentHash:
    def __init__(self, vnodes: int = 100):
        self.vnodes = vnodes
        self.ring = {}     # hash_position -> server_name
        self.sorted_keys = []

    def add_server(self, server: str):
        for i in range(self.vnodes):
            key = f"{server}:{i}"
            hash_pos = int(hashlib.md5(key.encode()).hexdigest(), 16)
            self.ring[hash_pos] = server
            bisect.insort(self.sorted_keys, hash_pos)

    def remove_server(self, server: str):
        for i in range(self.vnodes):
            key = f"{server}:{i}"
            hash_pos = int(hashlib.md5(key.encode()).hexdigest(), 16)
            del self.ring[hash_pos]
            self.sorted_keys.remove(hash_pos)

    def get_server(self, key: str) -> str:
        if not self.ring:
            raise ValueError("No servers in ring")
        hash_pos = int(hashlib.md5(key.encode()).hexdigest(), 16)
        # Find the first server position >= hash_pos (clockwise)
        idx = bisect.bisect(self.sorted_keys, hash_pos)
        if idx == len(self.sorted_keys):
            idx = 0  # wrap around
        return self.ring[self.sorted_keys[idx]]

# Usage
ring = ConsistentHash(vnodes=150)  # more vnodes = more even distribution
ring.add_server("cache-1")
ring.add_server("cache-2")
ring.add_server("cache-3")

server = ring.get_server("user:42:profile")  # always maps to same server

# Add a 4th server: only ~25% of keys remap (vs ~75% with modulo hashing)
ring.add_server("cache-4")
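The ~25% claim can be checked empirically too. This sketch re-implements the ring in a compact functional form (equivalent to the class above) and measures how many keys move when a fourth server joins:

```python
import bisect
import hashlib

def build_ring(servers: list[str], vnodes: int = 150) -> list[tuple[int, str]]:
    # Sorted (position, server) pairs: each server gets `vnodes` points.
    return sorted(
        (int(hashlib.md5(f"{s}:{i}".encode()).hexdigest(), 16), s)
        for s in servers
        for i in range(vnodes)
    )

def lookup(ring: list[tuple[int, str]], key: str) -> str:
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    idx = bisect.bisect(ring, (h, ""))   # first position >= h (clockwise)
    return ring[idx % len(ring)][1]      # wrap around at the end

keys = [f"user:{i}:profile" for i in range(10_000)]
before = build_ring(["cache-1", "cache-2", "cache-3"])
after = build_ring(["cache-1", "cache-2", "cache-3", "cache-4"])

moved = sum(1 for k in keys if lookup(before, k) != lookup(after, k))
fraction = moved / len(keys)
assert 0.15 < fraction < 0.35   # close to 1/N = 25%, vs ~75% with modulo
```

Note that adding a node only moves keys onto the new node; the existing servers never trade keys among themselves.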

Health checks: passive vs active

Active checks: the load balancer probes each backend on an interval (e.g. GET /health every 10 seconds) and ejects backends that fail several probes in a row. Passive checks: the balancer watches live traffic and marks a backend unhealthy after consecutive errors or timeouts, without sending extra requests. Most production setups combine both. The endpoint below is what an active probe hits:

python
import json

from fastapi import Depends, FastAPI, Response

app = FastAPI()

# get_db / get_redis and the check_* helpers below are app-specific
# dependencies, assumed to be defined elsewhere.

# Health check endpoint (for load balancer probes)
@app.get("/health")
async def health_check(db=Depends(get_db)):
    try:
        await db.execute("SELECT 1")
        db_ok = True
    except Exception:
        db_ok = False

    if not db_ok:
        return Response(
            content='{"status": "degraded", "db": false}',
            media_type="application/json",
            status_code=503,  # signals the LB to route away
        )

    return {"status": "ok", "db": True}

# Deep health check: tests all dependencies
@app.get("/health/deep")
async def deep_health(db=Depends(get_db), redis=Depends(get_redis)):
    checks = {
        "db": await check_db(db),
        "redis": await check_redis(redis),
        "disk": check_disk_space(),
    }
    all_ok = all(checks.values())
    return Response(
        content=json.dumps({"status": "ok" if all_ok else "degraded", "checks": checks}),
        media_type="application/json",
        status_code=200 if all_ok else 503,
    )

Sticky sessions: when and why to avoid them

Sticky sessions route a user's requests to the same backend based on a cookie. This allows in-memory session state per backend. The problems:

  • Uneven load distribution (a few heavy users can pin their load to one backend)
  • When a backend fails, all its "sticky" users lose their session
  • Makes auto-scaling harder (new backends start empty)

Prefer stateless backends with session state in Redis. Then any backend can serve any request.

Quiz: test your understanding

Before moving on, answer these in your head (or out loud):

  1. What can an L7 load balancer do that an L4 cannot? Give three concrete routing decisions only possible at L7.
  2. You have 4 cache servers. With modulo hashing, one server fails. What fraction of cache keys need to be remapped? With consistent hashing?
  3. Your health check returns 200 OK even when the database is down. What happens? How do you fix the health check?
  4. An L7 load balancer sees a request with Authorization: Bearer EXPIRED_TOKEN. Can it reject this request before it reaches your service? What does it need for that?
  5. Your app uses sticky sessions backed by in-memory session storage. A backend pod is killed during a rolling deployment. What happens to users on that pod? How do you design around this?

Next up — Part 27: CAP Theorem & Distributed Systems. The real version of CAP, eventual consistency, conflict resolution, and CRDTs.
