FastAPI 103: API Design — Pagination, Filtering & Error Handling at Scale
FastAPI · Production

March 31, 2026 · 11 min read · PART 03 / 18

In Part 2 we covered the async vs sync model — what actually runs on the event loop and what gets offloaded to a thread pool. Now we're one layer higher: how do you design the API surface itself to be correct, consistent, and survivable at scale? Pagination, filtering, and error handling are the three places where teams make decisions early that become impossible to change later. This is Part 3.

The pagination trap: why offset breaks

Offset-based pagination is the first thing everyone reaches for. It maps cleanly to UI concepts ("page 1", "page 2") and is dead simple to implement.

@app.get("/orders")
async def list_orders_bad(page: int = 1, page_size: int = 50, db: AsyncSession = Depends(get_db)):
    offset = (page - 1) * page_size
    result = await db.execute(
        select(Order).order_by(Order.created_at.desc()).offset(offset).limit(page_size)
    )
    return result.scalars().all()

This works fine for the first few months. Then two problems emerge simultaneously:

Problem 1 — Performance. OFFSET 1000 doesn't skip 1000 rows — it reads and discards 1000 rows. At OFFSET 50000, Postgres is scanning 50,000 rows to throw them away. On a 1M row table with real traffic, your p99 latency for late pages hits seconds.

EXPLAIN ANALYZE
SELECT * FROM orders ORDER BY created_at DESC OFFSET 50000 LIMIT 50;

Limit  (cost=4821.00..4825.83 rows=50 width=128)
       (actual time=1243.821..1244.109 rows=50)
  ->  Sort  (cost=4321.00..4571.00 rows=100000 width=128)
            (actual time=981.423..1191.734 rows=50050)  ← scanned 50,050 rows
        Sort Key: created_at DESC
        Sort Method: external merge  Disk: 12384kB
Planning Time: 0.4 ms
Execution Time: 1244.3 ms  ← 1.2 seconds for page 1001
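The shape of this problem is easy to reproduce with stdlib sqlite3 (hypothetical one-table schema). The OFFSET query and its keyset equivalent, which we'll build properly below, return exactly the same page, but only the keyset predicate lets the engine seek directly through the index instead of walking past 50,000 rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(i, float(i)) for i in range(1, 100_001)],
)

# Offset: the engine must produce and discard the first 50,000 rows.
offset_page = conn.execute(
    "SELECT id FROM orders ORDER BY id LIMIT 50 OFFSET 50000"
).fetchall()

# Keyset: jump straight to the index entry after the last-seen id.
keyset_page = conn.execute(
    "SELECT id FROM orders WHERE id > ? ORDER BY id LIMIT 50", (50000,)
).fetchall()

assert offset_page == keyset_page  # same rows, very different work
```

Same 50 rows either way; the difference is how much work the database does to find them, and the gap grows linearly with page depth.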

Problem 2 — Consistency. If a new order is inserted while a user is paginating, every subsequent row shifts. Page 2 now returns a row the user already saw on page 1. Page 3 silently drops a row entirely. No error, no warning — just wrong data.

Initial state:          After new insert:
Page 1: rows 1-50       Page 1: rows 1-50 (new row 0 pushed in)
Page 2: rows 51-100     Page 2: rows 50-99  ← row 50 repeated!
                                              row 100 dropped
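The drift is visible in a few lines of plain Python, treating the table as a newest-first list of ids (a minimal simulation, not FastAPI code):

```python
def offset_page(rows, page, page_size=50):
    # Same arithmetic as the handler above: skip (page - 1) * page_size rows.
    start = (page - 1) * page_size
    return rows[start:start + page_size]

rows = list(range(1, 101))      # ids 1..100, newest first
page1 = offset_page(rows, 1)    # client reads page 1: ids 1..50
rows.insert(0, 0)               # a new order arrives at the top
page2 = offset_page(rows, 2)    # client reads page 2: ids 50..99

assert page1[-1] == 50
assert page2[0] == 50               # row 50 served twice
assert 100 not in page1 + page2     # row 100 silently dropped
```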

Cursor pagination: how to do it right

Cursor pagination replaces "skip N rows" with "start after this specific row." The cursor is an opaque token the client passes back — typically a base64-encoded value of the last row's sort key.

import base64, json
from typing import Optional
from fastapi import Depends, Query
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

def encode_cursor(id: int) -> str:
    return base64.urlsafe_b64encode(json.dumps({"id": id}).encode()).decode()

def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["id"]

@app.get("/orders")
async def list_orders(
    after: Optional[str] = None,           # opaque cursor
    limit: int = Query(default=50, ge=1, le=500),  # always bound limit
    db: AsyncSession = Depends(get_db),
):
    query = select(Order).order_by(Order.id.asc())

    if after:
        last_id = decode_cursor(after)
        query = query.where(Order.id > last_id)

    query = query.limit(limit + 1)  # fetch one extra to check if next page exists

    rows = (await db.execute(query)).scalars().all()
    has_more = len(rows) > limit
    items = rows[:limit]

    return {
        "items": items,
        "next_cursor": encode_cursor(items[-1].id) if has_more else None,
        "has_more": has_more,
    }

The + 1 trick is important: instead of running a COUNT query to check if there are more pages, just fetch one extra row. If you get limit + 1 rows back, there's a next page. Slice to limit before returning.
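End to end, a client just follows next_cursor until has_more is false. Here is a stdlib-only simulation of the same endpoint logic over an in-memory list (hypothetical data, same encode/decode helpers as above):

```python
import base64, json

DATA = list(range(1, 124))  # 123 order ids

def encode_cursor(id: int) -> str:
    return base64.urlsafe_b64encode(json.dumps({"id": id}).encode()).decode()

def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["id"]

def list_orders(after=None, limit=50):
    last = decode_cursor(after) if after else 0
    rows = [r for r in DATA if r > last][:limit + 1]  # the "+1" probe row
    has_more = len(rows) > limit
    items = rows[:limit]
    return {
        "items": items,
        "next_cursor": encode_cursor(items[-1]) if has_more else None,
        "has_more": has_more,
    }

# Client loop: follow next_cursor until has_more is false.
seen, cursor = [], None
while True:
    page = list_orders(after=cursor)
    seen += page["items"]
    if not page["has_more"]:
        break
    cursor = page["next_cursor"]

assert seen == DATA  # every row exactly once, in order
```

Three requests (50 + 50 + 23 rows), no COUNT query, and inserts at the head of the list can't duplicate or drop rows mid-pagination.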

The non-unique cursor problem

Using created_at as a cursor breaks when two rows share the same timestamp. The query WHERE created_at > '2024-01-15 12:00:00' will skip all rows with that exact timestamp, not just the one you already returned.

Fix: use a composite cursor — sort key plus unique tiebreaker:

from datetime import datetime
from sqlalchemy import and_, or_

def encode_cursor(created_at: datetime, id: int) -> str:
    return base64.urlsafe_b64encode(
        json.dumps({"created_at": created_at.isoformat(), "id": id}).encode()
    ).decode()

# Row-value comparison (created_at, id) > (cursor_created_at, cursor_id),
# spelled out with or_/and_ for portability:
query = query.where(
    or_(
        Order.created_at > cursor_created_at,
        and_(Order.created_at == cursor_created_at, Order.id > cursor_id)
    )
).order_by(Order.created_at.asc(), Order.id.asc())

Now rows with identical timestamps are ordered by ID, and the cursor is always unambiguous.
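The decode side isn't shown above; a stdlib sketch of the composite decode, plus a check that Python's tuple ordering mirrors the SQL predicate (hypothetical rows, two sharing a timestamp):

```python
import base64, json
from datetime import datetime

def encode_cursor(created_at: datetime, id: int) -> str:
    return base64.urlsafe_b64encode(
        json.dumps({"created_at": created_at.isoformat(), "id": id}).encode()
    ).decode()

def decode_cursor(cursor: str):
    d = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return datetime.fromisoformat(d["created_at"]), d["id"]

t = datetime(2024, 1, 15, 12, 0)
rows = [(t, 1), (t, 2), (datetime(2024, 1, 15, 12, 1), 3)]

cursor = encode_cursor(*rows[0])           # client has seen (t, id=1)
last = decode_cursor(cursor)               # round-trips exactly
next_rows = [r for r in rows if r > last]  # tuple > == the keyset predicate

assert last == rows[0]
assert next_rows[0] == (t, 2)  # same timestamp, but NOT skipped
```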

Filtering: Pydantic query params done right

Don't parse query parameters manually. Declare a Pydantic model and inject it as a dependency — FastAPI will parse, validate, and coerce types before your handler runs.

from pydantic import BaseModel, Field
from typing import Optional
from datetime import datetime
from enum import Enum

class OrderStatus(str, Enum):
    pending = "pending"
    processing = "processing"
    completed = "completed"
    cancelled = "cancelled"

class OrderFilters(BaseModel):
    status: Optional[OrderStatus] = None     # validated enum — rejects "PENDING"
    user_id: Optional[int] = None
    created_after: Optional[datetime] = None
    created_before: Optional[datetime] = None
    min_amount: Optional[float] = Field(None, ge=0)
    after: Optional[str] = None              # cursor
    limit: int = Field(default=50, ge=1, le=500)

@app.get("/orders")
async def list_orders(
    filters: OrderFilters = Depends(),
    db: AsyncSession = Depends(get_db),
):
    query = select(Order).order_by(Order.id.asc())

    if filters.status is not None:
        query = query.where(Order.status == filters.status)
    if filters.user_id is not None:
        query = query.where(Order.user_id == filters.user_id)
    if filters.created_after:
        query = query.where(Order.created_at >= filters.created_after)
    if filters.created_before:
        query = query.where(Order.created_at <= filters.created_before)
    if filters.min_amount is not None:
        query = query.where(Order.amount >= filters.min_amount)
    if filters.after:
        query = query.where(Order.id > decode_cursor(filters.after))

    query = query.limit(filters.limit + 1)
    # ... rest of pagination logic

Adding a new filter like max_amount next month is one line in OrderFilters and two lines in the handler. Existing callers who don't send it get None — the filter simply doesn't apply. Zero breaking changes.
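The same conditional composition works outside SQLAlchemy too. A stdlib sqlite3 sketch (hypothetical schema; each non-None filter contributes one clause, and values stay as bound parameters, so the pattern is injection-safe):

```python
import sqlite3

def build_order_query(status=None, user_id=None, min_amount=None, limit=50):
    # Mirror of the handler above in raw SQL: collect clauses for the
    # filters that were actually supplied, skip the rest.
    clauses, params = [], []
    if status is not None:
        clauses.append("status = ?")
        params.append(status)
    if user_id is not None:
        clauses.append("user_id = ?")
        params.append(user_id)
    if min_amount is not None:
        clauses.append("amount >= ?")
        params.append(min_amount)
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    return f"SELECT id FROM orders{where} ORDER BY id LIMIT ?", params + [limit]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT,"
    " user_id INTEGER, amount REAL)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, "pending", 7, 10.0),
    (2, "completed", 7, 99.0),
    (3, "completed", 8, 5.0),
])

sql, params = build_order_query(status="completed", min_amount=50)
result = [row[0] for row in conn.execute(sql, params)]

assert result == [2]
```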

Error handling: one shape everywhere

FastAPI's default error format for validation errors looks like this:

{
  "detail": [
    {
      "loc": ["query", "limit"],
      "msg": "value is not a valid integer",
      "type": "type_error.integer"
    }
  ]
}

Your HTTP 404s probably look like {"detail": "Not found"}. Your 500s might return a raw Python exception string. Your custom business logic errors return something else entirely. Every client has to handle all of these differently.

The fix: override the exception handlers once at app startup to enforce a single response shape.

from fastapi import FastAPI, Request
from fastapi.exceptions import RequestValidationError
from fastapi.responses import JSONResponse
import logging

logger = logging.getLogger(__name__)

app = FastAPI()

# ── Custom application error ────────────────────────────────────────────────
class AppError(Exception):
    def __init__(self, code: str, message: str, status: int = 400):
        self.code = code
        self.message = message
        self.status = status

# ── Handlers ────────────────────────────────────────────────────────────────
@app.exception_handler(RequestValidationError)
async def validation_handler(request: Request, exc: RequestValidationError):
    return JSONResponse(
        status_code=422,
        content={
            "error": "validation_error",
            "message": "Request validation failed",
            "details": exc.errors(),
        },
    )

@app.exception_handler(AppError)
async def app_error_handler(request: Request, exc: AppError):
    return JSONResponse(
        status_code=exc.status,
        content={"error": exc.code, "message": exc.message},
    )

@app.exception_handler(Exception)
async def unhandled_handler(request: Request, exc: Exception):
    logger.exception("Unhandled exception on %s %s", request.method, request.url)
    return JSONResponse(
        status_code=500,
        content={
            "error": "internal_error",
            "message": "Something went wrong",
            # Never: "message": str(exc) — leaks stack traces and internals
        },
    )

Now every error in the system, from validation failures to database exceptions to business logic errors, returns the same shape. Frontend can write one error handler instead of five.

Raising AppError from anywhere

@app.get("/orders/{order_id}")
async def get_order(order_id: int, db: AsyncSession = Depends(get_db),
                    user: User = Depends(get_current_user)):
    order = await db.get(Order, order_id)
    if not order:
        raise AppError("order_not_found", f"No order with ID {order_id}", status=404)
    if order.user_id != user.id:
        raise AppError("forbidden", "You don't have access to this order", status=403)
    return order

Clean, readable handlers. No try/except scattered everywhere. One place to log, monitor, and adjust error responses across the entire API.

Putting it together: a complete endpoint

from fastapi import FastAPI, Depends
from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession
from pydantic import BaseModel, Field
from typing import Optional, Any
import base64, json

app = FastAPI()

class OrderFilters(BaseModel):
    status: Optional[str] = None
    user_id: Optional[int] = None
    limit: int = Field(default=50, ge=1, le=500)
    after: Optional[str] = None

class CursorPage(BaseModel):
    items: list[Any]
    next_cursor: Optional[str]
    has_more: bool
    count: int  # count of items in this page (not total — total requires a COUNT query)

@app.get("/orders", response_model=CursorPage)
async def list_orders(
    filters: OrderFilters = Depends(),
    db: AsyncSession = Depends(get_db),
    user: User = Depends(require_auth),
):
    query = select(Order).where(Order.tenant_id == user.tenant_id)  # always scope by tenant

    if filters.status is not None:
        query = query.where(Order.status == filters.status)
    if filters.user_id is not None:
        query = query.where(Order.user_id == filters.user_id)
    if filters.after:
        query = query.where(Order.id > decode_cursor(filters.after))

    query = query.order_by(Order.id.asc()).limit(filters.limit + 1)

    rows = (await db.execute(query)).scalars().all()
    has_more = len(rows) > filters.limit
    items = rows[:filters.limit]

    return CursorPage(
        items=items,
        next_cursor=encode_cursor(items[-1].id) if has_more else None,
        has_more=has_more,
        count=len(items),
    )

Common mistakes summary

  • Offset pagination at scale. You'll discover the problem at 1M rows. Too late to change without a migration + client updates.
  • No upper bound on limit. ?limit=100000 is a valid URL. Without le=500 in your Pydantic model, one bad actor (or one bug) takes down your DB.
  • Returning str(exc) to clients. Exposes file paths, SQL queries, internal service names. OWASP calls this "Security Misconfiguration."
  • Different error shapes per route. Every new shape is a new branch in every frontend client's error handling code.
  • Cursor on a non-unique field. Two rows with identical created_at → skipped results. Always use a unique tiebreaker in the sort.
  • No has_more in response. Clients have to make an extra request to discover they've reached the end, rather than stopping when has_more: false.

Part 3 done. Next up — Part 4: Rate Limiting & Throttling. Token bucket vs sliding window, per-user vs global limits, and where to enforce them — at the gateway, in middleware, or in application code.
