Staff Prep 16: asyncio Deep Dive — Event Loop, Tasks & Common Pitfalls


April 4, 2026 · 9 min read · PART 16 / 18

Back to Part 15: Python GIL. asyncio looks simple until you hit its edges: tasks that silently swallow exceptions, gather calls that leave sibling tasks running when one fails, locks that deadlock, and event loops that block without warning. This is the deep dive that closes the gap between "I use async" and "I understand async."

Coroutines vs tasks vs futures

These three are related but distinct. Understanding the difference is fundamental.

python
import asyncio

# Coroutine: a function defined with async def
# Calling it does NOT run it — it returns a coroutine object
async def my_coro() -> int:
    await asyncio.sleep(1)
    return 42

async def demo() -> None:
    coro = my_coro()     # returns a coroutine object — nothing runs yet
    result = await coro  # NOW it runs

    # Task: a coroutine scheduled on the event loop
    # It starts running as soon as the current coroutine yields control
    task = asyncio.create_task(my_coro())

    # Await the task later to collect its result
    result = await task  # waits for it to complete

    # Future: a low-level placeholder for a result
    # Used by event loop internals; rarely created directly
    future: asyncio.Future[int] = asyncio.get_running_loop().create_future()
    future.set_result(42)  # normally set by other code when the result is ready
    result = await future  # gets 42

    # Task IS-A Future — you can await a task just like a future
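
To make the distinction concrete, here is a small runnable check (stdlib only, a sketch): a bare coroutine does nothing until awaited, while a task runs as soon as the loop gets control.

```python
import asyncio

log: list[str] = []

async def work() -> int:
    log.append("ran")
    return 42

async def main() -> None:
    coro = work()                        # nothing runs yet
    task = asyncio.create_task(work())   # scheduled on the loop
    await asyncio.sleep(0)               # yield once to the loop
    # the task has run; the bare coroutine still has not
    assert log == ["ran"]
    assert await coro == 42              # runs only now
    assert await task == 42

asyncio.run(main())
print(log)  # ['ran', 'ran']
```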

The event loop: how it works

python
import asyncio

# The event loop is a single-threaded scheduler
# It maintains a queue of ready-to-run callbacks and coroutines
# When a coroutine hits `await`, it registers a callback and returns control

async def task_a():
    print("A: start")
    await asyncio.sleep(1)  # suspends, registers timer callback
    print("A: after sleep")  # runs 1 second later

async def task_b():
    print("B: start")
    await asyncio.sleep(0.5)  # suspends, registers timer callback
    print("B: after sleep")   # runs 0.5 seconds later

async def main():
    # create_task schedules both to run concurrently
    ta = asyncio.create_task(task_a())
    tb = asyncio.create_task(task_b())
    await asyncio.gather(ta, tb)

asyncio.run(main())
# Output:
# A: start
# B: start
# B: after sleep  (at 0.5s)
# A: after sleep  (at 1.0s)
# Total time: ~1 second, not 1.5
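
A quick way to verify the timing claim yourself: a sketch that times two overlapping sleeps (names are illustrative).

```python
import asyncio
import time

async def sleeper(delay: float) -> float:
    await asyncio.sleep(delay)
    return delay

async def timed() -> float:
    start = time.perf_counter()
    await asyncio.gather(sleeper(0.2), sleeper(0.1))
    return time.perf_counter() - start

elapsed = asyncio.run(timed())
# The sleeps overlap: total is ~max(0.2, 0.1), not the 0.3 sum
print(f"elapsed ≈ {elapsed:.2f}s")
```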

Task exception handling: the silent failure

python
import asyncio

async def failing_task():
    await asyncio.sleep(0.1)
    raise ValueError("Something went wrong")

# WRONG: task exception silently discarded if task is never awaited
async def bad_pattern():
    task = asyncio.create_task(failing_task())
    # task is created and running
    # We do other work...
    await asyncio.sleep(0.5)
    # task has failed, but nobody checked
    # Python logs "Task exception was never retrieved" — but only when
    # the task object is garbage-collected, which may be much later

# CORRECT: always await tasks or add exception handlers
async def good_pattern():
    task = asyncio.create_task(failing_task())
    try:
        await task
    except ValueError as e:
        print(f"Task failed: {e}")

# Alternative: add a done callback
async def with_callback():
    def handle_exception(task: asyncio.Task):
        # task.exception() raises CancelledError on a cancelled task,
        # so check for cancellation first
        if not task.cancelled() and task.exception():
            print(f"Background task failed: {task.exception()}")

    task = asyncio.create_task(failing_task())
    task.add_done_callback(handle_exception)
    # Keep a strong reference to the task somewhere long-lived — the event
    # loop holds only a weak reference, and a garbage-collected task can
    # disappear mid-flight
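
The pieces above can be combined into a small fire-and-forget helper. This is a sketch, not a standard API: `spawn`, `_background`, and `failures` are hypothetical names, but the pattern (strong reference plus done callback) is the documented way to avoid both silent failures and garbage-collected tasks.

```python
import asyncio

failures: list[BaseException] = []
_background: set[asyncio.Task] = set()

def spawn(coro) -> asyncio.Task:
    """Hypothetical helper: keep a strong reference to the task and
    record any exception when it finishes."""
    task = asyncio.create_task(coro)
    _background.add(task)

    def _done(t: asyncio.Task) -> None:
        _background.discard(t)
        if not t.cancelled() and t.exception() is not None:
            failures.append(t.exception())

    task.add_done_callback(_done)
    return task

async def boom():
    raise ValueError("oops")

async def main():
    spawn(boom())
    await asyncio.sleep(0.05)  # the done callback fires when the task fails

asyncio.run(main())
print(failures)  # [ValueError('oops')]
```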

asyncio.gather: fail-fast by default

python
import asyncio

async def fetch_a():
    await asyncio.sleep(0.5)
    return "data A"

async def fetch_b():
    await asyncio.sleep(0.2)
    raise ConnectionError("Service B is down")

# Default: the first exception propagates immediately — but the other
# awaitables are NOT cancelled; they keep running in the background
# (gather only cancels them if the gather call itself is cancelled)
async def gather_default():
    try:
        results = await asyncio.gather(fetch_a(), fetch_b())
    except ConnectionError as e:
        print(f"Failed: {e}")  # fetch_a is still running; its result is discarded

# return_exceptions=True: all tasks run, exceptions are returned as values
async def gather_resilient():
    results = await asyncio.gather(
        fetch_a(),
        fetch_b(),
        return_exceptions=True,
    )
    for r in results:
        if isinstance(r, Exception):
            print(f"Error: {r}")
        else:
            print(f"Success: {r}")
    # Both tasks complete. Results: ["data A", ConnectionError("Service B is down")]

asyncio.Lock: preventing race conditions

python
import asyncio

# Without lock: race condition on shared state
counter = 0

async def increment_bad():
    global counter
    value = counter          # read
    await asyncio.sleep(0)   # yield — another coroutine can run here
    counter = value + 1      # write — may overwrite another coroutine's write

# With lock: safe increment
lock = asyncio.Lock()

async def increment_safe():
    global counter
    async with lock:
        value = counter
        await asyncio.sleep(0)  # even with yields, protected by lock
        counter = value + 1

# Deadlock: acquiring the same lock twice in the same coroutine
async def deadlock():
    async with lock:
        async with lock:  # DEADLOCK: lock is not reentrant
            pass

# asyncio.Lock is NOT reentrant. For bounding concurrency (rather than
# mutual exclusion), use asyncio.Semaphore
semaphore = asyncio.Semaphore(10)  # allow up to 10 concurrent operations

async def controlled_fetch(url: str):
    async with semaphore:  # waits if 10 fetches are already in progress
        return await http_client.get(url)  # http_client: any async client, e.g. httpx

Timeout handling

python
import asyncio

# asyncio.wait_for: timeout a single awaitable (cancels it on timeout)
async def fetch_with_timeout(url: str, timeout: float = 5.0):
    try:
        return await asyncio.wait_for(http_client.get(url), timeout=timeout)
    except asyncio.TimeoutError:  # alias of the builtin TimeoutError since 3.11
        raise HTTPException(504, "Upstream service timed out")  # e.g. FastAPI's HTTPException

# asyncio.timeout context manager (Python 3.11+)
async def fetch_modern(url: str):
    async with asyncio.timeout(5.0):
        return await http_client.get(url)

# Parallel with individual timeouts
async def fetch_all_with_timeout(urls: list[str]):
    tasks = [
        asyncio.create_task(
            asyncio.wait_for(http_client.get(url), timeout=2.0)
        )
        for url in urls
    ]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in results if not isinstance(r, Exception)]
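
A self-contained sketch of the timeout path: wait_for cancels the slow coroutine before raising, so nothing keeps running in the background.

```python
import asyncio

async def slow() -> str:
    await asyncio.sleep(1)
    return "data"

async def main() -> str:
    try:
        return await asyncio.wait_for(slow(), timeout=0.05)
    except asyncio.TimeoutError:
        # slow() has already been cancelled at this point
        return "timed out"

outcome = asyncio.run(main())
print(outcome)  # timed out
```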

Blocking the event loop: common culprits

python
import asyncio
import time
import json

# These block the event loop (DO NOT use in async context without run_in_executor):
# time.sleep(1)             — use await asyncio.sleep(1)
# requests.get(url)         — use aiohttp or httpx async client
# open(file).read()         — use aiofiles
# json.loads(huge_string)   — acceptable for small data, run_in_executor for large

# Detecting event loop blocks: use a watchdog
import logging

async def event_loop_watchdog(threshold_ms: float = 100):
    while True:
        start = time.perf_counter()
        await asyncio.sleep(0)  # yield; time how long until the loop returns here
        elapsed_ms = (time.perf_counter() - start) * 1000
        if elapsed_ms > threshold_ms:
            logging.warning(f"Event loop blocked for {elapsed_ms:.1f}ms")
        await asyncio.sleep(0.01)

# Run the watchdog as a background task (from inside a running coroutine,
# keeping the reference so the task is not garbage-collected)
watchdog_task = asyncio.create_task(event_loop_watchdog(threshold_ms=50))
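
For the blocking culprits listed above, the standard escape hatch is loop.run_in_executor, which runs the blocking call on a worker thread while the loop keeps servicing other coroutines. A sketch (blocking_work is a stand-in):

```python
import asyncio
import time

def blocking_work() -> str:
    time.sleep(0.2)  # stands in for blocking I/O or CPU-bound work
    return "done"

async def main() -> tuple[str, float]:
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    # Offload to the default ThreadPoolExecutor; the event loop stays free
    result, _ = await asyncio.gather(
        loop.run_in_executor(None, blocking_work),
        asyncio.sleep(0.05),  # still serviced while the worker thread runs
    )
    return result, time.perf_counter() - start

result, elapsed = asyncio.run(main())
print(result)  # done
```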

Quiz: test your understanding

Before moving on, answer these in your head (or out loud):

  1. What is the difference between calling my_coro() and asyncio.create_task(my_coro())? What happens in each case?
  2. You use asyncio.gather(a(), b(), c()) and b() raises an exception. What happens to a() and c()? How do you change this behavior?
  3. You have a background task that fails silently. How does Python signal this? What is the correct way to handle exceptions in fire-and-forget tasks?
  4. Why is asyncio.Lock not reentrant? What happens if you try to acquire the same lock twice from the same coroutine?
  5. Your event loop is freezing for 500ms every few seconds. Name three common causes and how you would diagnose which one is happening.

Next up — Part 17: FastAPI Internals. Starlette routing, dependency injection implementation, and lifespan events under the hood.
