The two asyncio cancellation bugs every Python service has

Most Python async code handles success well and failure poorly. Cancellation is the spot where this becomes obvious. Here are the two bugs I have now seen in five different production codebases — including one I shipped myself last year. Both are easy to write, hard to test, and easy to fix once you have a name for them.

Bug 1 — Swallowed CancelledError

The first one looks like this. Innocent, even responsible:

python

async def fetch_user(uid: str) -> dict | None:
    try:
        return await api.get(f"/users/{uid}")
    except Exception as e:
        log.warning(f"fetch failed: {e}")
        return None

It is broken, though how badly depends on your Python version. Before Python 3.8, asyncio.CancelledError is a subclass of Exception, so the except Exception above catches it silently. Since 3.8 it derives from BaseException and this exact handler lets it through, but the bug is back the moment anyone in your stack writes a bare except:, except BaseException:, or except CancelledError: and carries on. In each case the cancellation request from the framework is dropped on the floor.

The result is that when an outer scope (a TaskGroup, a gather, a request timeout) tries to cancel this task, the coroutine swallows the cancel, logs a warning, and returns None. Its caller carries on with that None as if the request had merely failed, so the task keeps running past the point where it was told to stop. The outer scope sits there waiting for a task that no longer knows it was cancelled.

The fix is to re-raise:

python

import asyncio

async def fetch_user(uid: str) -> dict | None:
    try:
        return await api.get(f"/users/{uid}")
    except asyncio.CancelledError:
        raise                                # never swallow this
    except Exception as e:
        log.warning(f"fetch failed: {e}")
        return None
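To see the difference at runtime, here is a minimal, self-contained sketch. The coroutine names and the cancel_and_report helper are made up for illustration, and asyncio.sleep stands in for the api.get call. The swallowing version uses except BaseException so the effect also shows up on Python 3.8+.

```python
import asyncio


async def swallows_cancel() -> str:
    try:
        await asyncio.sleep(10)          # stands in for the real network call
    except BaseException:                # also catches CancelledError
        return "swallowed"
    return "finished"


async def reraises_cancel() -> str:
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        raise                            # let the cancellation propagate
    return "finished"


async def cancel_and_report(coro_fn):
    """Start coro_fn as a task, cancel it, and report how it ended."""
    task = asyncio.create_task(coro_fn())
    await asyncio.sleep(0)               # let the task reach its first await
    task.cancel()
    try:
        result = await task
    except asyncio.CancelledError:
        result = None
    return task.cancelled(), result


async def main():
    bad = await cancel_and_report(swallows_cancel)
    good = await cancel_and_report(reraises_cancel)
    return bad, good


bad, good = asyncio.run(main())
print(bad, good)
```

The swallowing task ends as an ordinary completion (task.cancelled() is False and it returns a value), while the re-raising task ends as cancelled. From the outside, that flag is the only way a caller can tell whether its cancel request was honored.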

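A linter rule will catch most of these. The flake8-async plugin flags handlers that catch cancellation without re-raising it (ASYNC103 at the time of writing). Ruff's ASYNC group implements only a subset of that plugin, so check the rule list for your Ruff version rather than assuming the rule is there; its E722 (bare except) at least catches the bluntest form. Put the rules in your ruff.toml and stop relying on memory: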

toml

# ruff.toml
[lint]
select = ["ASYNC", "E", "F", "B"]

Bug 2 — Cleanup that suspends

The second one is subtler. You handle CancelledError correctly — you re-raise — but in your finally block you await something:

python

async def with_session():
    sess = await pool.acquire()
    try:
        return await do_work(sess)
    finally:
        await sess.release()        # ← this await runs during cancellation

This seems correct, and most of the time it is. The trap is what happens when the cancellation arrives a second time. If the outer scope cancels twice — say, the framework cancels, you await sess.release(), and then the framework cancels again because cleanup is taking too long — the second cancellation interrupts your finally, and the session is leaked.
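The double-cancel scenario can be reproduced in a few lines. This is a sketch under assumptions: FakeSession is a hypothetical stand-in for a pooled session whose release() takes a little while, and asyncio.sleep stands in for do_work.

```python
import asyncio

released = []  # sessions whose release() ran to completion


class FakeSession:
    """Hypothetical stand-in for a pooled session with a slow release()."""

    async def release(self):
        await asyncio.sleep(0.05)    # pretend the release needs a round trip
        released.append(self)


async def with_session(sess):
    try:
        await asyncio.sleep(10)      # stands in for do_work(sess)
    finally:
        await sess.release()         # unshielded: a second cancel lands here


async def main():
    sess = FakeSession()
    task = asyncio.create_task(with_session(sess))
    await asyncio.sleep(0)           # let the task reach its first await
    task.cancel()                    # first cancel: interrupts the work, enters finally
    await asyncio.sleep(0.01)
    task.cancel()                    # second cancel: interrupts release() itself
    try:
        await task
    except asyncio.CancelledError:
        pass
    await asyncio.sleep(0.1)         # give release() every chance to finish
    return len(released)


release_count = asyncio.run(main())
print(release_count)
```

The count stays at zero: release() was started but never completed, which is exactly the leak described above.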

The fix is to shield the cleanup, so that a second cancellation cannot abort the release itself:

python

import asyncio

async def with_session():
    sess = await pool.acquire()
    try:
        return await do_work(sess)
    finally:
        # shield the cleanup from a second cancellation
        await asyncio.shield(sess.release())

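asyncio.shield wraps release() in its own task, and cancelling the outer task does not cancel that inner task. It does not make the outer task wait, though: the await on the shield still raises CancelledError right away, and release() carries on in the background. That only helps if the event loop keeps running long enough for it to finish, and the asyncio documentation recommends keeping a reference to the shielded task so it cannot be garbage-collected mid-flight.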
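A small experiment makes the ordering under asyncio.shield concrete. It is a sketch with made-up names (slow_release, the events list) and generous sleeps, so that the second cancel reliably arrives while the release is still in flight.

```python
import asyncio

events = []


async def slow_release():
    await asyncio.sleep(0.2)                 # a release that takes a while
    events.append("release finished")


async def with_session():
    try:
        await asyncio.sleep(10)              # stands in for do_work(sess)
    finally:
        try:
            await asyncio.shield(slow_release())
        except asyncio.CancelledError:
            # The shield protects slow_release(), not this await.
            events.append("outer task cancelled")
            raise


async def main():
    task = asyncio.create_task(with_session())
    await asyncio.sleep(0)                   # let the task reach its first await
    task.cancel()                            # first cancel: enters finally
    await asyncio.sleep(0.02)
    task.cancel()                            # second cancel: during the shielded release
    try:
        await task
    except asyncio.CancelledError:
        events.append("outer task done")
    await asyncio.sleep(0.5)                 # the loop must stay alive for the release
    return list(events)


order = asyncio.run(main())
print(order)
```

The outer task is finished before the release completes. The release does still complete, but only because main keeps the event loop running afterwards; if the loop shut down right after the task ended, the shielded work would be lost as well.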

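If release() itself could hang, you also want a timeout, so that a normal exit does not wait on it forever. When the timeout fires, wait_for only stops waiting; the shielded release() is left running in the background: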

python

async def with_session():
    sess = await pool.acquire()
    try:
        return await do_work(sess)
    finally:
        try:
            await asyncio.wait_for(
                asyncio.shield(sess.release()),
                timeout=5.0,
            )
        except asyncio.TimeoutError:
            log.error("session release timed out")

How I now write it

For any resource I acquire in an async function, I have started using a small helper that bakes both fixes in:

python

import asyncio
from contextlib import asynccontextmanager

@asynccontextmanager
async def safe_resource(acquire, release, *, release_timeout=5.0):
    res = await acquire()
    try:
        yield res
    finally:
        try:
            await asyncio.wait_for(asyncio.shield(release(res)), release_timeout)
        except asyncio.TimeoutError:
            log.error(f"release timed out for {res!r}")
        except asyncio.CancelledError:
            raise
        except Exception as e:
            log.warning(f"release error for {res!r}: {e}")

# usage
async def handler(req):
    async with safe_resource(pool.acquire, pool.release) as sess:
        return await do_work(sess, req)

It is twelve lines, and it has removed a recurring class of bug from our codebase. The cost is one extra layer of indirection per resource; the benefit is that the cancellation contract is now consistent everywhere.
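Since these bugs are hard to test by accident, it helps to test the helper on purpose. The following is a sketch, not our actual test suite: FakePool is a hypothetical in-memory pool, the helper is repeated (minus the redundant CancelledError clause) so the snippet runs on its own, and log is assumed to be a standard logging logger.

```python
import asyncio
import logging
from contextlib import asynccontextmanager

log = logging.getLogger(__name__)   # assumption: a stdlib logger


@asynccontextmanager
async def safe_resource(acquire, release, *, release_timeout=5.0):
    res = await acquire()
    try:
        yield res
    finally:
        try:
            await asyncio.wait_for(asyncio.shield(release(res)), release_timeout)
        except asyncio.TimeoutError:
            log.error("release timed out for %r", res)
        except Exception as e:
            log.warning("release error for %r: %s", res, e)


class FakePool:
    """Hypothetical pool that tracks which resources are checked out."""

    def __init__(self):
        self.in_use = set()
        self._next_id = 0

    async def acquire(self):
        self._next_id += 1
        self.in_use.add(self._next_id)
        return self._next_id

    async def release(self, res):
        await asyncio.sleep(0.01)    # release is not instantaneous
        self.in_use.discard(res)


async def handler(pool):
    async with safe_resource(pool.acquire, pool.release):
        await asyncio.sleep(10)      # stands in for do_work(sess, req)


async def run_cancelled_handler():
    pool = FakePool()
    task = asyncio.create_task(handler(pool))
    await asyncio.sleep(0.01)        # let the handler acquire its resource
    held_before = set(pool.in_use)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    await asyncio.sleep(0.05)        # allow any background release to finish
    return held_before, set(pool.in_use), task.cancelled()


result = asyncio.run(run_cancelled_handler())
print(result)
```

The two properties worth asserting are that the pool is empty afterwards and that the task still reports itself as cancelled, so the helper neither leaks the resource nor swallows the cancellation.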

Why this happens

The underlying reason is that asyncio chose a cooperative cancellation model. Cancellation is requested by the framework, but it is granted by the code, at the next await. Code that suppresses or interrupts cancellation can stall the framework's lifecycle. The framework cannot reach in and force you to stop.
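A minimal sketch of what "granted at the next await" means in practice. The task below requests its own cancellation, which is artificial but keeps the example deterministic: nothing else can run while synchronous code holds the event loop, so a self-cancel is the simplest way to show a request that is pending but not yet delivered.

```python
import asyncio

steps = []


async def worker():
    asyncio.current_task().cancel()      # request cancellation of this very task
    steps.append("still running after cancel()")   # synchronous code is not interrupted
    try:
        await asyncio.sleep(0)           # first await: the request is delivered here
    except asyncio.CancelledError:
        steps.append("cancelled at the await")
        raise
    steps.append("never reached")


async def main():
    task = asyncio.create_task(worker())
    try:
        await task
    except asyncio.CancelledError:
        pass
    return task.cancelled()


was_cancelled = asyncio.run(main())
print(steps, was_cancelled)
```

cancel() only records a request. The CancelledError appears at the next suspension point, and if the code at that point catches it and carries on, the request is simply gone.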

This is a sound design — preemption in user-space async code is a nightmare — but it means the contract has to be honored by every coroutine, every layer down. If even one layer cheats, the whole tower wobbles.

For a deeper read on this, Nathaniel J. Smith's Notes on structured concurrency is the foundational essay. Even if you do not use Trio, his framing of cancellation and lifetime is the clearest treatment I have read.
