Most Python async code handles success well and failure poorly. Cancellation is the spot where this becomes obvious. Here are the two bugs I have now seen in five different production codebases — including one I shipped myself last year. Both are easy to write, hard to test, and easy to fix once you have a name for them.
Bug 1 — Swallowed CancelledError
The first one looks like this. Innocent, even responsible:
```python
import logging

log = logging.getLogger(__name__)

async def fetch_user(uid: str) -> dict | None:
    try:
        return await api.get(f"/users/{uid}")
    except Exception as e:
        log.warning(f"fetch failed: {e}")
        return None
```

It is broken on any interpreter older than Python 3.8, and one careless edit away from broken on newer ones. Since Python 3.8, asyncio.CancelledError inherits from BaseException, so except Exception lets it through; on Python 3.7 and earlier it inherits from Exception and is silently caught here. Even on modern Python, if anyone in your stack writes except BaseException:, a bare except:, or except CancelledError: and then continues, the cancellation request from the framework is dropped on the floor.
The result is that when an outer scope (a TaskGroup, a gather, a request timeout) tries to cancel this task, the coroutine swallows the cancel, logs a warning, and returns None, and its caller carries on as if the request had merely failed. The outer scope sits there waiting for a task that no longer knows it was asked to stop.
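To see the effect in isolation, here is a minimal sketch. The coroutine name swallows_cancel and the events list are made up for the illustration, and the handler is deliberately wrong: it catches BaseException and carries on.

```python
import asyncio

async def swallows_cancel(events: list[str]) -> None:
    try:
        await asyncio.sleep(10)
    except BaseException:  # deliberately wrong: this also catches CancelledError
        events.append("swallowed")
    # The coroutine carries on as if nothing had happened.
    await asyncio.sleep(0.01)
    events.append("kept running")

async def demo() -> tuple[list[str], bool]:
    events: list[str] = []
    task = asyncio.create_task(swallows_cancel(events))
    await asyncio.sleep(0)  # let the task start and suspend
    task.cancel()           # request cancellation
    await task              # returns normally: no CancelledError surfaces
    return events, task.cancelled()

events, was_cancelled = asyncio.run(demo())
print(events, was_cancelled)
```

The task finishes its remaining work and reports that it was never cancelled, which is exactly what the outer scope did not ask for.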
The fix is to re-raise:
```python
import asyncio

async def fetch_user(uid: str) -> dict | None:
    try:
        return await api.get(f"/users/{uid}")
    except asyncio.CancelledError:
        raise  # never swallow this
    except Exception as e:
        log.warning(f"fetch failed: {e}")
        return None
```

A linter will catch most of these. The flake8-async plugin flags handlers that catch cancellation without re-raising (ASYNC103), and Ruff ports part of that plugin under the ASYNC prefix. Check which codes your Ruff version implements, enable the family in your ruff.toml, and stop relying on memory:
```toml
# ruff.toml
[lint]
select = ["ASYNC", "E", "F", "B"]
```

Bug 2 — Cleanup that suspends
The second one is subtler. You handle CancelledError correctly — you re-raise — but in your finally block you await something:
```python
async def with_session():
    sess = await pool.acquire()
    try:
        return await do_work(sess)
    finally:
        await sess.release()  # ← this await runs during cancellation
```

This seems correct, and most of the time it is. The trap is what happens when the cancellation arrives a second time. If the outer scope cancels twice (say, the framework cancels, you await sess.release(), and then the framework cancels again because cleanup is taking too long), the second cancellation interrupts your finally, and the session is leaked.
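The double cancel is easy to reproduce in a few lines. This is a sketch with made-up names: slow_release stands in for a release() that does real I/O, and the state dict records whether it ran to completion.

```python
import asyncio

async def slow_release(state: dict) -> None:
    await asyncio.sleep(0.05)  # stands in for real cleanup I/O
    state["released"] = True

async def worker(state: dict) -> None:
    try:
        await asyncio.sleep(10)
    finally:
        await slow_release(state)  # unshielded await during cancellation

async def demo() -> dict:
    state = {"released": False}
    task = asyncio.create_task(worker(state))
    await asyncio.sleep(0)     # let the worker suspend
    task.cancel()              # first cancel: worker enters its finally block
    await asyncio.sleep(0.01)  # worker is now awaiting slow_release()
    task.cancel()              # second cancel: interrupts the cleanup itself
    try:
        await task
    except asyncio.CancelledError:
        pass
    return state

state = asyncio.run(demo())
print(state)
```

The release never completes: the second CancelledError lands on the await inside the finally block.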
The fix is to make cleanup uninterruptible:
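```python
import asyncio

async def with_session():
    sess = await pool.acquire()
    try:
        return await do_work(sess)
    finally:
        # shield the cleanup from a second cancellation
        await asyncio.shield(sess.release())
```

asyncio.shield protects release() from cancellation: if the outer task is cancelled while it waits here, the inner call keeps running. The outer task does not wait for it, though. The await raises CancelledError immediately, and release() finishes in the background, provided the event loop keeps running. If the caller must not move on until the release has completed, hold a reference to the inner task and await it before re-raising.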
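Here is a sketch for the case where the caller must not proceed until cleanup has finished. Per the asyncio documentation, when the task awaiting shield() is cancelled, that await raises CancelledError right away while the inner awaitable keeps running, so waiting for it takes an extra step. The helper name finish_despite_cancel is hypothetical, as are slow_release and the state dict. Note that it has no timeout, so a release that hangs would hang the caller too.

```python
import asyncio

async def finish_despite_cancel(coro):
    """Hypothetical helper: run coro to completion, then honor any cancellation."""
    task = asyncio.ensure_future(coro)  # keep a strong reference to the inner task
    try:
        return await asyncio.shield(task)
    except asyncio.CancelledError:
        # Either we were cancelled (inner task still running) or the inner
        # task itself was cancelled (already done). asyncio.wait() does not
        # cancel the tasks it waits on, so repeated cancels cannot reach it.
        while not task.done():
            try:
                await asyncio.wait([task])
            except asyncio.CancelledError:
                continue  # defer the cancellation until cleanup is done
        raise  # cleanup finished: now propagate the cancellation

async def slow_release(state: dict) -> None:
    await asyncio.sleep(0.05)  # stands in for real cleanup I/O
    state["released"] = True

async def worker(state: dict) -> None:
    try:
        await asyncio.sleep(10)
    finally:
        await finish_despite_cancel(slow_release(state))

async def demo() -> dict:
    state = {"released": False, "cancelled": False}
    task = asyncio.create_task(worker(state))
    await asyncio.sleep(0)     # let the worker suspend
    task.cancel()              # first cancel: worker enters its finally block
    await asyncio.sleep(0.01)
    task.cancel()              # second cancel arrives mid-cleanup
    try:
        await task
    except asyncio.CancelledError:
        state["cancelled"] = True
    return state

state = asyncio.run(demo())
print(state)
```

The release completes and the task still ends up cancelled, so the outer scope sees the outcome it asked for.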
If release() itself could hang, you also want a timeout, so that the finally block cannot wait forever:
```python
import asyncio

async def with_session():
    sess = await pool.acquire()
    try:
        return await do_work(sess)
    finally:
        try:
            # Bound the wait; the shield keeps release() itself from being cancelled.
            await asyncio.wait_for(
                asyncio.shield(sess.release()),
                timeout=5.0,
            )
        except asyncio.TimeoutError:
            log.error("session release timed out")
```

How I now write it
For any resource I acquire in an async function, I have started using a small helper that bakes both fixes in:
```python
import asyncio
import logging
from contextlib import asynccontextmanager

log = logging.getLogger(__name__)

@asynccontextmanager
async def safe_resource(acquire, release, *, release_timeout=5.0):
    res = await acquire()
    try:
        yield res
    finally:
        try:
            await asyncio.wait_for(asyncio.shield(release(res)), release_timeout)
        except asyncio.TimeoutError:
            log.error(f"release timed out for {res!r}")
        except asyncio.CancelledError:
            raise
        except Exception as e:
            log.warning(f"release error for {res!r}: {e}")

# usage
async def handler(req):
    async with safe_resource(pool.acquire, pool.release) as sess:
        return await do_work(sess, req)
```

The helper is fourteen lines, and it has removed a recurring class of bug from our codebase. The cost is one extra layer of indirection per resource; the benefit is that the cancellation contract is now consistent everywhere.
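Since these bugs are hard to test, here is a sketch of how I would exercise the helper. FakePool is a hypothetical stand-in for a real pool, and the helper is repeated so the block runs on its own. It covers the single-cancel path only.

```python
import asyncio
import logging
from contextlib import asynccontextmanager

log = logging.getLogger(__name__)

@asynccontextmanager
async def safe_resource(acquire, release, *, release_timeout=5.0):
    # Same helper as above, repeated so this block is self-contained.
    res = await acquire()
    try:
        yield res
    finally:
        try:
            await asyncio.wait_for(asyncio.shield(release(res)), release_timeout)
        except asyncio.TimeoutError:
            log.error(f"release timed out for {res!r}")
        except asyncio.CancelledError:
            raise
        except Exception as e:
            log.warning(f"release error for {res!r}: {e}")

class FakePool:
    """Hypothetical stand-in for a real connection pool."""

    def __init__(self) -> None:
        self.in_use = 0

    async def acquire(self) -> object:
        self.in_use += 1
        return object()

    async def release(self, res: object) -> None:
        await asyncio.sleep(0.01)  # simulate I/O during release
        self.in_use -= 1

async def handler(pool: FakePool) -> None:
    async with safe_resource(pool.acquire, pool.release):
        await asyncio.sleep(10)  # long-running work that gets cancelled

async def demo() -> tuple[int, bool]:
    pool = FakePool()
    task = asyncio.create_task(handler(pool))
    await asyncio.sleep(0)  # let the handler acquire and start its work
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        cancelled = True
    else:
        cancelled = False
    return pool.in_use, cancelled

in_use, cancelled = asyncio.run(demo())
print(in_use, cancelled)
```

The two things worth asserting are that nothing is left checked out of the pool and that the task still reports itself as cancelled.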
Why this happens
The underlying reason is that asyncio chose a cooperative cancellation model. Cancellation is requested by the framework, but it is granted by the code, at the next await. Code that suppresses or interrupts cancellation can stall the framework's lifecycle. The framework cannot reach in and force you to stop.
This is a sound design — preemption in user-space async code is a nightmare — but it means the contract has to be honored by every coroutine, every layer down. If even one layer cheats, the whole tower wobbles.
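The request-then-grant split is visible directly in the Task API. In this sketch (worker and the events list are made-up names), cancel() returns immediately, and the task only ends once the CancelledError has been delivered at its await and allowed to propagate.

```python
import asyncio

async def worker(events: list[str]) -> None:
    try:
        events.append("working")
        await asyncio.sleep(10)  # the suspension point where the cancel lands
    except asyncio.CancelledError:
        events.append("cancel delivered at await")
        raise  # grant the request

async def demo() -> tuple[list[str], bool, bool, bool]:
    events: list[str] = []
    task = asyncio.create_task(worker(events))
    await asyncio.sleep(0)             # let the worker reach its await
    requested = task.cancel()          # the request: returns immediately
    done_after_request = task.done()   # nothing has actually stopped yet
    try:
        await task                     # give the worker a chance to react
    except asyncio.CancelledError:
        pass
    return events, requested, done_after_request, task.cancelled()

events, requested, done_after_request, cancelled = asyncio.run(demo())
print(events, requested, done_after_request, cancelled)
```

Everything between the request and the final cancelled state is the coroutine's own code, which is why one handler that refuses to re-raise is enough to break the chain.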
For a deeper read on this, Nathaniel J. Smith's Notes on structured concurrency is the foundational essay. Even if you do not use Trio, his framing of cancellation and lifetime is the clearest treatment I have read.