15

The problem (I think)

The contextlib.asynccontextmanager documentation gives this example:

@asynccontextmanager
async def get_connection():
    conn = await acquire_db_connection()
    try:
        yield conn
    finally:
        await release_db_connection(conn)

It looks to me like this can leak resources. If this code's task is cancelled while this code is on its await release_db_connection(conn) line, the release could be interrupted. The asyncio.CancelledError will propagate up from somewhere within the finally block, preventing subsequent cleanup code from running.

So, in practical terms, if you're implementing a web server that handles requests with a timeout, a timeout firing at the exact wrong time could cause a database connection to leak.

Full runnable example

import asyncio
from contextlib import asynccontextmanager

async def acquire_db_connection():
    await asyncio.sleep(1)
    print("Acquired database connection.")
    return "<fake connection object>"

async def release_db_connection(conn):
    await asyncio.sleep(1)
    print("Released database connection.")

@asynccontextmanager
async def get_connection():
    conn = await acquire_db_connection()
    try:
        yield conn
    finally:
        await release_db_connection(conn)

async def do_stuff_with_connection():
    async with get_connection() as conn:
        await asyncio.sleep(1)
        print("Did stuff with connection.")

async def main():
    task = asyncio.create_task(do_stuff_with_connection())

    # Cancel the task just as the context manager running
    # inside of it is executing its cleanup code.
    await asyncio.sleep(2.5)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass

    print("Done.")

asyncio.run(main())

Output on Python 3.7.9:

Acquired database connection.
Did stuff with connection.
Done.

Note that Released database connection is never printed.

My questions

  • This is a problem, right? Intuitively to me, I expect .cancel() to mean "cancel gracefully, cleaning up any resources used along the way." (Otherwise, why would they have implemented cancellation as exception propagation?) But I could be wrong. Maybe, for example, .cancel() is meant to be fast instead of graceful. Is there an authoritative source that clarifies what .cancel() is supposed to do here?
  • If this is indeed a problem, how do I fix it?
Maxpm
  • 24,113
  • 33
  • 111
  • 170
  • 1
    It seems that the very fact that you can catch the `CancelledError` implies that you can still do the cleanup yourself -- but you'd have to cache the equivalent of the `conn` state somewhere accessible to other parts of the code. – Andrew Jaffe Aug 03 '21 at 09:07

2 Answers2

2

Focusing on protecting the cleanup from cancellation is a red herring. There is a multitude of things that can go wrong and the context manager has no way to know

  • which errors can occur, and
  • which errors must be protected against.

It is the responsibility of the resource handling utilities to properly handle errors.

  • If release_db_connection must not be cancelled, it must protect itself against cancellation.
  • If acquire/release must be run as a pair, it must be a single async with context manager. Further protection, e.g. against cancellation, may be involved internally as well.
async def release_db_connection(conn):
    """
    Cancellation safe variant of `release_db_connection`

    Internally protects against cancellation by delaying it until cleanup.
    """
    # cleanup is run in separate task so that it
    # cannot be cancelled from the outside.
    shielded_release = asyncio.create_task(asyncio.sleep(1))
    # Wait for cleanup completion – unlike `asyncio.shield`,
    # delay any cancellation until we are done.
    try:
        await shielded_release
    except asyncio.CancelledError:
        await shielded_release
        # propagate cancellation when we are done
        raise
    finally:
        print("Released database connection.")

Note: Asynchronous cleanup is tricky. For example, a simple asyncio.shield is not sufficient if the event loop does not wait for shielded tasks. Avoid inventing your own protection and rely on the underlying frameworks to do the right thing.


The cancellation of a task is a graceful shutdown that a) still allows async operations and b) may be delayed/suppressed. Coroutines being prepared to handle the CancelledError for cleanup is explicitly allowed.

Task.cancel

The coroutine then has a chance to clean up or even deny the request by suppressing the exception with a try … … except CancelledError … finally block. […] Task.cancel() does not guarantee that the Task will be cancelled, although suppressing cancellation completely is not common and is actively discouraged.

A forceful shutdown is coroutine.close/GeneratorExit. This corresponds to an immediate, synchronous shutdown and forbids suspension via await, async for or async with.

coroutine.close

[…] it raises GeneratorExit at the suspension point, causing the coroutine to immediately clean itself up.

MisterMiyagi
  • 44,374
  • 10
  • 104
  • 119
  • Could you clarify what you mean by "it's the responsibility of the resource handling utilities" and "avoid inventing your own protection"? If you're in the position of writing a context manager to manage a resource's lifetime, you *are* writing the resource handling utilities. – Maxpm Aug 05 '21 at 03:28
  • @Maxpm The ``contextlib`` example is about using existing resource handlers and giving them a nicer coat of paint. In that case, you should trust that the handler is robust unless it is explicitly documented not to be – you simply have no way to know *which* recovery (cancellation, retries, timeouts, ...) is needed otherwise. That also ties into not trying to add protection unless you know how – the naive case of ``asyncio.shield`` leaks on shutdown, the means shown in the answer can suppress cancellations if the task raises exceptions – because they depend on the resource handler's details. – MisterMiyagi Aug 05 '21 at 08:48
  • 1
    @Maxpm Now if we are talking about creating your own, safe resource handler – that's only slightly related to ``contextlib.contextmanager`` (though it is definitely advisable). Most importantly, it is about a) what kind of resource you are handling – e.g. if it can be cleaned up synchronously as well – and what kind of event loop you are using – e.g. ``asyncio`` has a lot of sharp edges. Either way, if the individual acquire/release actions are not safe, an outer context manager has a hard time making them safe – at least not in a manner that would better fit into the acquire/release. – MisterMiyagi Aug 05 '21 at 08:58
  • This approach feels wrong to me: whether or not a coroutine " must not be cancelled" depends on the situation. For example, a file deletion coroutine could be called as a context manager's cleanup (which should not be cancelled) or as part of the main operation of a program (which should be cancellable). Wouldn't your approach force all coroutines that could possibly be used in a cleanup to use this `try: await x; except CancelledError: await x` construct, which makes them unusable in contexts where they _should_ be cancellable? – Arno May 26 '23 at 12:30
  • @Arno There are operations that literally "must not be cancelled" once they have started. These are generally multi-step cleanup procedures, where partial cleanup may leave a system in undefined state (for example, finishing the compression of a journal). If your cleanup *sometimes* is fine to cancel, then its not one that "must not be cancelled" in itself and does not need such protection. If there is *calling* code that requires it not to be cancelled (e.g. delete *multiple* files at once, to avoid partial construction/cleanup to look the same) then the calling code must shield its it. – MisterMiyagi May 26 '23 at 12:46
1

You can protect the task with asyncio.shield to guarantee graceful shutdown of the context manager, I did changes only in main():

async def main():
    task = asyncio.create_task(do_stuff_with_connection())
    # shield context manager from cancellation
    sh_task = asyncio.shield(task)
    # Cancel the task just as the context manager running
    # inside of it is executing its cleanup code.
    await asyncio.sleep(2.5)
    sh_task.cancel()  # cancel shielded task
    try:
        await sh_task
    except asyncio.CancelledError:
        pass

    await asyncio.sleep(5)  # wait till shielded task is done

    print("Done.")
Artiom Kozyrev
  • 3,526
  • 2
  • 13
  • 31
  • But this still doesn't call `release_db_connection(conn)`, does it? – Andrew Jaffe Aug 03 '21 at 12:06
  • @AndrewJaffe I got the following output when I ran the code ```Acquired database connection. Did stuff with connection. Released database connection. Done.``` So it looks like it runs `release_db_connection` – Artiom Kozyrev Aug 03 '21 at 12:17
  • 2
    Oops, I left off the final `await asyncio.sleep(5)` which meant that the code ended before it could run `release_db_connection` (but which is an interesting failure mode, too!). – Andrew Jaffe Aug 03 '21 at 12:21
  • Yes, you have to make loop busy with some work, otherwise it does not wait for `tasks` – Artiom Kozyrev Aug 03 '21 at 12:24
  • In place of `await asyncio.sleep(5)`, you could use `await task` (not `await sh_task`) to make it more deterministic, right? That would also ensure that exceptions from the task propagate properly. – Maxpm Aug 03 '21 at 14:56
  • 1
    @Maxpm I wanted to demonstrate that despite task was cancelled, It still could be gracefully stopped, while app (e.g. some server) can continue working with other tasks. Since this code snippet is not real server, which listens for some connections all the time, I used `asyncio.sleep` to simulate some server work. – Artiom Kozyrev Aug 03 '21 at 15:43