17

Is a function like:

async def f(x):
    time.sleep(x)

await f(5)

properly asynchronous/non-blocking?

Is the sleep function provided by asyncio any different?

And finally, is aiorequests a viable asynchronous replacement for requests?

(To my mind, it basically wraps the main components to make them asynchronous.)

https://github.com/pohmelie/aiorequests/blob/master/aiorequests.py

Kuba Chrabański

1 Answer

29

The provided function is not a correctly written async function because it invokes a blocking call, which is forbidden in asyncio. (A quick hint that there's something wrong with the "coroutine" is that it doesn't contain a single await.) The reason that it is forbidden is that a blocking call such as sleep() will pause the current thread without giving other coroutines a chance to run. In other words, instead of pausing the current coroutine, it will pause the whole event loop, i.e. all coroutines.
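The effect can be observed with a small timing experiment. This is only a sketch: the names `blocking` and `measure_both` are made up for illustration, and the 0.2 second delay is arbitrary.

```python
import asyncio
import time


async def blocking(x):
    # Blocks the whole thread, and with it the event loop.
    time.sleep(x)


async def measure_both(delay):
    start = time.perf_counter()
    # Both coroutines are scheduled together, but each time.sleep() call
    # holds the event loop until it returns, so they run back to back.
    await asyncio.gather(blocking(delay), blocking(delay))
    return time.perf_counter() - start


elapsed = asyncio.run(measure_both(0.2))
print(f"two blocking 0.2 s sleeps took {elapsed:.2f} s")
```

The total should come out at roughly the sum of the two delays, because neither coroutine ever yields control to the other.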

In asyncio (and other async frameworks) blocking primitives like time.sleep() are replaced with awaitables like asyncio.sleep(), which suspend the awaiter and resume it when the time is right. Other coroutines and the event loop are not only unaffected by suspension of a coroutine, but that's precisely when they get the chance to run. Suspension and resumption of coroutines is the core of async-await cooperative multitasking.
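For comparison, here is a minimal sketch of the same kind of experiment using asyncio.sleep(). The helper names `nap` and `run_naps` and the delays are illustrative assumptions, not anything from the question.

```python
import asyncio
import time


async def nap(name, delay, log):
    # Suspends only this coroutine; the event loop keeps running the others.
    await asyncio.sleep(delay)
    log.append(name)


async def run_naps():
    log = []
    start = time.perf_counter()
    # The two naps overlap, so the total is close to the longer delay.
    await asyncio.gather(nap("slow", 0.3, log), nap("fast", 0.1, log))
    return log, time.perf_counter() - start


finish_order, elapsed = asyncio.run(run_naps())
print(finish_order, f"{elapsed:.2f} s")
```

The shorter nap finishes first even though it was listed second, and the total time should be close to the longer delay rather than the sum of the two.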

Asyncio supports running legacy blocking functions in a separate thread, so that they don't block the event loop. This is achieved by calling run_in_executor which will hand off the execution to a thread pool (executor in the parlance of Python's concurrent.futures module) and return an asyncio awaitable:

import asyncio
import time

async def f(x):
    loop = asyncio.get_running_loop()
    # start time.sleep(x) in a separate thread, suspend
    # the current coroutine, and resume when it's done
    await loop.run_in_executor(None, time.sleep, x)

This is the technique used by aiorequests to wrap requests' blocking functions. Native asyncio functions like asyncio.sleep() do not use this approach; they directly tell the event loop to suspend them and how to wake them up (source).
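To make "tell the event loop how to wake them up" more concrete, here is a rough sketch of a thread-free sleep built from a future and a timer. `my_sleep` is a hypothetical helper, not part of asyncio, and although it follows the same general idea as the library function, it should not be read as the actual implementation.

```python
import asyncio
import time


async def my_sleep(delay):
    # Hypothetical helper for illustration; not part of asyncio.
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    # Ask the event loop to complete the future once `delay` has elapsed.
    handle = loop.call_later(delay, future.set_result, None)
    try:
        # Awaiting a pending future suspends this coroutine and returns
        # control to the event loop, which is free to run other tasks.
        await future
    finally:
        # If the wait was cancelled, make sure the timer does not fire.
        handle.cancel()


async def demo():
    start = time.perf_counter()
    await asyncio.gather(my_sleep(0.2), my_sleep(0.2))
    return time.perf_counter() - start


elapsed = asyncio.run(demo())
print(f"two overlapping 0.2 s sleeps took {elapsed:.2f} s")
```

No thread is involved: the coroutine simply parks itself on a future, and the event loop's timer resolves that future later.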

run_in_executor is useful and effective for quick wrapping of legacy blocking code, and not much else. It is always inferior to a native async implementation, for several reasons:

  • It doesn't implement cancellation. Unlike threads, asyncio tasks are fully cancelable, but this doesn't extend to run_in_executor, which shares the limitations of threads.

  • It doesn't provide light-weight tasks which may number in tens of thousands and run in parallel. run_in_executor uses a thread pool under the hood, so if you await more functions than the maximum number of workers, some functions will have to wait their turn to even start working. The alternative, to increase the number of workers, will swamp the OS with too many threads. Asyncio allows the number of parallel operations to match what you'd have in a hand-written state machine using poll to listen for events.

  • It is likely incompatible with more complex APIs, such as those that expose user-provided callbacks, iterators, or that provide their own thread-based async functionality.
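The thread-pool limitation from the second point can be sketched as follows. The pool size of 2, the four jobs and the helper names `via_threads` and `via_coroutines` are arbitrary choices made for this illustration.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


async def via_threads(n, delay, workers):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        start = time.perf_counter()
        # Only `workers` blocking calls can run at once; the rest queue up.
        await asyncio.gather(
            *(loop.run_in_executor(pool, time.sleep, delay) for _ in range(n))
        )
        return time.perf_counter() - start


async def via_coroutines(n, delay):
    start = time.perf_counter()
    # Native awaitables all wait at the same time on a single thread.
    await asyncio.gather(*(asyncio.sleep(delay) for _ in range(n)))
    return time.perf_counter() - start


threaded = asyncio.run(via_threads(n=4, delay=0.2, workers=2))
native = asyncio.run(via_coroutines(n=4, delay=0.2))
print(f"thread pool: {threaded:.2f} s, native: {native:.2f} s")
```

With two workers, the four blocking sleeps have to run in two batches, while the four native sleeps overlap completely.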

It is recommended to avoid crutches like aiorequests and dive directly into aiohttp. The API is very similar to that of requests, and it is almost as pleasant to use.

Mossab
user4815162342
  • 2
    I wasn’t even hoping to get such an exhaustive answer, thank you very much. – Kuba Chrabański Aug 04 '19 at 07:18
  • The only thing I still don’t get is why run_in_executor uses threads under the hood. Doesn’t that completely violate the idea of asyncio in Python? Because of the GIL, async seems to be an alternative approach to threading (I would even call it a replacement). What about context-switching overhead and locking? I thought that using asyncio is especially beneficial in Python because it avoids unnecessary threading overhead in what is effectively a single-threaded environment anyway. – Kuba Chrabański Aug 04 '19 at 07:24
  • 1
    @KubaChrabański `run_in_executor` uses threads because threads are _the only way_ to get sync functions to cooperate with an async code base. But **native asyncio code doesn't use run_in_executor.** Libraries such as aiohttp are built using async functions (also known as _coroutines_) which are designed from the ground up to suspend themselves instead of blocking, and then you get the benefits. That's why `run_in_executor` is a "crutch" and should be avoided. – user4815162342 Aug 04 '19 at 07:42
  • 2
    Ok got it now, “...which are designed from the ground up” explains everything, thank you – Kuba Chrabański Aug 04 '19 at 07:48
  • Let’s say asyncio.sleep() does not exist, and I want to implement it. I would need to use Python C API, am I right? – Kuba Chrabański Aug 04 '19 at 07:52
  • 1
    @KubaChrabański No, `asyncio.sleep()` is [written in Python](https://github.com/python/cpython/blob/5c72badd06a962fe0018ceb9916f3ae66314ea8e/Lib/asyncio/tasks.py#L623), but it uses the primitives provided by the event loop (also written in Python). The way it works is that it allows the current task to suspend, but only after instructing the event loop to resume it after the specified delay elapses. Either way, it's definitely not some kind of simple wrapper over `time.time()`. – user4815162342 Aug 04 '19 at 07:58
  • 2
    @KubaChrabański If you wish to understand how the whole thing **works**, I warmly recommend [this lecture](https://www.youtube.com/watch?v=MCs5OvhV9S4) by David Beazley where he implements a small but fully functional event loop in front of a live audience. The code uses the older `yield from` syntax, but don't let that put you off, `await` is just a tiny syntactic sugar over it and works in exactly the same way under the hood. – user4815162342 Aug 04 '19 at 08:04
  • What's the difference if I did this instead: `async def async_time(): time.sleep(1)` `async def time(): await async_time()` then `time()` – NiceNAS Jan 12 '21 at 21:38
  • @Renan Can you elaborate - did what? – user4815162342 Jan 12 '21 at 21:39
  • Nvm I was trying to be cheeky – NiceNAS Jan 13 '21 at 18:58
  • Really awesome answer! The asyncio tasks on the same event loop are naturally thread-safe because the event loop is indeed running on a single thread. But with `run_in_executor()`, do we need to do anything explicitly for thread safety? @user4815162342 – smwikipedia May 12 '22 at 11:07
  • 1
    @smwikipedia Not sure what kind of thread safety concerns you? `run_in_executor` is primarily for running _blocking_ code without breaking everything else, and it uses threads as a (necessary) implementation detail. It doesn't care whether the function it invokes creates additional threads or uses them internally, it only cares about it completing and returning a value. But perhaps I misunderstand the question. – user4815162342 May 12 '22 at 13:47
  • @user4815162342 Suppose there are 2 legacy sync functions which do some "blocking" IO and then access the same global variable. Originally they ran in sync mode and the global variable would **never** be accessed concurrently. But after I wrap them in `run_in_executor()`, they will run on different threads and the global variable **can** be accessed concurrently. That may cause some issues, I think. This is different from native asyncio, where all the tasks belonging to the same event loop are essentially running on the same thread. – smwikipedia May 13 '22 at 02:47
  • “...which are designed from the ground up” -- it seems all existing IO libs need to be reworked if we want to leverage the asyncio paradigm... That's exactly what I wondered before. @user4815162342 – smwikipedia May 13 '22 at 02:57
  • 2
    @smwikipedia Re threads, I now see what you're saying. Yes, you should consider functions run by `run_in_executor()` in different tasks the same as running them from different threads in blocking code. Re "from the ground up", yes, but that has largely already happened. Asyncio is no longer a new thing, and most modern network libs are async-aware. – user4815162342 May 13 '22 at 08:16
  • @user4815162342 Thanks. Just to double-check about threading: do you mean `run_in_executor()` is also guaranteed to be thread-safe? – smwikipedia May 13 '22 at 08:31
  • 2
    @smwikipedia I've noticed the final question only now. So `run_in_executor` *itself* doesn't need to be thread-safe because it is always run from the event loop thread. The function *passed* to `run_in_executor` has to be thread-safe by definition, since it will be run outside the main thread, but its level of thread safety will depend on what it actually does. (For example, it might never need to do any explicit locking if it, say, just opens a file and reads from it.) – user4815162342 Sep 10 '22 at 16:11
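As an illustration of the thread-safety point in the last few comments, here is a minimal sketch of a legacy function that touches shared global state and is therefore guarded by a lock before being handed to `run_in_executor`. The names `counter`, `counter_lock`, `legacy_update` and `run_updates` are all hypothetical.

```python
import asyncio
import threading
import time

counter = 0                      # shared global state (illustrative)
counter_lock = threading.Lock()  # guards every update to `counter`


def legacy_update(delay):
    # Stand-in for a legacy sync function: blocking I/O, then shared state.
    global counter
    time.sleep(delay)
    with counter_lock:
        counter += 1


async def run_updates(n):
    loop = asyncio.get_running_loop()
    # Each call may land on a different worker thread of the default executor.
    await asyncio.gather(
        *(loop.run_in_executor(None, legacy_update, 0.01) for _ in range(n))
    )


asyncio.run(run_updates(20))
print(counter)
```

The lock is an ordinary `threading.Lock` rather than an `asyncio.Lock`, since the contention here is between worker threads, not between coroutines on the event loop.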