I feel like there is a gap in my understanding of async IO: is there a benefit to wrapping small functions into coroutines, within the scope of larger coroutines? Is there a benefit to this in signaling the event loop correctly? Does the extent of this benefit depend on whether the wrapped function is IO or CPU-bound?
Example: I have a coroutine, download(), which:
- Downloads JSON-serialized bytes from an HTTP endpoint via aiohttp.
- Compresses those bytes via bz2.compress(), which is not in itself awaitable.
- Writes the compressed bytes to S3 via aioboto3.
So parts 1 & 3 use predefined coroutines from those libraries; part 2 does not, by default.
Dumbed-down example:
import bz2
import io

import aiohttp
import aioboto3

async def download(endpoint, bucket_name, key):
    async with aiohttp.ClientSession() as session:
        async with session.request("GET", endpoint, raise_for_status=True) as resp:
            raw = await resp.read()  # payload (bytes)
            # Yikes - isn't it bad to throw a synchronous call into the middle
            # of a coroutine?
            comp = bz2.compress(raw)
            async with (
                aioboto3.session.Session()
                .resource('s3')
                .Bucket(bucket_name)
            ) as bucket:
                await bucket.upload_fileobj(io.BytesIO(comp), key)
As hinted by the comment above, my understanding has always been that throwing a synchronous function like bz2.compress() into a coroutine can mess with it. (Even if bz2.compress() is probably more IO-bound than CPU-bound.)
So, is there generally any benefit to this type of boilerplate?
async def compress(*args, **kwargs):
    return bz2.compress(*args, **kwargs)
(And now comp = await compress(raw) within download().)
Voilà, this is now an awaitable coroutine, because a sole return is valid in a native coroutine. Is there a case to be made for using this?
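To make the question concrete, here is a minimal sketch of how one might check whether the wrapper changes anything from the event loop's point of view. It assumes Python 3.7+ (for asyncio.run); other(), events, and main() are hypothetical names used only for illustration. It schedules a second task, awaits the wrapped call, and records which of the two finishes first.

```python
import asyncio
import bz2
import inspect


async def compress(*args, **kwargs):
    # Same wrapper as above: a coroutine whose body is one synchronous call.
    return bz2.compress(*args, **kwargs)


async def main():
    events = []  # hypothetical log of which step ran when

    async def other():
        # Stand-in for any other task waiting for its turn on the loop.
        events.append("other ran")

    task = asyncio.create_task(other())  # scheduled, but not yet started
    comp = await compress(b"x" * 100_000)
    events.append("compress done")
    await task  # only here does main() actually suspend
    return events, comp


if __name__ == "__main__":
    print(inspect.iscoroutinefunction(compress))
    print(asyncio.run(main())[0])
```

If the wrapper really handed control back to the loop, I would expect "other ran" to be logged first. As far as I can tell, awaiting a coroutine that never suspends just runs it straight through, so the order of the log is what this sketch is meant to reveal.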
Per this answer, I've heard justification for throwing in asyncio.sleep(0) in a similar manner, just to signal back up to the event loop that the calling coroutine wants a break. Is this right?
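For reference, this is roughly the asyncio.sleep(0) pattern I have in mind, as a minimal sketch assuming Python 3.7+. The names busy(), ticker(), and log are hypothetical, and sum(range(...)) is only a stand-in for a chunk of synchronous work.

```python
import asyncio


async def busy(log, chunks=3):
    for i in range(chunks):
        sum(range(10_000))      # stand-in for a chunk of synchronous work
        log.append(f"busy {i}")
        await asyncio.sleep(0)  # suspend once so other ready tasks can run


async def ticker(log, n=3):
    for i in range(n):
        log.append(f"tick {i}")
        await asyncio.sleep(0)


async def main():
    log = []
    # Run both coroutines as tasks on the same event loop.
    await asyncio.gather(busy(log), ticker(log))
    return log


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The idea is that the entries from busy() and ticker() should interleave in the log, which would not happen if busy() ran all of its chunks without ever suspending.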