
I am writing a tool in Python 3.6 that sends requests to several APIs (with various endpoints) and collects their responses to parse and save them in a database.

The API clients that I use have synchronous methods for requesting a URL; for instance, they use

urllib.request.Request('...

Or they use Kenneth Reitz' Requests library.

Since my API calls rely on synchronous versions of requesting a URL, the whole process takes several minutes to complete.

Now I'd like to wrap my API calls in async/await (asyncio). I'm using Python 3.6.

All the examples and tutorials that I found want me to change the synchronous URL calls/requests to an async version (for instance aiohttp). Since my code relies on API clients that I haven't written (and can't change), I need to leave that code untouched.

So is there a way to wrap my synchronous requests (blocking code) in async/await to make them run in an event loop?

I'm new to asyncio in Python. This would be a no-brainer in NodeJS. But I can't wrap my head around this in Python.

Update 2023-06-12

Here's how I'd do it in Python 3.9+:

import asyncio
import requests

async def main():
    response1 = await asyncio.to_thread(requests.get, 'http://httpbin.org/get')
    response2 = await asyncio.to_thread(requests.get, 'https://api.github.com/events')
    print(response1.text)
    print(response2.text)

asyncio.run(main())
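
And if the calls are independent, they can run concurrently rather than one after the other. A sketch of the same two requests with asyncio.gather:

import asyncio
import requests

async def main():
    # Both requests run in worker threads at the same time; gather awaits them.
    response1, response2 = await asyncio.gather(
        asyncio.to_thread(requests.get, 'http://httpbin.org/get'),
        asyncio.to_thread(requests.get, 'https://api.github.com/events'),
    )
    print(response1.text)
    print(response2.text)

asyncio.run(main())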
Ugur
  • You have synchronous methods that you want to call asynchronously. Therefore you need to write async wrappers for them – any Python async tutorial teaches you how to do this. Here's one: http://stackabuse.com/python-async-await-tutorial/. Whether you are calling functions that do HTTP requests internally or anything else doesn't make any difference at all. – Tomalak Jun 25 '17 at 11:18
  • I know that example. That one uses an async version of requesting a URL. It uses `client = aiohttp.ClientSession(loop=loop)` and later `async with client.get(url)...` – Ugur Jun 25 '17 at 12:10
  • I'm not sure what you mean. Have you read the entire post or just the lines with `http` on them? – Tomalak Jun 25 '17 at 14:15
  • I think @Ugur is right. The tutorial does not contain an example that shows how to wrap a synchronous function (one that doesn't return an *awaitable*), so that it can be used with async/await effectively. The example in the tutorial wouldn't work if you replaced `aiohttp` with `requests`. – Rotareti Jan 21 '18 at 05:11
  • Possible duplicate of [How could I use requests in asyncio?](https://stackoverflow.com/questions/22190403/how-could-i-use-requests-in-asyncio) – Rotareti Jan 21 '18 at 05:19

1 Answer


The solution is to run your synchronous code in a thread and await it from the event loop. I used exactly this approach to make my asyncio code run boto3 (note: remove the inline type hints if running on Python < 3.6):

import asyncio
import functools
import typing

import boto3
import botocore.exceptions


async def get(self, key: str) -> bytes:
    s3 = boto3.client("s3")
    loop = asyncio.get_event_loop()
    try:
        # functools.partial is needed because run_in_executor
        # does not forward keyword arguments to the callable.
        response: typing.Mapping = await loop.run_in_executor(  # type: ignore
            None,
            functools.partial(
                s3.get_object,
                Bucket=self.bucket_name,
                Key=key))
    except botocore.exceptions.ClientError as e:
        if e.response["Error"]["Code"] == "NoSuchKey":
            raise base.KeyNotFoundException(self, key) from e
        elif e.response["Error"]["Code"] == "AccessDenied":
            raise base.AccessDeniedException(self, key) from e
        else:
            raise
    return response["Body"].read()
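
Called from coroutines, several of these fetches can then run concurrently. A minimal usage sketch (`Storage` and `fetch_many` are hypothetical names, not part of the original code):

async def fetch_many(storage: "Storage", keys: list) -> list:
    # Each get() call occupies one executor thread while its boto3 call blocks.
    return await asyncio.gather(*(storage.get(key) for key in keys))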

Note that this works because the vast majority of the time in the s3.get_object() call is spent waiting for I/O, and (generally) Python releases the GIL while waiting for I/O (the GIL is the reason threads in Python are generally not a good fit for CPU-bound work).

The first argument, None, tells run_in_executor to use the default executor, which is a thread pool; passing your own ThreadPoolExecutor instead makes things more explicit.
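
For instance, a minimal sketch with an explicitly sized pool (the `executor` and `fetch` names and the max_workers value are assumptions, not from the original answer):

import asyncio
import concurrent.futures

import requests

# An explicit pool makes the concurrency limit visible and tunable.
executor = concurrent.futures.ThreadPoolExecutor(max_workers=20)

async def fetch(url: str) -> requests.Response:
    loop = asyncio.get_event_loop()
    # Passing the executor instead of None pins the call to our pool.
    return await loop.run_in_executor(executor, requests.get, url)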

Note that whereas pure async I/O can easily keep thousands of connections open concurrently, a thread pool executor needs a separate thread for each concurrent call to the API. Once the pool runs out of threads, new calls are not scheduled until a thread becomes available. You can of course raise the number of threads, but each one consumes memory; don't expect to be able to go much beyond a couple of thousand.

Also see the Python ThreadPoolExecutor docs for an explanation and some slightly different example code on how to wrap your sync call in async code.

Claude