199

I want to run parallel HTTP request tasks in asyncio, but I find that python-requests blocks the event loop of asyncio. I've found aiohttp, but it couldn't make an HTTP request through an HTTP proxy.

So I want to know if there's a way to do asynchronous http requests with the help of asyncio.

Alexey Shrub
flyer
  • 1
    If you are just sending requests you could use `subprocess` to parallelize your code. – WeaselFox Mar 05 '14 at 06:43
  • 1
    This method seems not elegant…… – flyer Mar 05 '14 at 07:56
  • 1
    There is now an asyncio port of requests. http://github.com/rdbhost/yieldfromRequests – Rdbhost Mar 23 '15 at 15:21
  • 3
    This question is also useful for cases where something indirectly relies on `requests` (like [`google-auth`](https://google-auth.readthedocs.io/en/latest/index.html)) and can't be trivially rewritten to use `aiohttp`. – Alex Peters Apr 17 '21 at 12:12

7 Answers

234

To use requests (or any other blocking library) with asyncio, you can use BaseEventLoop.run_in_executor to run a function in another thread and yield from it to get the result. For example:

import asyncio
import requests

@asyncio.coroutine
def main():
    loop = asyncio.get_event_loop()
    future1 = loop.run_in_executor(None, requests.get, 'http://www.google.com')
    future2 = loop.run_in_executor(None, requests.get, 'http://www.google.co.uk')
    response1 = yield from future1
    response2 = yield from future2
    print(response1.text)
    print(response2.text)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

This will get both responses in parallel.

With Python 3.5+ you can use the new async/await syntax:

import asyncio
import requests

async def main():
    loop = asyncio.get_event_loop()
    future1 = loop.run_in_executor(None, requests.get, 'http://www.google.com')
    future2 = loop.run_in_executor(None, requests.get, 'http://www.google.co.uk')
    response1 = await future1
    response2 = await future2
    print(response1.text)
    print(response2.text)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

See PEP 492 for more.
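
A note on keyword arguments: run_in_executor only forwards positional arguments, so a call such as `requests.get(url, auth=...)` cannot be passed to it directly (as pointed out in the comments below). Wrapping the call with `functools.partial` is a common workaround; here is a minimal sketch along the same lines as the example above (the URL and timeout are just placeholders):

import asyncio
import functools
import requests

async def main():
    loop = asyncio.get_event_loop()
    # functools.partial bundles the keyword arguments up front,
    # because run_in_executor only passes positional arguments through
    request = functools.partial(requests.get, 'http://www.google.com', timeout=10)
    response = await loop.run_in_executor(None, request)
    print(response.status_code)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())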

alanc10n
christian
  • I tried but got the exception `SyntaxError: 'yield' outside function` – flyer Mar 15 '14 at 08:01
  • 1
    You can't use 'yield' outside of a function because the 'yield' keyword will convert a function into a generator (so it needs to be done inside a function). I'll update the example to be more complete. – christian Mar 15 '14 at 08:54
  • 10
    Can you explain how exactly this works? I don't understand how this doesn't block. – Scott Coates Mar 26 '14 at 06:12
  • 3
    @scoarescoare According to the docs, run_in_executor() will use an Executor (by default a [ThreadPoolExecutor](http://docs.python.org/3.4/library/concurrent.futures.html#concurrent.futures.ThreadPoolExecutor)) to run the methods in different threads (or subprocess if specified) and wait for the result. The advantage run_in_executor() has over just using an Executor is that it integrates nicely with asyncio. – christian Mar 26 '14 at 11:22
  • 54
    @christian but if it's running concurrently in another thread, isn't that defeating the point of asyncio? – Scott Coates Mar 26 '14 at 16:02
  • 1
    @scoarescoare Not really, if you do it right. In this case, you're simply firing a call off and getting its return value. While run_in_executor() is 'blocking' (for instance, in a similar way to how asyncio's Streams will block, or how a socket might block), control will be yielded to another coroutine waiting in asyncio's event loop. – christian Mar 26 '14 at 17:38
  • 8
    @christian Yeah, the part about it firing a call off and resuming execution makes sense. But if I understand correctly, `requests.get` will be executing in another thread. I believe one of the big pros of asyncio is the idea of keeping things single-threaded: not having to deal with shared memory, locking, etc. I think my confusion lies in the fact that your example uses both asyncio and concurrent.futures module. – Scott Coates Mar 26 '14 at 18:13
  • 24
    @scoarescoare That's where the 'if you do it right' part comes in - the method you run in the executor should be self-contained ((mostly) like requests.get in the above example). That way you don't have to deal with shared memory, locking, etc., and the complex parts of your program are still single threaded thanks to asyncio. – christian Mar 27 '14 at 13:17
  • 1
    @christian ok thanks for clearing it up! Really solidified it for me. Good point about _Executor is that it integrates nicely with asyncio._ It took me a while to realize these are completely different libraries. – Scott Coates Mar 27 '14 at 13:22
  • 5
    @scoarescoare The main use case is for integrating with IO libraries that don't have support for asyncio. For instance, I'm doing some work with a truly ancient SOAP interface, and I'm using the suds-jurko library as the "least bad" solution. I'm trying to integrate it with an asyncio server, so I'm using run_in_executor to make the blocking suds calls in a way that *looks* asynchronous. – Lucretiel Apr 06 '15 at 18:53
  • 13
    Really cool that this works and is so easy for legacy stuff, but it should be emphasised that this uses an OS thread pool and so doesn't scale up like a true asyncio-oriented lib such as aiohttp does – jsalter Jan 22 '16 at 18:35
  • Note that you may want to avoid using the `hooks=` keyword when calling Requests like that, as hook functions will likely break the *self-contained* requirement, depending on what you do in there. – blubberdiblub May 06 '16 at 02:23
  • 1
    why does the line `loop = asyncio.get_event_loop()` appear twice? – ccpizza Jan 18 '18 at 18:48
  • 1
    @ccpizza It doesn't need to appear twice. You can pass the module level "loop" into main, if you want (assuming you change the definition of "main" to take an argument). – christian Jan 23 '18 at 10:44
  • 4
    `run_in_executor` doesn't allow to pass kwargs for target callback func (like `loop.run_in_executor(None, requests.get, 'http://www.google.com', auth=blah) # fails`, this can be achieved with lambda or functools.partial as a proxy: `loop.run_in_executor(None, lambda: requests.get('http://www.google.com', auth=blah))`. see https://www.python.org/dev/peps/pep-3156/#callback-style – juggernaut Apr 15 '19 at 13:53
  • Why does `main()` need to be in the event loop when the `main()` body is already running the event loop? – James Lin Jun 03 '21 at 03:34
  • `main()` isn't running an event loop. It _gets_ the event loop in order to run `requests.get()` asynchronously. If a function wants to `await` an async function, it needs to be running in the event loop. – christian Jun 03 '21 at 08:52
  • @christian Why can't I simply write an async function `do_get()` that calls `requests.get()` and await that function? As far as I understand, the program will go do something else while it waits for `do_get()` to finish, and the only thing `do_get()` does is make the requests call. Won't that be async then? – bluppfisk Oct 26 '22 at 22:04
  • 2
    @bluppfisk The problem is that `requests.get()` isn't async, so when `do_get()` would call `requests.get()`, the function would block. The event loop will only do something else when a function yields control (using `await`) which `requests.get()` doesn't. And since it's blocking, the only way to run something else concurrently is in another thread/process, which is what `run_in_executor()` does. – christian Nov 02 '22 at 11:09
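
To make the last comment's point concrete: wrapping requests.get in an async def does not make it non-blocking; the blocking call still has to go through run_in_executor (a worker thread). A minimal sketch contrasting the two approaches, with a placeholder URL:

import asyncio
import time
import requests

async def blocking_fetch(url):
    # looks async, but requests.get() never yields to the event loop,
    # so the whole loop stalls until the response arrives
    return requests.get(url)

async def executor_fetch(url):
    # the blocking call runs in a worker thread, so the loop stays free
    loop = asyncio.get_event_loop()
    return await loop.run_in_executor(None, requests.get, url)

async def ticker():
    # prints while the loop is free; it falls silent if the loop is blocked
    for _ in range(5):
        print('event loop is responsive', time.strftime('%X'))
        await asyncio.sleep(0.5)

async def main():
    url = 'http://example.org/'  # placeholder URL
    await asyncio.gather(ticker(), executor_fetch(url))   # ticker keeps printing
    await asyncio.gather(ticker(), blocking_fetch(url))   # ticker stalls until the request finishes

loop = asyncio.get_event_loop()
loop.run_until_complete(main())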
111

aiohttp can already be used with an HTTP proxy:

import asyncio
import aiohttp


@asyncio.coroutine
def do_request():
    proxy_url = 'http://localhost:8118'  # your proxy address
    response = yield from aiohttp.request(
        'GET', 'http://google.com',
        proxy=proxy_url,
    )
    return response

loop = asyncio.get_event_loop()
loop.run_until_complete(do_request())
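
On newer aiohttp versions the same request is usually made through a ClientSession, which also accepts the proxy argument per request; a minimal sketch, assuming the same local proxy address as above:

import asyncio
import aiohttp

async def do_request():
    proxy_url = 'http://localhost:8118'  # your proxy address
    async with aiohttp.ClientSession() as session:
        # the proxy is passed per request; aiohttp routes the call through it
        async with session.get('http://google.com', proxy=proxy_url) as response:
            return await response.text()

asyncio.run(do_request())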
Andrew Svetlov
mindmaster
91

The answers above are still using the old Python 3.4 style coroutines. Here is how you would write it on Python 3.5+.

aiohttp supports HTTP proxies now:

import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = [
            'http://python.org',
            'https://google.com',
            'http://yifei.me'
        ]
    tasks = []
    async with aiohttp.ClientSession() as session:
        for url in urls:
            tasks.append(fetch(session, url))
        htmls = await asyncio.gather(*tasks)
        for html in htmls:
            print(html[:100])

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

There is also the httpx library, which is a drop-in replacement for requests with async/await support. However, httpx is somewhat slower than aiohttp.

Another option is curl_cffi, which has the ability to impersonate browsers' ja3 and http2 fingerprints.

ospider
  • 2
    could you elaborate with more URLs? It does not make sense to have only one URL when the question is about parallel HTTP requests. – anonymous May 16 '18 at 03:15
  • Legend. Thank you! Works great – Adam Jun 04 '19 at 15:41
  • @ospider How this code can be modified to deliver say 10k URLs using 100 requests in parallel? The idea is to use all 100 slots simultaneously, not to wait for 100 to be delivered in order to start next 100. – Antoan Milkov Jun 09 '19 at 09:35
  • @AntoanMilkov That's a different question that can not be answered in the comment area. – ospider Jun 10 '19 at 02:06
  • @ospider You are right, here is the question: https://stackoverflow.com/questions/56523043/using-python-3-7-to-make-100k-api-calls-making-100-in-parallel-using-asyncio – Antoan Milkov Jun 10 '19 at 08:39
14

Requests does not currently support asyncio and there are no plans to provide such support. It's likely that you could implement a custom "Transport Adapter" (as discussed here) that knows how to use asyncio.

If I find myself with some time it's something I might actually look into, but I can't promise anything.

Lukasa
12

There is a good example of combining async/await and thread pools in the article Easy parallel HTTP requests with Python and asyncio by Pimin Konstantin Kefaloukos:

To minimize the total completion time, we could increase the size of the thread pool to match the number of requests we have to make. Luckily, this is easy to do as we will see next. The code listing below is an example of how to make twenty asynchronous HTTP requests with a thread pool of twenty worker threads:

# Example 3: asynchronous requests with larger thread pool
import asyncio
import concurrent.futures
import requests

async def main():

    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:

        loop = asyncio.get_event_loop()
        futures = [
            loop.run_in_executor(
                executor, 
                requests.get, 
                'http://example.org/'
            )
            for i in range(20)
        ]
        for response in await asyncio.gather(*futures):
            pass


loop = asyncio.get_event_loop()
loop.run_until_complete(main())
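
Note that the pool does not have to be as large as the number of requests: calls submitted beyond max_workers simply wait in the executor's queue until a worker thread is free, so you can schedule far more requests than threads without batching them yourself. A minimal sketch with made-up numbers (100 placeholder URLs, 20 workers):

import asyncio
import concurrent.futures
import requests

async def main():
    urls = ['http://example.org/'] * 100  # placeholder list of URLs
    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
        loop = asyncio.get_event_loop()
        # all 100 calls are submitted at once; at most 20 run at a time,
        # the rest wait in the executor's internal queue
        futures = [loop.run_in_executor(executor, requests.get, url) for url in urls]
        responses = await asyncio.gather(*futures)
    print(len(responses))

loop = asyncio.get_event_loop()
loop.run_until_complete(main())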
franchb
  • 2
    problem with this is that if I need to run 10000 requests with chunks of 20 executors, I have to wait for all 20 executors to finish in order to start with the next 20, right? I cannot just do `for i in range(10000)` because one request might fail or timeout, right? – Sanandrea Jun 19 '18 at 08:33
  • 2
    Can you please explain why you need asyncio when you can do the same just using ThreadPoolExecutor? – Asaf Pinhassi Aug 05 '19 at 05:53
  • @lya Rusin Based on what, do we set the number of max_workers? Does it have to do with number of CPUs and threads? – alt-f4 Jun 04 '20 at 05:32
  • @AsafPinhassi if the rest of your script/program/service is asyncio, you'll want to use it "all the way". you'd probably be better off using aiohttp (or some other lib that supports asyncio) – Guy Dec 01 '20 at 23:43
  • 2
    @alt-f4 it actually does not matter how many CPUs you have. The point of delegating this work to a thread (and the whole point of asyncio) is for IO-bound operations. The thread will simply be idle ("waiting") for the response to be retrieved from the socket. asyncio makes it possible to handle many concurrent (not parallel!) requests with no threads at all (well, just one). However, `requests` does not support asyncio, so you need to create threads to get concurrency. – Guy Dec 01 '20 at 23:46
8

Considering that aiohttp is a fully featured web framework, I'd suggest using something more lightweight like httpx (https://www.python-httpx.org/), which supports async requests. It has an almost identical API to requests:

>>> async with httpx.AsyncClient() as client:
...     r = await client.get('https://www.example.com/')
...
>>> r
<Response [200 OK]>
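
For the parallel requests in the question, the same AsyncClient can issue several requests concurrently with asyncio.gather; a minimal sketch (the URLs are just placeholders):

import asyncio
import httpx

async def main():
    urls = ['https://www.example.com/', 'https://www.example.org/']  # placeholder URLs
    async with httpx.AsyncClient() as client:
        # fire both requests concurrently and wait for all responses
        responses = await asyncio.gather(*(client.get(url) for url in urls))
    for r in responses:
        print(r.status_code)

asyncio.run(main())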
zhukovgreen
2

DISCLAIMER: The following code creates a separate thread for each decorated function call.

This might be useful in some cases, as it is simpler to use. But be aware that it is not truly async; it only gives the illusion of async by using multiple threads, even though the decorator's name suggests otherwise.

To make any function non-blocking, simply copy the decorator and apply it to any function, with a callback function as the parameter. The callback function will receive the data returned from the decorated function.

import asyncio
import requests


def run_async(callback):
    def inner(func):
        def wrapper(*args, **kwargs):
            def __exec():
                out = func(*args, **kwargs)
                callback(out)
                return out

            return asyncio.get_event_loop().run_in_executor(None, __exec)

        return wrapper

    return inner


def _callback(*args):
    print(args)


# Must provide a callback function, callback func will be executed after the func completes execution !!
@run_async(_callback)
def get(url):
    return requests.get(url)


get("https://google.com")
print("Non blocking code ran !!")
vaskrneup