
I have to send a lot of HTTP requests; once all of them have returned, the program can continue. Sounds like a perfect match for asyncio. A bit naively, I wrapped my calls to requests in an async function and gave them to asyncio. This doesn't work: it is no faster than making the calls one after another.

After searching online, I found two solutions:

  • use a library like aiohttp, which is made to work with asyncio
  • wrap the blocking code in a call to run_in_executor
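The second option can be tried without a server. This is a minimal sketch, assuming Python 3.9+ for `asyncio.to_thread`, which runs a function in the event loop's default thread pool executor and is a shorthand for `loop.run_in_executor(None, ...)`. `blocking_fetch` is a hypothetical stand-in for `requests.get` that just calls `time.sleep`.

```python
import asyncio
import time


def blocking_fetch(i: int) -> str:
    """Hypothetical stand-in for requests.get: blocks the calling thread."""
    time.sleep(0.1)
    return f"response {i}"


async def fetch_all(n: int) -> list[str]:
    # asyncio.to_thread (Python 3.9+) runs each call in the loop's default
    # thread pool executor, so the event loop itself is never blocked.
    # gather returns the results in the order the awaitables were passed.
    return await asyncio.gather(*(asyncio.to_thread(blocking_fetch, i) for i in range(n)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(fetch_all(10))
    print(len(results), f"{time.perf_counter() - start:.2f} s")
```

Ten sequential calls would take at least 1 second; here they overlap, limited only by the number of worker threads in the default pool.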

To understand this better, I wrote a small benchmark. The server side is a Flask program that waits 0.1 seconds before answering a request.

from flask import Flask
import time

app = Flask(__name__)


@app.route('/')
def hello_world():
    time.sleep(0.1)  # heavy calculations here :)
    return 'Hello World!'


if __name__ == '__main__':
    app.run()

The client is my benchmark

import requests
from time import perf_counter, sleep

# this is the baseline, sequential calls to requests.get
start = perf_counter()
for i in range(10):
    r = requests.get("http://127.0.0.1:5000/")
stop = perf_counter()
print(f"synchronous took {stop-start} seconds") # 1.062 secs

# now the naive asyncio version
import asyncio
loop = asyncio.get_event_loop()

async def get_response():
    r = requests.get("http://127.0.0.1:5000/")

start = perf_counter()
loop.run_until_complete(asyncio.gather(*[get_response() for i in range(10)]))
stop = perf_counter()
print(f"asynchronous took {stop-start} seconds") # 1.049 secs

# the fast asyncio version
start = perf_counter()
loop.run_until_complete(asyncio.gather(
    *[loop.run_in_executor(None, requests.get, 'http://127.0.0.1:5000/') for i in range(10)]))
stop = perf_counter()
print(f"asynchronous (executor) took {stop-start} seconds") # 0.122 secs

#finally, aiohttp
import aiohttp

async def get_response(session):
    async with session.get("http://127.0.0.1:5000/") as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        await get_response(session)

start = perf_counter()
loop.run_until_complete(asyncio.gather(*[main() for i in range(10)]))
stop = perf_counter()
print(f"aiohttp took {stop-start} seconds") # 0.121 secs

So a naive asyncio implementation doesn't help with blocking I/O code. But if you use asyncio correctly, it is just as fast as the dedicated aiohttp framework. The docs for coroutines and tasks don't really mention this. Only the documentation for loop.run_in_executor() says:

# File operations (such as logging) can block the
# event loop: run them in a thread pool.

I was surprised by this behaviour. The purpose of asyncio is to speed up blocking io calls. Why is an additional wrapper, run_in_executor, necessary to do this?
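The naive version gains nothing because `async def` does not make the body non-blocking: a blocking call such as `requests.get` never hands control back to the event loop, so the coroutines simply run one after another. A minimal sketch that shows the effect with `time.sleep` standing in for the blocking request and `asyncio.sleep` standing in for a non-blocking one (both stand-ins are assumptions, not the benchmark's code):

```python
import asyncio
import time


async def naive_wait() -> None:
    # time.sleep blocks the whole thread, and with it the event loop:
    # no other coroutine can run until it returns.
    time.sleep(0.1)


async def cooperative_wait() -> None:
    # asyncio.sleep suspends only this coroutine and hands control back
    # to the event loop, which can run the other coroutines meanwhile.
    await asyncio.sleep(0.1)


async def timed(factory, n: int = 5) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(factory() for _ in range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"blocking:    {asyncio.run(timed(naive_wait)):.2f} s")  # roughly 5 * 0.1 s
    print(f"cooperative: {asyncio.run(timed(cooperative_wait)):.2f} s")  # roughly 0.1 s
```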

The whole selling point of aiohttp seems to be support for asyncio. But as far as I can see, the requests module works perfectly, as long as you wrap it in an executor. Is there a reason to avoid wrapping something in an executor?

lhk
  • The purpose of asyncio is not to speed things up in general; it's to reduce latency. Both of your approaches do that, while the executor might require a few more resources. – Klaus D. Nov 12 '18 at 11:03
  • The executor is based on threads. `asyncio` uses non-blocking sockets, so it can make many requests with one thread, but `requests` does not. – KC. Nov 12 '18 at 11:38

1 Answer

But as far as I can see, the requests module works perfectly, as long as you wrap it in an executor. Is there a reason to avoid wrapping something in an executor?

Running code in an executor means running it in OS threads (with the default executor, a thread pool).

aiohttp and similar libraries let you run non-blocking code without OS threads, using coroutines only.
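To see that an executor really means other OS threads, this minimal sketch (assuming Python 3.9+) reports which thread a function runs in, once when called directly inside a coroutine and once through `loop.run_in_executor`. The pool size and the `thread_name_prefix` value are illustrative choices; passing `None` instead of `pool` would use the loop's default executor.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


def which_thread() -> str:
    # Report the name of the OS thread this function actually runs in.
    return threading.current_thread().name


async def main() -> tuple[str, str]:
    loop = asyncio.get_running_loop()
    in_loop = which_thread()  # called directly: runs in the event loop's thread
    # An explicit pool with named worker threads.
    with ThreadPoolExecutor(max_workers=4, thread_name_prefix="worker") as pool:
        in_executor = await loop.run_in_executor(pool, which_thread)
    return in_loop, in_executor


if __name__ == "__main__":
    print(asyncio.run(main()))  # e.g. ('MainThread', 'worker_0')
```

Coroutine-only code such as aiohttp's stays entirely in the first of these two threads.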

If you don't have much work, the difference between OS threads and coroutines is not significant, especially compared to the real bottleneck: the I/O operations themselves. But once you have a lot of work, you can notice that OS threads perform relatively worse due to expensive context switching.

For example, when I change your code to time.sleep(0.001) and range(100), my machine shows:

asynchronous (executor) took 0.21461606299999997 seconds
aiohttp took 0.12484742700000007 seconds

And this difference only grows as the number of requests increases.
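The same comparison can be reproduced without a server. This is a rough sketch, not the benchmark above: it swaps the HTTP calls for hypothetical stand-ins (`time.sleep(0.001)` in the executor, `asyncio.sleep(0.001)` as the coroutine-only version), so it isolates scheduling overhead only. Note also that the default executor has a bounded number of worker threads (`min(32, os.cpu_count() + 4)` since Python 3.8), so with 100 calls some of them queue behind each other. Absolute numbers will vary from machine to machine.

```python
import asyncio
import time


def blocking_call() -> None:
    time.sleep(0.001)  # stand-in for a fast blocking request


async def coroutine_call() -> None:
    await asyncio.sleep(0.001)  # stand-in for a non-blocking (aiohttp-style) request


async def with_executor(n: int) -> float:
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    # None selects the loop's default ThreadPoolExecutor.
    await asyncio.gather(*(loop.run_in_executor(None, blocking_call) for _ in range(n)))
    return time.perf_counter() - start


async def with_coroutines(n: int) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(coroutine_call() for _ in range(n)))
    return time.perf_counter() - start


async def main(n: int = 100) -> tuple[float, float]:
    return await with_executor(n), await with_coroutines(n)


if __name__ == "__main__":
    executor_s, coroutine_s = asyncio.run(main())
    print(f"executor:   {executor_s:.4f} s")
    print(f"coroutines: {coroutine_s:.4f} s")
```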

The purpose of asyncio is to speed up blocking io calls.

Nope, the purpose of asyncio is to provide a convenient way to control execution flow. asyncio lets you choose how that flow works: based on coroutines and OS threads (when you use an executor) or on pure coroutines (as aiohttp does).

It's aiohttp's purpose to speed things up, and it copes with the task, as shown above :)

Mikhail Gerasimov
  • Asyncio coroutines are not really green threads, because green threads are stackful. Carrying a full stack allows them to switch at arbitrary places and avoid the [function color](http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/) problem, but at the cost of each green thread being much more heavyweight than a coroutine/[fiber](https://en.wikipedia.org/wiki/Fiber_(computer_science)). An example of Python implementation of green threads is the [greenlet](https://pypi.org/project/greenlet/) module and the [gevent](http://www.gevent.org/) event loop based on it. – user4815162342 Nov 12 '18 at 13:26
  • @user4815162342 thanks for the clarification! I altered the answer. – Mikhail Gerasimov Nov 12 '18 at 13:38
  • @MikhailGerasimov, thanks for the elaboration on aiohttp's performance, +1 from me :) I still have some conceptual problems though, currently updating my question – lhk Nov 12 '18 at 13:59
  • I have updated my question. I don't understand the intersection between asyncio and aiohttp. Asyncio has non-blocking coroutines without OS threads? That sounds like a huge feature. Is this a part of asyncio? If yes, why isn't that the default? If not, how is aiohttp based on asyncio (async/await are a language feature and not directly a part of asyncio)? – lhk Nov 12 '18 at 14:18
  • hmm, I'm reconsidering this. You have answered my question on executors (OS-threads aren't free). It seems to me that my rephrased question is in fact another question. So I'm making up a new one. – lhk Nov 12 '18 at 14:20
  • @lhk Yes, asyncio has non-blocking coroutines without OS-threads, and it _is_ a huge feature. Aiohttp is based on asyncio because it relies on asyncio's abstractions built on top of the raw async/await. See answers to [this question](https://stackoverflow.com/q/49005651/1600898), particularly [this one](https://stackoverflow.com/a/51177895/1600898), for in-depth coverage of the topic. – user4815162342 Nov 12 '18 at 14:27