49

How can I limit the number of requests per second on the client side when using aiohttp?

Mark Amery
v18o
  • 3
    I've written a tiny module named `asyncio-throttle`, which is now hosted on [GitHub](https://github.com/hallazzang/asyncio-throttle). Take a look at its simple implementation. – hallazzang Oct 16 '17 at 06:31
  • 1
    See https://quentin.pradet.me/blog/how-do-you-rate-limit-calls-with-aiohttp.html for a different implementation than asyncio-throttle specific to aiohttp which correctly limits the number of requests per second instead of just limiting the number of concurrent connections. The use of `async with` in asyncio-throttle is a great idea, by the way! – Quentin Pradet Jan 01 '18 at 09:40

4 Answers

85

Although it's not exactly a limit on the number of requests per second, note that since v2.0, when using a ClientSession, aiohttp automatically limits the number of simultaneous connections to 100.

You can modify the limit by creating your own TCPConnector and passing it into the ClientSession. For instance, to create a client limited to 50 simultaneous requests:

import aiohttp

connector = aiohttp.TCPConnector(limit=50)
client = aiohttp.ClientSession(connector=connector)

In case it's better suited to your use case, there is also a limit_per_host parameter (which is off by default) that you can pass to limit the number of simultaneous connections to the same "endpoint". Per the docs:

limit_per_host (int) – limit for simultaneous connections to the same endpoint. Endpoints are the same if they have an equal (host, port, is_ssl) triple.

Example usage:

import aiohttp

connector = aiohttp.TCPConnector(limit_per_host=50)
client = aiohttp.ClientSession(connector=connector)
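
A hedged sketch of how the two limits might be combined in a running program. aiohttp expects connectors and sessions to be created from within a coroutine, so this version builds them inside one. The function name `open_limited_session` and the values 50 and 10 are illustrative assumptions, not part of the answer above.

```python
import asyncio

import aiohttp


async def open_limited_session(total=50, per_host=10):
    """Create a session capped at `total` connections overall and `per_host` per endpoint."""
    # `limit` and `limit_per_host` are the TCPConnector parameters discussed above.
    connector = aiohttp.TCPConnector(limit=total, limit_per_host=per_host)
    return aiohttp.ClientSession(connector=connector)


async def main():
    session = await open_limited_session()
    async with session:
        # Any request made through `session` here shares the connector's caps.
        return session.connector.limit, session.connector.limit_per_host


if __name__ == "__main__":
    print(asyncio.run(main()))
```

As with the snippets above, this caps simultaneous connections only; it does not by itself enforce a requests-per-second rate.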
Mark Amery
  • 3
    @GaryvanderMerwe Yes. The (unanimously upvoted) accepted answer *also* limits the number of concurrent requests rather than the rate, though, so I'm not sure why you only take issue with it on mine. Given the overwhelmingly most common use case for either of these features - to avoid a client completely clobbering some server by overwhelming it with requests - either approach (capping maximum connections versus capping the rate) will work fine. – Mark Amery Jun 12 '17 at 08:41
  • 2
    How would `asyncio.Semaphore(5)` differ from `aiohttp.TCPConnector(limit_per_host=5)`? Are they interchangeable? – help-ukraine-now Sep 01 '20 at 12:08
  • How to limit the requests only for a particular host using TCPConnector? – Manoj Kumar S Jan 20 '21 at 09:53
  • 2
    I struggle to see how you **rate**-limit the requests using this solution (limit number of requests **per second**) as per the original question. You can have, e.g., 5 parallel connections, but this does not stop you from hitting the remote more than 5 times per second if the response is fast enough. – pcko1 Jan 31 '22 at 10:49
  • 1
    @pcko1 yeah, you're correct that this doesn't *quite* do what the question asked for - and the same point was made by the (unfortunately now deleted) comment by GaryvanderMerwe that my first comment in this thread is replying to. Hopefully it's close enough to still be useful to some people, though! I've edited the answer to highlight in the first sentence that this doesn't do exactly what was asked for. – Mark Amery Jan 31 '22 at 10:59
32

I found one possible solution here: http://compiletoi.net/fast-scraping-in-python-with-asyncio.html

Doing 3 requests at the same time is cool, doing 5000, however, is not so nice. If you try to do too many requests at the same time, connections might start to get closed, or you might even get banned from the website.

To avoid this, you can use a semaphore. It is a synchronization tool that can be used to limit the number of coroutines that do something at some point. We'll just create the semaphore before creating the loop, passing as an argument the number of simultaneous requests we want to allow:

sem = asyncio.Semaphore(5)

Then, we just replace:

page = yield from get(url, compress=True)

by the same thing, but protected by a semaphore:

with (yield from sem):
    page = yield from get(url, compress=True)

This will ensure that at most 5 requests can be done at the same time.
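
On current Python versions the same idea is usually written with `async with`. The following is a minimal sketch, not the article's code: `fake_get` is a hypothetical stand-in for a real request, and the `in_flight`/`peak` counters exist only to show that no more than 5 coroutines are ever inside the guarded section. `asyncio.BoundedSemaphore` is the standard-library variant that raises an error if it is released more times than it was acquired.

```python
import asyncio


async def fake_get(url):
    # Hypothetical stand-in for a real HTTP request.
    await asyncio.sleep(0.01)
    return f"page for {url}"


async def crawl(urls, max_concurrent=5):
    # Create the semaphore inside the coroutine so it belongs to the running loop.
    sem = asyncio.BoundedSemaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def fetch(url):
        nonlocal in_flight, peak
        async with sem:  # at most `max_concurrent` coroutines pass this point at once
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_get(url)
            finally:
                in_flight -= 1

    pages = await asyncio.gather(*(fetch(u) for u in urls))
    return pages, peak


if __name__ == "__main__":
    pages, peak = asyncio.run(crawl([f"u{i}" for i in range(20)]))
    print(len(pages), peak)
```

Note that this limits concurrency rather than rate: if each request finishes in 10 ms, five slots still allow several hundred requests per second.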

Community
v18o
  • 3
    The answer is technically valid; just adding a few nits for readers who refer to it in the future. Use `asyncio.BoundedSemaphore(5)` instead of `Semaphore` to avoid accidentally raising the original limit (https://stackoverflow.com/a/48971158/6687477). Also use `async with sem:`. As per the docs: _Deprecated since version 3.7: Acquiring a lock using await lock or yield from lock and/or with statement (with await lock, with (yield from lock)) is deprecated. Use async with lock instead_ (https://docs.python.org/3/library/asyncio-sync.html#asyncio.BoundedSemaphore) – 6harat Apr 03 '19 at 10:19
  • 3
    How would `asyncio.Semaphore(5)` differ from `aiohttp.TCPConnector(limit_per_host=5)`? Are they interchangeable? – help-ukraine-now Sep 01 '20 at 12:07
  • 1
    While semaphores can limit concurrent connections, would it be right to say that they can limit queries per second? From my understanding, if a service allows 5 queries per second (QPS) and you use a semaphore of 5, you can still get rate-limited if your queries complete in under 1 second. – arosa Mar 10 '22 at 18:33
6

This example does not use aiohttp, but you can wrap any async function (for example, one that calls aiohttp.request) with the Limit decorator:

import asyncio
import time


class Limit(object):
    def __init__(self, calls=5, period=1):
        self.calls = calls
        self.period = period
        self.clock = time.monotonic
        self.last_reset = 0
        self.num_calls = 0

    def __call__(self, func):
        async def wrapper(*args, **kwargs):
            if self.num_calls >= self.calls:
                await asyncio.sleep(self.__period_remaining())

            period_remaining = self.__period_remaining()

            if period_remaining <= 0:
                self.num_calls = 0
                self.last_reset = self.clock()

            self.num_calls += 1

            return await func(*args, **kwargs)

        return wrapper

    def __period_remaining(self):
        elapsed = self.clock() - self.last_reset
        return self.period - elapsed


@Limit(calls=5, period=2)
async def test_call(x):
    print(x)


async def worker():
    for x in range(100):
        await test_call(x + 1)


asyncio.run(worker())
Andrew Nodermann
1

None of the solutions in the other answers worked for me (I tried them) when the API measures its rate limit from the time each request finishes, so I'm posting a new one that should work:

import asyncio
import time


class Limiter:
    def __init__(self, calls_limit: int = 5, period: int = 1):
        self.calls_limit = calls_limit
        self.period = period
        self.semaphore = asyncio.Semaphore(calls_limit)
        self.requests_finish_time = []

    async def sleep(self):
        if len(self.requests_finish_time) >= self.calls_limit:
            sleep_before = self.requests_finish_time.pop(0)
            if sleep_before >= time.monotonic():
                await asyncio.sleep(sleep_before - time.monotonic())

    def __call__(self, func):
        async def wrapper(*args, **kwargs):

            async with self.semaphore:
                await self.sleep()
                res = await func(*args, **kwargs)
                self.requests_finish_time.append(time.monotonic() + self.period)

            return res

        return wrapper

Usage:

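@Limiter(calls_limit=5, period=1)
async def api_call(url):
    ...


async def main():
    urls = []  # placeholder: fill in the URLs to request
    tasks = [asyncio.create_task(api_call(url)) for url in urls]
    # gather() must be awaited, otherwise main() returns before the tasks finish
    await asyncio.gather(*tasks)


if __name__ == '__main__':
    # Assumes Python 3.10+; older versions bind the Limiter's semaphore
    # to the event loop that is current at decoration time.
    asyncio.run(main())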
SimfikDuke