
I'm downloading images using aiohttp, and was wondering if there is a way to limit the number of open requests that haven't finished. This is the code I currently have:

async def get_images(url, session):

    chunk_size = 100

    # Print statement to show when a request is being made. 
    print(f'Making request to {url}')

    async with session.get(url=url) as r:
        with open('path/name.png', 'wb') as file:
            while True:
                chunk = await r.content.read(chunk_size)
                if not chunk:
                    break
                file.write(chunk)

# List of urls to get images from
urls = [...]

conn = aiohttp.TCPConnector(limit=3)
loop = asyncio.get_event_loop()
session = aiohttp.ClientSession(connector=conn, loop=loop)
loop.run_until_complete(asyncio.gather(*(get_images(url, session=session) for url in urls)))

The problem is, I threw a print statement in to show me when each request is being made, and it fires for all ~20 URLs at once instead of the 3 at a time that I want to limit it to (i.e., once an image is done downloading, it should move on to the next URL in the list). I'm just wondering what I am doing wrong here.

Jasonca1

2 Answers


Your limit setting works correctly; the mistake is in how you debugged it.

As Mikhail Gerasimov pointed out in the comments, you put your print() call in the wrong place: it must be inside the session.get() context. Outside of it, the log line fires as soon as the coroutine starts, before the connector has granted a connection.
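For illustration, here is a sketch of the asker's function with the print moved inside the session.get() context, so it only fires once a connection slot has actually been granted (the placeholder file path from the question is kept as-is):

```python
async def get_images(url, session):
    chunk_size = 100

    async with session.get(url=url) as r:
        # Moved here: this line now runs only after the TCPConnector
        # has actually opened a connection, so at most `limit` of these
        # prints can be "in flight" at once.
        print(f'Making request to {url}')
        with open('path/name.png', 'wb') as file:
            while True:
                chunk = await r.content.read(chunk_size)
                if not chunk:
                    break
                file.write(chunk)
```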

To confirm that the limit is respected, I tested your code against a simple logging server, and the test shows that the server receives exactly the number of simultaneous connections you set in TCPConnector. Here is the test:

import asyncio
import aiohttp
loop = asyncio.get_event_loop()


class SilentServer(asyncio.Protocol):
    def connection_made(self, transport):
        # We will know when the connection is actually made:
        print('SERVER |', transport.get_extra_info('peername'))


async def get_images(url, session):

    chunk_size = 100

    # This log doesn't guarantee that we will connect,
    # session.get() will freeze if you reach TCPConnector limit
    print(f'CLIENT | Making request to {url}')

    async with session.get(url=url) as r:
        while True:
            chunk = await r.content.read(chunk_size)
            if not chunk:
                break

urls = [f'http://127.0.0.1:1337/{x}' for x in range(20)]

conn = aiohttp.TCPConnector(limit=3)
session = aiohttp.ClientSession(connector=conn, loop=loop)


async def test():
    await loop.create_server(SilentServer, '127.0.0.1', 1337)
    await asyncio.gather(*(get_images(url, session=session) for url in urls))

loop.run_until_complete(test())
Andrii Maletskyi

asyncio.Semaphore solves exactly this issue.

In your case it'll be something like this:

semaphore = asyncio.Semaphore(3)


async def get_images(url, session):

    async with semaphore:

        print(f'Making request to {url}')

        # ...

You may also be interested to take a look at this ready-to-run code example that demonstrates how semaphore works.

Mikhail Gerasimov
  • but why doesn't setting the limit in the TCP connector work? – Andrii Maletskyi May 05 '18 at 20:55
  • @AndriyMaletsky I guess it works, but later down the execution flow: after your print line, somewhere inside `session.get`. I think it's more convenient not to delegate this job to connector, but to use semaphore where you want. – Mikhail Gerasimov May 05 '18 at 21:05