
I'm downloading images using aiohttp, and was wondering if there is a way to limit the number of open requests that haven't finished. This is the code I currently have:

async def get_images(url, session):

    chunk_size = 100

    # Print statement to show when a request is being made. 
    print(f'Making request to {url}')

    async with session.get(url=url) as r:
        with open('path/name.png', 'wb') as file:
            while True:
                chunk = await r.content.read(chunk_size)
                if not chunk:
                    break
                file.write(chunk)

# List of urls to get images from
urls = [...]

conn = aiohttp.TCPConnector(limit=3)
loop = asyncio.get_event_loop()
session = aiohttp.ClientSession(connector=conn, loop=loop)
loop.run_until_complete(asyncio.gather(*(get_images(url, session=session) for url in urls)))

The problem is, I threw a print statement in to show me when each request is being made, and it fires for all ~20 URLs at once instead of the 3 at a time that I want to limit it to (i.e., once an image is done downloading, it should move on to the next URL in the list). I'm just wondering what I am doing wrong here.

Jasonca1

2 Answers


Your limit setting works correctly; the mistake is in how you debugged it.

As Mikhail Gerasimov pointed out in the comments, you put your print() call in the wrong place: it must be inside the session.get() context. Outside of it, the log line fires as soon as the coroutine starts, before the connector has granted a connection.
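For illustration, here is a sketch of the asker's function with the print moved inside the session.get() context, so it only fires once a connection slot has actually been granted (the placeholder file path from the question is kept as-is):

```python
async def get_images(url, session):
    chunk_size = 100

    async with session.get(url=url) as r:
        # Moved here: this line now runs only after the TCPConnector
        # has actually opened a connection, so at most `limit` of these
        # prints can be "in flight" at once.
        print(f'Making request to {url}')
        with open('path/name.png', 'wb') as file:
            while True:
                chunk = await r.content.read(chunk_size)
                if not chunk:
                    break
                file.write(chunk)
```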

To confirm that the limit is respected, I tested your code against a simple logging server, and the test shows that the server receives exactly the number of simultaneous connections you set in TCPConnector. Here is the test:

import asyncio
import aiohttp
loop = asyncio.get_event_loop()


class SilentServer(asyncio.Protocol):
    def connection_made(self, transport):
        # We will know when the connection is actually made:
        print('SERVER |', transport.get_extra_info('peername'))


async def get_images(url, session):

    chunk_size = 100

    # This log doesn't guarantee that we will connect,
    # session.get() will freeze if you reach TCPConnector limit
    print(f'CLIENT | Making request to {url}')

    async with session.get(url=url) as r:
        while True:
            chunk = await r.content.read(chunk_size)
            if not chunk:
                break

urls = [f'http://127.0.0.1:1337/{x}' for x in range(20)]

conn = aiohttp.TCPConnector(limit=3)
session = aiohttp.ClientSession(connector=conn, loop=loop)


async def test():
    await loop.create_server(SilentServer, '127.0.0.1', 1337)
    await asyncio.gather(*(get_images(url, session=session) for url in urls))

loop.run_until_complete(test())
Andrii Maletskyi

asyncio.Semaphore solves exactly this issue.

In your case it'll be something like this:

semaphore = asyncio.Semaphore(3)


async def get_images(url, session):

    async with semaphore:

        print(f'Making request to {url}')

        # ...

You may also be interested to take a look at this ready-to-run code example that demonstrates how semaphore works.

Mikhail Gerasimov
  • but why doesn't setting the limit in the TCP connector work? – Andrii Maletskyi May 05 '18 at 20:55
  • @AndriyMaletsky I guess it works, but later down the execution flow: after your print line, somewhere inside `session.get`. I think it's more convenient not to delegate this job to connector, but to use semaphore where you want. – Mikhail Gerasimov May 05 '18 at 21:05