
What is the best approach to deliver, say, 100k API calls using asyncio and async/await with Python 3.7+? The idea is to keep 100 tasks running in parallel at all times.

What should be avoided is:
1. Starting work on all 100k tasks at once.
2. Waiting for all 100 parallel tasks to finish before the next batch of 100 is scheduled.

This example illustrates the first approach, which is not what is needed.

import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = [
            'http://python.org',
            'https://google.com',
            'http://yifei.me'
        ]
    tasks = []
    async with aiohttp.ClientSession() as session:
        for url in urls:
            tasks.append(fetch(session, url))
        htmls = await asyncio.gather(*tasks)  # schedules every coroutine at once, with no concurrency limit
        for html in htmls:
            print(html[:100])

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
Antoan Milkov
  • Pretty old answer but: https://hackernoon.com/how-to-run-asynchronous-web-requests-in-parallel-with-python-3-5-without-aiohttp-264dc0f8546 I would suggest setting your max threads to whatever (in your case 100) and use something like: – E.Serra Jun 10 '19 at 08:44
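
The code in that comment was cut off. Purely as a rough illustration of the "max threads = 100" idea it mentions (not the commenter's actual snippet), a thread-pool version without aiohttp might look like the sketch below; the fetch_all name, the requests library, and the max_workers default are assumptions on my part.

import concurrent.futures
import requests

def fetch(url):
    return requests.get(url).text

def fetch_all(urls, max_workers=100):
    # at most max_workers requests run at the same time; the pool reuses its threads
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))

This blocks the calling thread rather than using asyncio, so it suits scripts that do not already run an event loop.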

1 Answer


Use a semaphore. Semaphores are used to limit concurrent actions, and Python's asyncio comes with its own async version.

import aiohttp
import asyncio

async def fetch(session, url, sema):
    # acquire the semaphore before sending the request; it is released when the block exits
    async with sema, session.get(url) as response:
        return await response.text()

async def main():
    urls = [
            'http://python.org',
            'https://google.com',
            'http://yifei.me',
            'other urls...'
        ]
    tasks = []
    sema = asyncio.BoundedSemaphore(value=100)  # at most 100 requests in flight at any time
    async with aiohttp.ClientSession() as session:
        for url in urls:
            tasks.append(fetch(session, url, sema))
        htmls = await asyncio.gather(*tasks)
        for html in htmls:
            print(html[:100])

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
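
Note that gather still creates one coroutine (and one task) per URL up front, so with 100k URLs all of the task objects exist in memory even though the semaphore keeps only 100 requests in flight. If that overhead matters, one alternative, sketched below, is a fixed pool of worker tasks feeding from an asyncio.Queue, which keeps exactly 100 tasks alive the whole time. The worker and fetch_all names and the error handling are illustrative choices, not part of the original answer.

import aiohttp
import asyncio

async def worker(queue, session, results):
    # each worker repeatedly pulls a URL from the queue until it is cancelled
    while True:
        url = await queue.get()
        try:
            async with session.get(url) as response:
                results.append(await response.text())
        except Exception as exc:
            results.append(exc)  # collect the error so the worker stays alive
        finally:
            queue.task_done()

async def fetch_all(urls, concurrency=100):
    queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    results = []
    async with aiohttp.ClientSession() as session:
        # start exactly `concurrency` long-lived workers instead of one task per URL
        workers = [asyncio.create_task(worker(queue, session, results))
                   for _ in range(concurrency)]
        await queue.join()  # wait until every queued URL has been processed
        for w in workers:
            w.cancel()  # the workers loop forever, so cancel them once the queue is drained
        await asyncio.gather(*workers, return_exceptions=True)
    return results

Results come back in completion order rather than input order, which is usually acceptable for bulk API calls; the semaphore version above preserves input order because gather does.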
ospider