
Since Python 3.5 introduced `async with`, the syntax recommended in the docs for aiohttp has changed. Now, to fetch a single url, they suggest:

import aiohttp
import asyncio

async def fetch(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            return await response.text()

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    with aiohttp.ClientSession(loop=loop) as session:
        html = loop.run_until_complete(
            fetch(session, 'http://python.org'))
        print(html)

How can I modify this to fetch a collection of urls instead of just one url?

In the old asyncio examples you would set up a list of tasks such as

    tasks = [
        fetch(session, 'http://cnn.com'),
        fetch(session, 'http://google.com'),
        fetch(session, 'http://twitter.com'),
    ]

I tried to combine a list like this with the approach above but failed.
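
For context, a minimal sketch of the old-style combination being attempted (reusing the `fetch` coroutine and old-style session from the snippet above; the coroutines have to be wrapped with `asyncio.gather` rather than passed to `run_until_complete` as a bare list):

loop = asyncio.get_event_loop()
with aiohttp.ClientSession(loop=loop) as session:
    tasks = [
        fetch(session, 'http://cnn.com'),
        fetch(session, 'http://google.com'),
        fetch(session, 'http://twitter.com'),
    ]
    # run_until_complete cannot take a plain list of coroutines;
    # gather wraps each one in a Task and awaits them all
    htmls = loop.run_until_complete(asyncio.gather(*tasks))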

  • Could you explain what your failure is? – Andrew Svetlov Mar 09 '16 at 17:31
  • @AndrewSvetlov Wonderful to hear from you. What I mean is I could not understand how to do it. When I define a list of tasks then use `results = loop.run_until_complete(tasks)` I get a runtime error. `async with` is such a new feature with so little literature that it would be super convenient for people learning to use it if the `aiohttp` doc showed an example of grabbing more than one url. The library looks terrific, just needing a bit of hand-holding to get started. Thank you! – Hans Schindler Mar 09 '16 at 18:22

1 Answer


For parallel execution you need an `asyncio.Task` for each request.

I've converted your example to concurrent data fetching from several sources:

import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        if response.status != 200:
            response.raise_for_status()
        return await response.text()

async def fetch_all(session, urls):
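    # create one Task per URL so the requests run concurrently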
    tasks = []
    for url in urls:
        task = asyncio.create_task(fetch(session, url))
        tasks.append(task)
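    # gather waits for all the tasks; results keep the same order as urls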
    results = await asyncio.gather(*tasks)
    return results

async def main():    
    urls = ['http://cnn.com',
            'http://google.com',
            'http://twitter.com']
    async with aiohttp.ClientSession() as session:
        htmls = await fetch_all(session, urls)
        print(htmls)

if __name__ == '__main__':
    asyncio.run(main())
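
As a side note (not part of the original answer): `asyncio.run` needs Python 3.7+. On 3.5/3.6 the same `main()` coroutine can be driven with an explicit event loop, provided `asyncio.create_task` above is swapped for `asyncio.ensure_future`:

# sketch for Python < 3.7: manage the event loop explicitly
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()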
  • Thanks a million! Accepting your answer, but... 1. there's still a typo with the placement of the parenthesis. I'll edit it if you don't mind. 2. It looks to me like to print the actual result the line `print(html)` is deceiving and you actually need something like `print('\n'.join(list((str(some_task._result) for some_tuple in html for some_task in some_tuple))))`, maybe that could be added to the answer? 3. This seems really useful, I'd recommend adding something like this to readthedocs. Thanks again! :) – Hans Schindler Mar 09 '16 at 23:01
  • Andrew, where can I put a test like `if response.status == 200`? If one url does not exist, the script breaks, and I am not understanding where to check the response within the `async with session.get(url) as response: return await response.text()` – Hans Schindler Mar 10 '16 at 10:32
  • Thank you to the other person who left a comment before. I have started a [new question](http://stackoverflow.com/questions/35926917/fetching-multiple-urls-with-aiohttp-handling-errors) to clarify this. – Hans Schindler Mar 10 '16 at 20:46
  • Updated to aiohttp 3.x and python 3.7 usage – Andrew Svetlov Aug 30 '18 at 06:37
  • Where is loop defined? – Kalimantan Oct 18 '18 at 21:21
  • @Kalimantan `main()` is just an ordinary coroutine. Create a task from it and run it in the event loop as you would with any other coroutine. – Petr Javorik Dec 08 '18 at 10:31
  • Hey, does this preserve the order of the urls? i.e. is `htmls[i]` the response to `urls[i]`? – Moo Dec 24 '18 at 01:24
  • the `asyncio.create_task` part is not necessary. Task is created from the coroutine in 3.7. – n1_ Dec 26 '18 at 10:23
  • When I ran %timeit for this example and compared it to requests, requests was slightly faster (?) Is this approach optimal? – vgoklani Jan 19 '20 at 05:01
  • Instead of an `if` conditional for the status, simply do `session.get(url, raise_for_status=True)` – CodeBiker Jun 09 '20 at 19:01
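
A small sketch of what that last comment suggests: aiohttp can raise the error itself, so the explicit status check in `fetch` can be dropped (`raise_for_status=True` may also be passed once to `aiohttp.ClientSession` instead of to each request):

async def fetch(session, url):
    # with raise_for_status=True, aiohttp raises ClientResponseError
    # for 4xx/5xx responses instead of returning them
    async with session.get(url, raise_for_status=True) as response:
        return await response.text()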