
Since Python 3.5 introduced `async with`, the syntax recommended in the docs for aiohttp has changed. Now, to fetch a single url, they suggest:

import aiohttp
import asyncio

async def fetch(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            return await response.text()

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    with aiohttp.ClientSession(loop=loop) as session:
        html = loop.run_until_complete(
            fetch(session, 'http://python.org'))
        print(html)

How can I modify this to fetch a collection of urls instead of just one url?

In the old asyncio examples you would set up a list of tasks such as

    tasks = [
        fetch(session, 'http://cnn.com'),
        fetch(session, 'http://google.com'),
        fetch(session, 'http://twitter.com'),
    ]

I tried to combine a list like this with the approach above but failed.
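
For context, a minimal sketch of the old-style combination being attempted (reusing the `fetch` coroutine and old-style session from the snippet above; the coroutines have to be wrapped with `asyncio.gather` rather than passed to `run_until_complete` as a bare list):

loop = asyncio.get_event_loop()
with aiohttp.ClientSession(loop=loop) as session:
    tasks = [
        fetch(session, 'http://cnn.com'),
        fetch(session, 'http://google.com'),
        fetch(session, 'http://twitter.com'),
    ]
    # run_until_complete cannot take a plain list of coroutines;
    # gather wraps each one in a Task and awaits them all
    htmls = loop.run_until_complete(asyncio.gather(*tasks))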

  • Could you explain what your failure is? – Andrew Svetlov Mar 09 '16 at 17:31
  • @AndrewSvetlov Wonderful to hear from you. What I mean is I could not understand how to do it. When I define a list of tasks then use `results = loop.run_until_complete(tasks)` I get a runtime error. `async with` is such a new feature with so little literature that it would be super convenient for people learning to use it if the `aiohttp` doc showed an example of grabbing more than one url. The library looks terrific, just needing a bit of hand-holding to get started. Thank you! – Hans Schindler Mar 09 '16 at 18:22

1 Answer


For parallel execution you need an `asyncio.Task` for each request.

I've converted your example to concurrent data fetching from several sources:

import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        if response.status != 200:
            response.raise_for_status()
        return await response.text()

async def fetch_all(session, urls):
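    # create one Task per URL so the requests run concurrently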
    tasks = []
    for url in urls:
        task = asyncio.create_task(fetch(session, url))
        tasks.append(task)
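    # gather waits for all the tasks; results keep the same order as urls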
    results = await asyncio.gather(*tasks)
    return results

async def main():    
    urls = ['http://cnn.com',
            'http://google.com',
            'http://twitter.com']
    async with aiohttp.ClientSession() as session:
        htmls = await fetch_all(session, urls)
        print(htmls)

if __name__ == '__main__':
    asyncio.run(main())
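
As a side note (not part of the original answer): `asyncio.run` needs Python 3.7+. On 3.5/3.6 the same `main()` coroutine can be driven with an explicit event loop, provided `asyncio.create_task` above is swapped for `asyncio.ensure_future`:

# sketch for Python < 3.7: manage the event loop explicitly
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()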
  • Thanks a million! Accepting your answer, but... 1. there's still a typo with the placement of the parenthesis. I'll edit it if you don't mind. 2. It looks to me like to print the actual result the line `print(html)` is deceiving and you actually need something like `print('\n'.join(list((str(some_task._result) for some_tuple in html for some_task in some_tuple))))`, maybe that could be added to the answer? 3. This seems really useful, I'd recommend adding something like this to readthedocs. Thanks again! :) – Hans Schindler Mar 09 '16 at 23:01
  • Andrew, where can I put a test like `if response.status == 200`? If one url does not exist, the script breaks, and I am not understanding where to check the response within the `async with session.get(url) as response: return await response.text()` – Hans Schindler Mar 10 '16 at 10:32
  • Thank you to the other person who left a comment before. I have started a [new question](http://stackoverflow.com/questions/35926917/fetching-multiple-urls-with-aiohttp-handling-errors) to clarify this. – Hans Schindler Mar 10 '16 at 20:46
  • Updated to aiohttp 3.x and python 3.7 usage – Andrew Svetlov Aug 30 '18 at 06:37
  • Where is loop defined? – Kalimantan Oct 18 '18 at 21:21
  • @Kalimantan `main()` is just an ordinary coroutine. Create a task from it and run it in the event loop as you would with any other coroutine. – Petr Javorik Dec 08 '18 at 10:31
  • Hey, does this preserve the order of the urls? i.e. is `htmls[i]` the response to `urls[i]`? – Moo Dec 24 '18 at 01:24
  • the `asyncio.create_task` part is not necessary. Task is created from the coroutine in 3.7. – n1_ Dec 26 '18 at 10:23
  • When I ran %timeit for this example and compared it to requests, requests was slightly faster (?) Is this approach optimal? – vgoklani Jan 19 '20 at 05:01
  • Instead of an `if` conditional for the status, simply do `session.get(url, raise_for_status=True)` – CodeBiker Jun 09 '20 at 19:01
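
A small sketch of what that last comment suggests: aiohttp can raise the error itself, so the explicit status check in `fetch` can be dropped (`raise_for_status=True` may also be passed once to `aiohttp.ClientSession` instead of to each request):

async def fetch(session, url):
    # with raise_for_status=True, aiohttp raises ClientResponseError
    # for 4xx/5xx responses instead of returning them
    async with session.get(url, raise_for_status=True) as response:
        return await response.text()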