
My code works this way, but it is very slow because of the for loops. Can you help me make it work with aiohttp and asyncio?

import requests
from bs4 import BeautifulSoup


def field_info(field_link):
    response = requests.get(field_link)
    soup = BeautifulSoup(response.text, 'html.parser')
    races = soup.findAll('header', {'class': 'dc-field-header'})
    tables = soup.findAll('table', {'class': 'dc-field-comp'})

    for i in range(len(races)):
        race_name = races[i].find('h3').text
        race_time = races[i].find('time').text

        names = tables[i].findAll('span', {'class': 'title'})
        trainers = tables[i].findAll('span', {'class': 'trainer'})
        table = []

        for j in range(len(names)):
            table.append({
                'Name': names[j].text,
                'Trainer': trainers[j].text,
            })

        return {
                'RaceName': race_name,
                'RaceTime': race_time,
                'Table': table
                }


links = [link1, link2, link3]
scraped_info = []
for link in links:
    scraped_info.append(field_info(link))
    Why? Neither `asyncio` nor `aiohttp` will give your code magic parallelism, nor will they speed up CPU-bound tasks. They're meant for _asynchronous programming_. – ForceBru Jun 10 '19 at 15:14
  • This is unrelated to your question, but instead of using `range(len(names))`, you can use `for name, trainer in zip(names, trainers)` and avoid the index lookups inside the loop. – dirn Jun 10 '19 at 15:57
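
For illustration, a sketch of dirn's suggestion applied to the inner loop of the question (names and trainers are the lists already built inside field_info):

table = []
for name, trainer in zip(names, trainers):
    table.append({
        'Name': name.text,
        'Trainer': trainer.text,
    })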

1 Answer


1) Create a coroutine to make requests asynchronously:

import asyncio
import aiohttp


async def get_text(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return await resp.text()
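
As an aside that is not part of the answer: get_text above opens a new ClientSession for every URL, which works but throws away the connection pool each time. aiohttp recommends reusing one session for many requests; a minimal sketch of that variant, where the extra session parameter is my own assumption:

async def get_text(session, url):
    # the caller creates a single aiohttp.ClientSession and passes it in,
    # so all requests share one connection pool
    async with session.get(url) as resp:
        return await resp.text()

The caller would then wrap its work in async with aiohttp.ClientSession() as session: and pass session down to get_text.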

2) Replace all synchronous requests with awaiting this coroutine, making the outer functions coroutines as well:

async def field_info(field_link):              # async - makes the outer function a coroutine
    text = await get_text(field_link)          # await - gets the result from the async function
    soup = BeautifulSoup(text, 'html.parser')
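
For reference, a sketch of the fully converted field_info under these changes (the parsing body is left exactly as in the question):

async def field_info(field_link):
    text = await get_text(field_link)          # the only change: await the asynchronous request
    soup = BeautifulSoup(text, 'html.parser')
    races = soup.findAll('header', {'class': 'dc-field-header'})
    tables = soup.findAll('table', {'class': 'dc-field-comp'})

    for i in range(len(races)):
        race_name = races[i].find('h3').text
        race_time = races[i].find('time').text

        names = tables[i].findAll('span', {'class': 'title'})
        trainers = tables[i].findAll('span', {'class': 'trainer'})
        table = []

        for j in range(len(names)):
            table.append({
                'Name': names[j].text,
                'Trainer': trainers[j].text,
            })

        return {
            'RaceName': race_name,
            'RaceTime': race_time,
            'Table': table
        }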

3) Make the outer code run the jobs concurrently using asyncio.gather():

async def main():
    links = [link1, link2, link3]

    scraped_info = await asyncio.gather(*[
        field_info(link)
        for link
        in links
    ])  # run multiple field_info coroutines concurrently

4) Pass the top-level coroutine to asyncio.run():

asyncio.run(main())
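
If the scraped data is needed afterwards, main() can return it, since asyncio.run() returns the coroutine's result (a usage sketch, not part of the original answer):

async def main():
    links = [link1, link2, link3]
    scraped_info = await asyncio.gather(*[
        field_info(link) for link in links
    ])
    return scraped_info


scraped_info = asyncio.run(main())  # a list of field_info results, in the same order as links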
Mikhail Gerasimov
  • Thank you for your step by step answer, it really helped me to understand how this stuff is working. – paskh Jun 11 '19 at 13:35
    @paskh you're welcome! You may be also interested in reading [this answer](https://stackoverflow.com/a/33399896/1113207) - it's about how `asyncio` works and how to use it in general. – Mikhail Gerasimov Jun 11 '19 at 13:57