
My code works this way, but it is very slow because of the for loops. Can you help me make it work with aiohttp and asyncio?

import requests
from bs4 import BeautifulSoup


def field_info(field_link):
    response = requests.get(field_link)
    soup = BeautifulSoup(response.text, 'html.parser')
    races = soup.findAll('header', {'class': 'dc-field-header'})
    tables = soup.findAll('table', {'class': 'dc-field-comp'})

    for i in range(len(races)):
        race_name = races[i].find('h3').text
        race_time = races[i].find('time').text

        names = tables[i].findAll('span', {'class': 'title'})
        trainers = tables[i].findAll('span', {'class': 'trainer'})
        table = []

        for j in range(len(names)):
            table.append({
                'Name': names[j].text,
                'Trainer': trainers[j].text,
            })

        return {
                'RaceName': race_name,
                'RaceTime': race_time,
                'Table': table
                }


links = [link1, link2, link3]
scraped_info = []
for link in links:
    scraped_info.append(field_info(link))
    Why? Neither `asyncio` nor `aiohttp` will give your code magic parallelism, nor will they speed up CPU-bound tasks. They're meant for _asynchronous programming_. – ForceBru Jun 10 '19 at 15:14
  • This is unrelated to your question, but instead of using `range(len(names))`, you can use `for name, trainer in zip(names, trainers)` and avoid the index lookups inside the loop. – dirn Jun 10 '19 at 15:57
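
For illustration, a sketch of dirn's suggestion applied to the inner loop of the question (names and trainers are the lists already built inside field_info):

table = []
for name, trainer in zip(names, trainers):
    table.append({
        'Name': name.text,
        'Trainer': trainer.text,
    })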

1 Answer


1) Create a coroutine to make requests asynchronously:

import asyncio
import aiohttp


async def get_text(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            return await resp.text()
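
As an aside that is not part of the answer: get_text above opens a new ClientSession for every URL, which works but throws away the connection pool each time. aiohttp recommends reusing one session for many requests; a minimal sketch of that variant, where the extra session parameter is my own assumption:

async def get_text(session, url):
    # the caller creates a single aiohttp.ClientSession and passes it in,
    # so all requests share one connection pool
    async with session.get(url) as resp:
        return await resp.text()

The caller would then wrap its work in async with aiohttp.ClientSession() as session: and pass session down to get_text.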

2) Replace all synchronous requests with awaiting this coroutine, making the outer functions coroutines as well:

async def field_info(field_link):              # async - makes the outer function a coroutine
    text = await get_text(field_link)          # await - gets the result from the async function
    soup = BeautifulSoup(text, 'html.parser')
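
For reference, a sketch of the fully converted field_info under these changes (the parsing body is left exactly as in the question):

async def field_info(field_link):
    text = await get_text(field_link)          # the only change: await the asynchronous request
    soup = BeautifulSoup(text, 'html.parser')
    races = soup.findAll('header', {'class': 'dc-field-header'})
    tables = soup.findAll('table', {'class': 'dc-field-comp'})

    for i in range(len(races)):
        race_name = races[i].find('h3').text
        race_time = races[i].find('time').text

        names = tables[i].findAll('span', {'class': 'title'})
        trainers = tables[i].findAll('span', {'class': 'trainer'})
        table = []

        for j in range(len(names)):
            table.append({
                'Name': names[j].text,
                'Trainer': trainers[j].text,
            })

        return {
            'RaceName': race_name,
            'RaceTime': race_time,
            'Table': table
        }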

3) Make the outer code run the jobs concurrently using asyncio.gather():

async def main():
    links = [link1, link2, link3]

    scraped_info = await asyncio.gather(*[
        field_info(link)
        for link
        in links
    ])  # run multiple field_info coroutines concurrently

4) Pass the top-level coroutine to asyncio.run():

asyncio.run(main())
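
If the scraped data is needed afterwards, main() can return it, since asyncio.run() returns the coroutine's result (a usage sketch, not part of the original answer):

async def main():
    links = [link1, link2, link3]
    scraped_info = await asyncio.gather(*[
        field_info(link) for link in links
    ])
    return scraped_info


scraped_info = asyncio.run(main())  # a list of field_info results, in the same order as links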
Mikhail Gerasimov
  • Thank you for your step by step answer, it really helped me to understand how this stuff is working. – paskh Jun 11 '19 at 13:35
    @paskh you're welcome! You may be also interested in reading [this answer](https://stackoverflow.com/a/33399896/1113207) - it's about how `asyncio` works and how to use it in general. – Mikhail Gerasimov Jun 11 '19 at 13:57