
I'm writing a Python web scraper with asyncio, using aiohttp for the asynchronous HTTP requests.
That part works, but when I try to keep the app non-blocking (with await), beautifulsoup4 blocks the application, because beautifulsoup4 doesn't support async.

This is what I tried:

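import asyncio

import aiohttp
from bs4 import BeautifulSoup

async def extractLinks(html):
    # Declared async, but nothing here awaits: the parse runs synchronously
    # and occupies the event loop until it finishes.
    soup = BeautifulSoup(html, 'html.parser')
    return soup.select(".c-pro-box__title a")

async def getHtml(session, url):
    # Non-blocking HTTP GET; returns the response body as text.
    async with session.get(url) as response:
        return await response.text()

async def loadPage(url):
    async with aiohttp.ClientSession() as session:
        html = await getHtml(session, url)
        links = await extractLinks(html)
        return links

# Placeholder URL: the original call passed no argument, which raises a TypeError.
url = "https://example.com"
asyncio.run(loadPage(url))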

extractLinks() blocks the program flow.
Is it possible to make it non-blocking? Or is there a library other than beautifulsoup4 that supports async?

Exind
  • 3
    Somewhere some function has to be blocking to get things done, you can't escape it. The purpose of async is to reduce waiting for IO, not CPU work. – abdusco Jul 04 '19 at 07:35
  • 3
    `extractLinks` is a "working function" that uses the CPU; it doesn't do anything async by nature (like sending an HTTP request or reading from a DB). If you want, you can execute it on a new thread, but that's a different issue – Adam.Er8 Jul 04 '19 at 07:36
  • @Adam.Er8 Thank you. I didn't know that async is just for IO and that parsing just uses the CPU. Your suggestion about using a thread is very nice. I'll do it. – Exind Jul 04 '19 at 07:45
  • @abdusco Thank you. I didn't know that async is just for IO – Exind Jul 04 '19 at 07:46
  • I suggest reading this question and the answers to it: https://stackoverflow.com/q/37419572/5052365 – Adam.Er8 Jul 04 '19 at 07:47
  • Does this answer your question? [How and when to use ‘async’ and ‘await’](https://stackoverflow.com/questions/14455293/how-and-when-to-use-async-and-await) – Loïc Faure-Lacroix Feb 27 '20 at 18:14
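
The thread suggestion in the comments above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: to stay dependency-free it uses the standard library's `html.parser` as a stand-in for the BeautifulSoup call, and it collects every `<a href>` instead of applying the `.c-pro-box__title a` selector. The names `LinkCollector`, `parse_links`, and `extract_links` are hypothetical and do not come from the question.

```python
import asyncio
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Stdlib stand-in for the BeautifulSoup parsing step (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Record the href of every anchor tag that has one.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


def parse_links(html):
    # Plain blocking function: CPU-bound parsing with no awaits inside.
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


async def extract_links(html):
    # Hand the blocking parse to the default thread pool so the event loop
    # can keep servicing other coroutines while the parse runs.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, parse_links, html)


async def main():
    # Inline sample markup (an assumption) so the sketch needs no network access.
    sample = '<div class="c-pro-box__title"><a href="/item/1">One</a></div>'
    return await extract_links(sample)


links = asyncio.run(main())
```

In the question's code, the body of `parse_links` would presumably be the two BeautifulSoup lines from `extractLinks`. Because of the GIL, a thread keeps the loop responsive but does not make the parsing itself faster; for heavy parsing, passing a `concurrent.futures.ProcessPoolExecutor` to `run_in_executor` instead of `None` may be worth trying.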

0 Answers