I'm writing a web scraper in Python and need it to use asyncio, so I use aiohttp for the asynchronous HTTP requests.
That part works, but when I try to keep the app non-blocking (using await), beautifulsoup4 blocks the application, because beautifulsoup4 doesn't support async.
This is what I've tried:

    import asyncio

    import aiohttp
    from bs4 import BeautifulSoup

    async def extractLinks(html):
        # BeautifulSoup parsing is synchronous, so this call blocks the event loop
        soup = BeautifulSoup(html, 'html.parser')
        return soup.select(".c-pro-box__title a")

    async def getHtml(session, url):
        async with session.get(url) as response:
            return await response.text()

    async def loadPage(url):
        async with aiohttp.ClientSession() as session:
            html = await getHtml(session, url)
            links = await extractLinks(html)
            return links

    url = "https://example.com"  # placeholder URL (assumption)
    asyncio.run(loadPage(url))

The call to extractLinks() blocks the program flow.
Is it possible to make it non-blocking? Or is there a library other than beautifulsoup4 that supports async?
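
For reference, one direction that might work is to hand the synchronous parsing to a worker thread with `loop.run_in_executor`, so the event loop can keep servicing other coroutines in the meantime. The following is only a minimal sketch under assumptions: the standard-library `HTMLParser` stands in for the BeautifulSoup call so the snippet runs without third-party packages, and `LinkCollector`, `extract_links_blocking`, `extract_links` and the sample HTML are hypothetical names made up for illustration.

```python
import asyncio
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Stand-in for the blocking BeautifulSoup call (assumption):
    collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


def extract_links_blocking(html):
    # Ordinary synchronous parsing; a BeautifulSoup(html, 'html.parser')
    # call could be placed in a plain function like this one instead.
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


async def extract_links(html):
    # Run the blocking function in the default thread pool executor and
    # await its result, leaving the event loop free while it runs.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, extract_links_blocking, html)


async def main():
    # Hypothetical sample markup, not taken from a real page.
    sample = '<div class="c-pro-box__title"><a href="/item/1">One</a></div>'
    return await extract_links(sample)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because pure-Python parsing holds the GIL, a thread keeps the loop responsive but does not make the parsing itself faster; for heavy pages, passing a `concurrent.futures.ProcessPoolExecutor` as the first argument to `run_in_executor` may be worth trying instead of `None`.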