I'm using newspaper3k inside a Scrapy parse method. I want to extract links, but I don't want to fetch the website again.
Is it possible to use newspaper.build(..) with plain HTML, so that I can call .articles afterwards?
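I found this solution:

    import httpx
    from newspaper import Article

    async def get_article(url):
        # AsyncClient is an async context manager, so it needs "async with"
        async with httpx.AsyncClient() as client:
            response = await client.get(url)
        article = Article(url)
        # Supply the HTML directly so newspaper does not download the page itself
        article.set_html(response.text)
        article.parse()
        return article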
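For the link-extraction part of the question, here is a minimal sketch that pulls links out of HTML you already have, using only the standard library. This is not what newspaper.build(..) does internally and it does not reproduce newspaper's filtering of article URLs; LinkCollector and extract_links are hypothetical names I made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the URL of the page
            absolute = urljoin(self.base_url, href)
            # Keep the first occurrence only, preserving document order
            if absolute not in self.links:
                self.links.append(absolute)


def extract_links(html, base_url):
    """Return de-duplicated absolute links found in already-downloaded HTML."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Inside a Scrapy parse method this could presumably be called as extract_links(response.text, response.url), so the page is not fetched a second time. Each resulting URL still needs its own filtering and download (for example through Scrapy itself) before it is handed to Article with set_html.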