I'm using newspaper3k inside a Scrapy parse method. I want to extract links, but I don't want to fetch the website a second time, since Scrapy has already downloaded it.

Is it possible to use this:

newspaper.build(..)

with plain HTML, so that I can then call .articles?

Milano

1 Answer

I found this solution:

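import httpx

from newspaper import Article

async def get_article(url):
    # AsyncClient is an async context manager, so it needs "async with"
    async with httpx.AsyncClient() as client:
        response = await client.get(url)

    # Hand the already-downloaded HTML to newspaper instead of calling download()
    article = Article(url)
    article.set_html(response.text)
    article.parse()
    return article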
Dmitrii K