I am trying to download the text of an article that I can open without problems in a web browser (Safari, for example).

The error is:

newspaper.article.ArticleException: Article `download()` failed with 403 Client Error: Forbidden for url: https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830 on URL https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830

Here's the code:

from newspaper import Article
from newspaper import Config

# Send a Safari user agent instead of newspaper's default one
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15'
config = Config()
config.browser_user_agent = user_agent

url = "https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830"

page = Article(url, config=config)
page.download()
page.parse()
print(page.text)

As you can see, I tried the solution from this Stack Overflow answer, but it didn't work.

Complete error log:

/Users/mona/anaconda3/bin/python /Users/mona/multimodal/newspaper_pg.py
Traceback (most recent call last):
  File "/Users/mona/multimodal/newspaper_pg.py", line 18, in <module>
    page.parse()
  File "/Users/mona/anaconda3/lib/python3.6/site-packages/newspaper/article.py", line 191, in parse
    self.throw_if_not_downloaded_verbose()
  File "/Users/mona/anaconda3/lib/python3.6/site-packages/newspaper/article.py", line 532, in throw_if_not_downloaded_verbose
    (self.download_exception_msg, self.url))
newspaper.article.ArticleException: Article `download()` failed with 403 Client Error: Forbidden for url: https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830 on URL https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830

Process finished with exit code 1

I got my user agent string from this website: https://developers.whatismybrowser.com/useragents/explore/operating_system_name/macos/

Mona Jalal

1 Answer

The user agent string that worked for me was Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0

You can look up the user agent of your own browser here: https://www.whatismybrowser.com/detect/what-is-my-user-agent

from newspaper import Article
from newspaper import Config

# Same code as in the question; only the user agent string has changed
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
config = Config()
config.browser_user_agent = user_agent

url = "https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830"

page = Article(url, config=config)
page.download()
page.parse()
print(page.text)
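
Since the server seems to answer 403 for some user agent strings and not for others, it may help to test a few candidates before handing one to newspaper. The following is only a rough sketch: it uses urllib from the standard library rather than newspaper, the helper names status_for_user_agent and first_accepted_user_agent are made up for this example, and it assumes that a 200 response means the user agent was accepted, which may not hold for every site.

```python
import urllib.error
import urllib.request


def status_for_user_agent(url, user_agent, timeout=10):
    """Return the HTTP status code the server sends for this User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is on the exception.
        return err.code


def first_accepted_user_agent(url, candidates, probe=status_for_user_agent):
    """Return the first candidate that gets a 200 response, or None.

    `probe` is injectable so the selection logic can be tried without a network.
    """
    for user_agent in candidates:
        if probe(url, user_agent) == 200:
            return user_agent
    return None


# Hypothetical usage (performs real HTTP requests):
# candidates = [
#     'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15',
#     'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0',
# ]
# print(first_accepted_user_agent(url, candidates))
```

If a string comes back, it can be assigned to config.browser_user_agent exactly as in the code above. A None result suggests the block is not based on the user agent alone.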