I am trying to download the text of an article with newspaper. The article loads fine in a web browser (Safari, for example), but the download fails.
The error is:
newspaper.article.ArticleException: Article `download()` failed with 403 Client Error: Forbidden for url: https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830 on URL https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830
Here's the code:
from newspaper import Article
from newspaper import Config
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15'
config = Config()
config.browser_user_agent = user_agent
url = "https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830".strip()
page = Article(url, config=config)
page.download()
page.parse()
print(page.text)
As you can see, I tried the solution from this Stack Overflow answer (setting a browser user agent through Config), but it didn't work.
Complete error log:
/Users/mona/anaconda3/bin/python /Users/mona/multimodal/newspaper_pg.py
Traceback (most recent call last):
  File "/Users/mona/multimodal/newspaper_pg.py", line 18, in <module>
    page.parse()
  File "/Users/mona/anaconda3/lib/python3.6/site-packages/newspaper/article.py", line 191, in parse
    self.throw_if_not_downloaded_verbose()
  File "/Users/mona/anaconda3/lib/python3.6/site-packages/newspaper/article.py", line 532, in throw_if_not_downloaded_verbose
    (self.download_exception_msg, self.url))
newspaper.article.ArticleException: Article `download()` failed with 403 Client Error: Forbidden for url: https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830 on URL https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830
Process finished with exit code 1
I got my user agent string from this website: https://developers.whatismybrowser.com/useragents/explore/operating_system_name/macos/
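
To narrow down whether the 403 is specific to newspaper or comes from the server no matter which client sends the request, the same URL and user agent can be tried with only the standard library. This is a minimal diagnostic sketch, not a fix: `build_request` and `fetch_status` are hypothetical helper names introduced here, and the 10-second timeout is an arbitrary assumption.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Build a GET request that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent, timeout=10):
    """Return the HTTP status code the server answers with.

    urlopen raises HTTPError for 4xx/5xx responses, so the code is read
    from the exception in that case instead of letting it propagate.
    """
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
```

Calling `fetch_status(url, user_agent)` with the `url` and `user_agent` values from the script above shows the status code outside newspaper. If it is also 403, the server is probably rejecting the request on something other than the user agent string alone.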