
I'm working in a Jupyter Notebook, and newspaper is unable to download anything from Newsweek. I can fetch the article with Goose, but I want a backup in case Goose ever fails.

I have tried other websites such as Fox, Yahoo, and CNN, and they all work fine, so the issue seems to be isolated to Newsweek.

from newspaper import Article
url = 'https://www.newsweek.com/mike-huckabee-blasts-cnns-axelrod-calling-daughter-trump-press-secretary-sarah-sanders-1444184'
article = Article(url)
article.download()
article.html
article.parse()
article.text

Article `download()` failed with 403 Client Error: Forbidden for url: https://www.newsweek.com/mike-huckabee-blasts-cnns-axelrod-calling-daughter-trump-press-secretary-sarah-sanders-1444184 on URL https://www.newsweek.com/mike-huckabee-blasts-cnns-axelrod-calling-daughter-trump-press-secretary-sarah-sanders-1444184
M4cJunk13

1 Answer


You have likely solved this already, but the 403 error is directly related to not passing a user agent when you request the article with Newspaper.

from newspaper import Article
from newspaper import Config

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36'

# Send a browser-like user agent instead of Newspaper's default one
config = Config()
config.browser_user_agent = user_agent

url = 'https://www.newsweek.com/mike-huckabee-blasts-cnns-axelrod-calling-daughter-trump-press-secretary-sarah-sanders-1444184'

article = Article(url, config=config)
article.download()
article.html
article.parse()
article.text
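
If you want to confirm that the user agent is what triggers the 403, one option is to compare the HTTP status codes with and without the header, outside of Newspaper. The sketch below uses only the standard library. `build_request` and `fetch_status` are hypothetical helper names, and the 10-second timeout is an arbitrary assumption.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent=None):
    """Build a GET request, optionally with a custom User-Agent header."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def fetch_status(url, user_agent=None, timeout=10):
    """Return the HTTP status code the server sends for this request."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the code is on the exception
        return err.code
```

Calling `fetch_status(url)` and then `fetch_status(url, user_agent)` with the values defined above lets you compare the two responses. If only the first call returns 403, the site is most likely rejecting requests that do not look like they come from a browser. Sites change their filtering over time, so the result may differ from what the question shows.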
Life is complex