Newspaper3k API Article download() failed with HTTPSConnectionPool port=443 Read timed out. (read timeout=7) on URL

Question

I can open http://www.chicagotribune.com/ct-florida-school-shooter-nikolas-cruz-20180217-story.html in Firefox without any problem. However, newspaper3k gives me this error:

Article download() failed with HTTPSConnectionPool(host='www.chicagotribune.com', port=443): Read timed out. (read timeout=7) on URL http://www.chicagotribune.com/ct-florida-school-shooter-nikolas-cruz-20180217-story.html

My code is:

from newspaper import Article
from newspaper import Config

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
config = Config()

config.browser_user_agent = user_agent

url = "https://www.chicagotribune.com/nation-world/ct-florida-school-shooter-nikolas-cruz-20180217-story.html"

page = Article(url, config=config)


page.download()
page.parse()
print(page.text)

I think something like 'renewIPAddress()' from https://stackoverflow.com/a/50496768/2414957 might help, but I am not sure exactly how to fit it into this code.

Did the answer below solve your read timeout issue? – Life is complex, Oct 12 '20 at 15:02

score 5 · Answer 1 · answered Oct 05 '20 at 18:01

You have likely solved this already. Your code works fine, but something at that particular moment caused the 'read timed out' error. newspaper3k makes its HTTP calls with the Python module requests, and I have found that these connections occasionally time out. The timeouts are usually linked to the source that you're querying rather than to your code. newspaper3k supports a request_timeout setting on Config() (the "read timeout=7" in your error is its default of 7 seconds), and raising it could help prevent future 'read timed out' issues.

from newspaper import Article
from newspaper import Config

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'

config = Config()
config.browser_user_agent = user_agent
config.request_timeout = 10

url = "https://www.chicagotribune.com/nation-world/ct-florida-school-shooter-nikolas-cruz-20180217-story.html"

page = Article(url, config=config)

page.download()
page.parse()
print(page.text)
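
If the source still times out now and then even with a longer timeout, retrying the request may help. The following is only a minimal sketch, not something newspaper3k provides: retry() and fetch_article_text() are hypothetical helper names, and the attempt count and backoff delays are arbitrary assumptions. As far as I can tell, newspaper3k only raises the 'download() failed' error (an ArticleException) once parse() is called, so the sketch retries download() and parse() together.

```python
import time


def retry(operation, attempts=3, base_delay=1.0, retry_on=(Exception,), sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Hypothetical helper: waits base_delay, then 2 * base_delay, and so on
    between failed attempts, and re-raises the last error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))


def fetch_article_text(url, user_agent, timeout=10, attempts=3):
    """Download and parse an article, retrying when the download fails."""
    # Imported here so that retry() can be reused without newspaper3k installed.
    from newspaper import Article, Config
    from newspaper.article import ArticleException

    config = Config()
    config.browser_user_agent = user_agent
    config.request_timeout = timeout  # seconds

    def download_and_parse():
        # A fresh Article per attempt avoids reusing a failed download state.
        page = Article(url, config=config)
        page.download()
        page.parse()  # raises ArticleException if the download failed
        return page.text

    return retry(download_and_parse, attempts=attempts, retry_on=(ArticleException,))
```

Calling fetch_article_text(url, user_agent) with the url and user_agent values from the code above would make up to three attempts, pausing 1 second and then 2 seconds between them.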
