
The following code works for several other URLs but not for one specific URL, and I'm not sure why or how to work around it. For money.usnews.com the request hangs, but for every other URL I tried, such as usatoday.com, it works.

import requests
from bs4 import BeautifulSoup

url = 'https://money.usnews.com'  # does NOT work for this URL, but works for 'https://www.usatoday.com'

result = requests.get(url)
src = result.content

soup = BeautifulSoup(src, 'html.parser')
print(soup.prettify())
IoaTzimas

1 Answer


This is most likely because the website is blocking the scraper. You can add a timeout to confirm that the request is hanging rather than failing outright.

result = requests.get('https://money.usnews.com', timeout=15)

You will get:

requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='money.usnews.com', port=443): Read timed out. (read timeout=15)
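
As a minimal sketch of how to keep the script from hanging, the request can be wrapped so that a timeout is caught instead of propagating. The helper name `fetch_html` and the `BROWSER_HEADERS` value are assumptions made for illustration, not something from the question. Sending a browser-like User-Agent is a common thing to try when a site ignores the default `python-requests` agent, but there is no guarantee that this particular site will accept it.

```python
import requests

# Assumption: a browser-like User-Agent string. The site may still block the request.
BROWSER_HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}


def fetch_html(url, timeout=15, headers=None):
    """Return the page body as text, or None if the request times out."""
    try:
        response = requests.get(
            url,
            headers=headers or BROWSER_HEADERS,
            timeout=timeout,
        )
    except requests.exceptions.Timeout:
        # Timeout is the base class of both ConnectTimeout and ReadTimeout.
        return None
    # Raise requests.exceptions.HTTPError for 4xx/5xx responses.
    response.raise_for_status()
    return response.text
```

Calling `fetch_html('https://money.usnews.com')` then returns `None` after at most roughly the timeout per connection phase instead of blocking indefinitely, and the returned text can be passed to `BeautifulSoup(html, 'html.parser')` as in the question.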

Similar question:

How to send cookies in a post request with the Python Requests library?

Frank