
I am trying to scrape a website, but am receiving the following error:

('Connection aborted.', OSError("(60, 'ETIMEDOUT')",))

I tried setting timeout=None in requests.get(), but I still received the error.

How should I handle the above error given the code below? For some reason, this error only appears when I try to scrape that URL; all others have worked fine for me.

The code below was suggested based on this question.

# Import packages
import requests
from bs4 import BeautifulSoup 

# Input URL
url = "https://www...."

# Try requests.get()
try:    
    r = requests.get(url)
except requests.ConnectionError as e:
    print("OOPS!! Connection Error. Make sure you are connected to Internet. Technical Details given below.\n")
    print(str(e))
except requests.Timeout as e:
    print("OOPS!! Timeout Error")
    print(str(e))
except requests.RequestException as e:
    print("OOPS!! General Error")
    print(str(e))
except KeyboardInterrupt:
    print("Someone closed the program")

I also tried a retry mechanism, as suggested here, but with no luck:

# Retry up to 10 times, stopping on the first successful request
for i in range(10):
    try:
        r = requests.get(url)
        break
    except requests.ConnectionError as e:
        print("OOPS!! Connection Error. Make sure you are connected to the Internet. Technical details given below.\n")
        print(str(e))
        continue
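
For reference, a session-level retry via urllib3's Retry mounted on requests' HTTPAdapter might be another option; the total=5 and backoff_factor=1 values below are guesses, not tuned numbers:

# Sketch: let requests/urllib3 handle retries with backoff.
# total=5, backoff_factor=1 and the status list are assumptions.
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=5, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
session.mount("https://", HTTPAdapter(max_retries=retries))
r = session.get(url, timeout=(5, 30))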
  • If normal browsing works ok, I would suggest looking at the network inspector and seeing what headers are sent. Maybe you need to set them in `requests` too. – Andrej Kesely Jul 22 '18 at 18:32
  • What's the website URL? – ggorlen Jul 22 '18 at 18:41
  • Normal browsing works, it was the headers not being set. Thanks! How exactly do headers work? @AndrejKesely – santorch Jul 22 '18 at 19:11
  • @steich Headers are part of the HTTP protocol (https://en.wikipedia.org/wiki/List_of_HTTP_header_fields); they are sent with each request and response. With `requests`, you just supply them in the form of `requests.get(url, headers={...})` – Andrej Kesely Jul 22 '18 at 19:16
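
A minimal sketch of what that header-setting suggestion looks like (the User-Agent string below is a generic example, not the exact headers this particular site needs):

# Sketch: send browser-like headers, as suggested in the comments.
# The User-Agent value is a generic placeholder; copy the real headers from
# the browser's network inspector for the site in question.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
r = requests.get(url, headers=headers, timeout=(5, 30))
print(r.status_code)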
