I have a list of around 2,500 sites whose status codes I want to check. Here is the code:

from bs4 import BeautifulSoup
import requests
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
master = "The URL"
req = requests.get(master)
req = req.text

with open('Url.txt', 'w') as writefile:
    writefile.write(req)
with requests.Session() as se:
    se.headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
        "Accept-Encoding": "gzip, deflate",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en"
    }
file = open("Url.txt", "r")

test_sites = []

for line in file:
  stripped_line = line.strip()
  test_sites.append(stripped_line)



for site in test_sites:
    # get page source
    response = se.get(site, verify=False, allow_redirects=False)
    try: 
        response.raise_for_status()
        outputVariable = f"""{site} : {response}\n"""
        # Save file
        with open('UrlOutput.txt', 'a') as f:
          f.write(outputVariable)
        print(site, response)
        #print(response.text)
    except requests.exceptions.HTTPError as e: 
        print(e)

I've tried every exception I could find in the urllib3 documentation as well as the requests.exceptions documentation, but nothing worked. I still kept receiving this error:

gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

NewConnectionError                        Traceback (most recent call last)
NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7ff8dcef9890>: Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
MaxRetryError: HTTPSConnectionPool(host='REDACTEDSITE', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7ff8dcef9890>: Failed to establish a new connection: [Errno -2] Name or service not known'))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
    514                 raise SSLError(e, request=request)
    515 
--> 516             raise ConnectionError(e, request=request)
    517 
    518         except ClosedPoolError as e:

ConnectionError: HTTPSConnectionPool(host='REDACTEDSITE', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7ff8dcef9890>: Failed to establish a new connection: [Errno -2] Name or service not known'))

I removed the site URL from the error output, but the site IS down: there is a DNS issue, and this is exactly the kind of failure I'm trying to catch. How can I catch this error instead of letting it stop the program?

  • Did you add `except urllib3.exceptions.NewConnectionError`? or `except requests.exceptions.ConnectionError`? – Tim Roberts Jun 08 '22 at 18:09
  • I did. I tried both of those. Both resulted in the same error – Hunter Michael Jun 08 '22 at 18:19
  • It seems you are trying to perform a request on the host "REDACTEDSITE", which is obviously not a valid URL. Did you open the txt file you wrote to ensure it contains only valid URIs? – Antwane Jun 08 '22 at 18:28
  • Even better, handle the (very common) case where the file may contain invalid URIs – Carlo Jun 08 '22 at 18:32
  • @Antwane As I mentioned, I actually removed the URL because I wanted to keep it private – Hunter Michael Jun 08 '22 at 18:56
  • @Carlo All the URLs in the list are correct. It's a CSV export from our hosting. – Hunter Michael Jun 08 '22 at 18:56
  • Did you notice that the line getting the exception (which is `se.get`) is not inside the `try` block? That's the problem. Move the `try:` up one line. – Tim Roberts Jun 08 '22 at 19:10
  • Do the following answers help you? [python - socket.gaierror: \[Errno -2\] Name or service not known with Python3 - Stack Overflow](https://stackoverflow.com/questions/44591027/socket-gaierror-errno-2-name-or-service-not-known-with-python3) and [socket.gaierror: \[Errno -2\] Name or service not known | Python - Stack Overflow](https://stackoverflow.com/questions/57234628/socket-gaierror-errno-2-name-or-service-not-known-python) – hc_dev Jun 08 '22 at 19:18
  • @TimRoberts That did not seem to fix it. Also note: the code DOES work. I do get the outputs as expected, BUT when I run into a site with a DNS issue, it breaks the program – Hunter Michael Jun 08 '22 at 19:19
  • @hc_dev Nope, I saw those before. They didn't seem to help. For the first link, removing https results in a MissingSchema error instead. The second link didn't work either. The same gaierror persists – Hunter Michael Jun 08 '22 at 19:22
  • Hmm, maybe you should try https://stackoverflow.com/questions/40145631/precisely-catch-dns-error-with-python-requests to precisely grab the error? Linked to: https://github.com/psf/requests/issues/3630 – Carlo Jun 08 '22 at 19:57
  • @Carlo That doesn't seem to have fixed it. I copied and pasted the hack to test it, put in the domain that was causing the issue, and it came back with the same error https://prnt.sc/N2Bf5sLP8PhD – Hunter Michael Jun 09 '22 at 16:48
  • I mean, is a ConnectTimeout exception also returning the same error? Since I see the service making a lot of attempts, is it probably hitting a maximum waiting time? The challenge is very interesting imho, https://requests.readthedocs.io/en/master/api/#requests.ConnectTimeout – Carlo Jun 09 '22 at 17:12
  • I managed to solve it, sort of. I moved the line "response = se.get(site, verify=False, allow_redirects=False)" down into the try block (I forget why, but at the moment I had found something that made me do it), and the exception "requests.ConnectionError" worked to catch the DNS failure. No other exception I tried worked. Although it displays an error, it shows the status code as 200, which is fine because I added another variable that is displayed instead if this exception is triggered. End result, if anyone is curious: https://prnt.sc/vHoYuzigIOR8 – Hunter Michael Jun 09 '22 at 22:33
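
The fix discussed in the comments (put the `se.get` call inside the `try` block and catch `requests.exceptions.ConnectionError`) could look roughly like the sketch below. It is only an illustration, not the asker's final code: the function name `check_sites`, the `timeout` value, and the error-label strings are assumptions made for this example. The traceback in the question ends in a `ConnectionError` raised from `requests/adapters.py`, which is why the requests exception, rather than the underlying urllib3 one, is the one caught here.

```python
import requests


def check_sites(sites, session, timeout=10):
    """Return (site, status) pairs; status is an int code or an error label.

    `check_sites` and `timeout=10` are illustrative choices, not part of
    the original question.
    """
    results = []
    for site in sites:
        try:
            # The request itself is inside the try block, so DNS and
            # connection failures raised by session.get() are caught below.
            response = session.get(
                site, verify=False, allow_redirects=False, timeout=timeout
            )
            results.append((site, response.status_code))
        except requests.exceptions.ConnectionError as exc:
            # Covers DNS failures ("Name or service not known"),
            # refused connections, and similar errors.
            results.append((site, f"connection error: {type(exc).__name__}"))
        except requests.exceptions.RequestException as exc:
            # Any other requests-level failure (read timeout, missing schema, ...).
            results.append((site, f"request failed: {type(exc).__name__}"))
    return results
```

It would be called with the session and URL list from the question, for example `check_sites(test_sites, se)`. Because each result is built inside the branch that produced it, a failed site cannot report a status left over from an earlier iteration, which is a likely explanation for the stale `200` mentioned in the last comment.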

0 Answers