I have a list of around 2,500 sites whose status codes I want to check. Here is my code:

    from bs4 import BeautifulSoup
    import requests
    import urllib3

    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

    master = "The URL"
    req = requests.get(master)
    req = req.text
    with open('Url.txt', 'w') as writefile:
        writefile.write(req)

    with requests.Session() as se:
        se.headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
            "Accept-Encoding": "gzip, deflate",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
            "Accept-Language": "en"
        }
        file = open("Url.txt", "r")
        test_sites = []
        for line in file:
            stripped_line = line.strip()
            test_sites.append(stripped_line)
        for site in test_sites:
            # get page source
            response = se.get(site, verify=False, allow_redirects=False)
            try:
                response.raise_for_status()
                outputVariable = f"""{site} : {response}\n"""
                # Save file
                with open('UrlOutput.txt', 'a') as f:
                    f.write(outputVariable)
                print(site, response)
                #print(response.text)
            except requests.exceptions.HTTPError as e:
                print(e)

I've tried every exception I could find in the urllib3 documentation and in the requests.exceptions documentation, but nothing worked. I keep receiving this error:

    gaierror: [Errno -2] Name or service not known

    During handling of the above exception, another exception occurred:

    NewConnectionError Traceback (most recent call last)
    NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7ff8dcef9890>: Failed to establish a new connection: [Errno -2] Name or service not known

    During handling of the above exception, another exception occurred:

    MaxRetryError Traceback (most recent call last)
    MaxRetryError: HTTPSConnectionPool(host='REDACTEDSITE', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7ff8dcef9890>: Failed to establish a new connection: [Errno -2] Name or service not known'))

    During handling of the above exception, another exception occurred:

    ConnectionError Traceback (most recent call last)
    /usr/local/lib/python3.7/dist-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
        514                 raise SSLError(e, request=request)
        515
    --> 516             raise ConnectionError(e, request=request)
        517
        518         except ClosedPoolError as e:
    ConnectionError: HTTPSConnectionPool(host='REDACTEDSITE', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7ff8dcef9890>: Failed to establish a new connection: [Errno -2] Name or service not known'))

I removed the site URL from the error output, but the site really is down: there is a DNS issue. That is exactly the kind of failure I'm trying to catch. How can I catch this error so that it doesn't stop the program?
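For reference, here is a minimal sketch of one way the error could be handled. In the code above, se.get() is called before the try block, so the exception it raises never reaches any except clause. The traceback ends in requests' own ConnectionError, which lives in requests.exceptions and is a subclass of RequestException. The helper name check_site and the timeout value are assumptions made for illustration, not part of the original script.

```python
import requests


def check_site(session, site, timeout=10):
    """Return a one-line status report for `site` instead of raising.

    `check_site` and `timeout` are hypothetical additions for this sketch.
    """
    try:
        # The request itself is inside the try block, so failures that
        # happen while connecting (such as a DNS lookup error) are caught.
        response = session.get(site, verify=False, allow_redirects=False,
                               timeout=timeout)
        response.raise_for_status()
        return f"{site} : {response.status_code}"
    except requests.exceptions.HTTPError as e:
        # The server answered, but with a 4xx/5xx status.
        return f"{site} : HTTP error {e.response.status_code}"
    except requests.exceptions.ConnectionError as e:
        # DNS failures and refused connections end up here.
        return f"{site} : connection failed ({type(e).__name__})"
    except requests.exceptions.RequestException as e:
        # Anything else raised by requests (read timeouts, invalid URLs, ...).
        return f"{site} : request failed ({type(e).__name__})"
```

Inside the loop, each iteration could then write check_site(se, site) plus a newline to UrlOutput.txt, so a dead site becomes one more line in the report rather than the end of the run.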