I have a list of a few thousand URLs and noticed that one of them throws an SSLError when passed into requests.get(). Below is my attempt to work around this, using both a solution suggested in this similar question and a "try & except" block that catches ssl.SSLError, which failed to catch the error:
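import ssl

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

url = 'https://archyworldys.com/lidl-recalls-puff-pastry/'
session = requests.Session()
# retry failed connection attempts up to 3 times, with backoff between tries
retry = Retry(connect=3, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)
try:
    response = session.get(url, allow_redirects=False, verify=True)
except ssl.SSLError:
    # intended to skip the bad URL, but the error is not caught here
    pass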
The error returned at the very end is:
SSLError: HTTPSConnectionPool(host='archyworldys.com', port=443): Max retries exceeded with url: /lidl-recalls-puff-pastry/ (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'ssl3_get_server_certificate', 'certificate verify failed')],)",),))
When I open the URL in Chrome, I get a "Not Secure" / "Privacy Error" page that blocks the site. However, if I try the URL with HTTP instead of HTTPS (e.g. 'http://archyworldys.com/lidl-recalls-puff-pastry/'), it works just fine in my browser. Per this question, setting verify to False solves the problem, but I would prefer a more secure workaround.
While I understand a simple solution would be to remove the URL from my data, I'm trying to find a solution that lets me proceed (e.g. in a for loop) by simply skipping this bad URL and moving on to the next one.
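To make the desired behaviour concrete, here is a rough sketch of the loop I have in mind. It assumes the exception that actually reaches my code is requests.exceptions.SSLError (the wrapper raised by requests) rather than the standard-library ssl.SSLError, which may be why my except clause never fires. The helper names build_session and fetch_all are made up for illustration and are not part of requests.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session():
    """Hypothetical helper: a session with the same retry settings as above."""
    session = requests.Session()
    adapter = HTTPAdapter(max_retries=Retry(connect=3, backoff_factor=0.5))
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session


def fetch_all(urls, session):
    """Hypothetical helper: fetch each URL, skipping those that fail TLS checks.

    Returns a dict of url -> response and a list of skipped URLs.
    """
    responses, skipped = {}, []
    for url in urls:
        try:
            # verify=True keeps certificate checking on for every request
            responses[url] = session.get(url, allow_redirects=False, verify=True)
        except requests.exceptions.SSLError:
            # assumption: this is the exception type that propagates
            skipped.append(url)
            continue
    return responses, skipped
```

Calling fetch_all(urls, build_session()) would then leave certificate verification enabled and collect the failing URLs in the second return value instead of stopping the loop.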