I'm running into trouble scraping a website after they changed from http to https and don't know how to solve this. The website I'm trying to scrape from is https://www.boldsystems.org. Two days ago it was still http://www.boldsystems.org and my scraper worked perfectly.
Example code:
import requests
requests.get('https://www.boldsystems.org')
Error code I get back:
Traceback (most recent call last):
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\contrib\pyopenssl.py", line 488, in wrap_socket
cnx.do_handshake()
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\OpenSSL\SSL.py", line 1934, in do_handshake
self._raise_ssl_error(self._ssl, result)
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\OpenSSL\SSL.py", line 1671, in _raise_ssl_error
_raise_current_error()
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\OpenSSL\_util.py", line 54, in exception_from_error_queue
raise exception_type(errors)
OpenSSL.SSL.Error: [('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 677, in urlopen
chunked=chunked,
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 381, in _make_request
self._validate_conn(conn)
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 976, in _validate_conn
conn.connect()
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connection.py", line 370, in connect
ssl_context=context,
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\util\ssl_.py", line 377, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\contrib\pyopenssl.py", line 494, in wrap_socket
raise ssl.SSLError("bad handshake: %r" % e)
ssl.SSLError: ("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])",)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\adapters.py", line 449, in send
timeout=timeout
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 725, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\util\retry.py", line 439, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='boldsystems.org', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_
certificate', 'certificate verify failed')])")))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "C:\Users\dommi\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='boldsystems.org', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_cert
ificate', 'certificate verify failed')])")))
I found some solutions suggesting disabling the verification like:
requests.get('https://boldsystems.org', verify = False)
But I think this is a bad way to do it since there is a reason for the SSL verification.
I already updated certify, requests and urllib3. I also tried saving the SSL certificate to a .pem file and handing it to the request function, but I'm actually not sure what that does and it did not help.
I can reproduce the issue on Windows and Ubuntu as well as on different computers, so I think the problem is somewhere at the website I try to request.
I'd really appreciate a solution to my problem or an explanation what is happening here.