
I'm well aware that, generally speaking, it's not. But in my particular case, I'm writing a simple Python web scraper that will be run as a cron job every hour, and I'd like to be sure that it isn't a risk to skip SSL certificate verification by setting verify to False.

P.S. The reason I'm set on disabling this feature is that when I try to make a request with response = requests.get('url'), it raises an SSLError and I don't see how to handle it.
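
For reference, here is roughly what catching that exception would look like (the URL is a placeholder):

    import requests

    try:
        response = requests.get('https://example.com')  # placeholder URL
    except requests.exceptions.SSLError as exc:
        # Raised when certificate verification fails, e.g. due to an
        # incomplete certificate chain on the server's side.
        print('SSL verification failed:', exc)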

EDIT:

Okay, with the help of sigmavirus24 and others I've finally managed to resolve the problem. Here's the explanation of how I did it:

  • I ran a test at https://ssllabs.com/, and according to the report provided by SSL Labs, the SSL error was raised due to an "incomplete certificate chain" issue (for more details on how certificate verification works, read sigmavirus24's answer).

In my case, one of the intermediaries was missing.

  • I searched for its fingerprint using Google and downloaded the certificate in .pem format.
  • Then I used certifi (a Python package that provides Mozilla's CA bundle; if you don't have it, you can install it with sudo pip install certifi) to find the root cert, again by its fingerprint. This can be done as follows:

    $ ipython
    In [1]: import certifi
    In [2]: certifi.where()
    Out[2]: '/usr/lib/python3.6/site-packages/certifi/cacert.pem'
    In [3]: quit
    
    $ emacs -nw /usr/lib/python3.6/site-packages/certifi/cacert.pem
    

Or in bash you can issue $ emacs -nw $(python -m certifi) to open the cacert.pem file.

  • Concatenated the two certs together into one file and then provided its path to the verify parameter, as shown in the sketch below.
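
Putting those steps together, here is a rough sketch of the whole procedure. The fingerprint value, the URL, and the file names are hypothetical placeholders:

    import hashlib
    import ssl

    import certifi
    import requests

    # Hypothetical SHA-256 fingerprint of the root cert, e.g. taken from
    # the SSL Labs report.
    ROOT_FINGERPRINT = 'AB:CD:EF:...'.replace(':', '').lower()

    def iter_pem_certs(path):
        """Yield each PEM-encoded certificate contained in a bundle file."""
        block = []
        with open(path) as bundle:
            for line in bundle:
                if 'BEGIN CERTIFICATE' in line:
                    block = [line]
                elif 'END CERTIFICATE' in line:
                    block.append(line)
                    yield ''.join(block)
                    block = []
                elif block:
                    block.append(line)

    # Search certifi's CA bundle for the root cert by comparing fingerprints.
    root_pem = None
    for pem in iter_pem_certs(certifi.where()):
        der = ssl.PEM_cert_to_DER_cert(pem)
        if hashlib.sha256(der).hexdigest() == ROOT_FINGERPRINT:
            root_pem = pem
            break

    # Concatenate the downloaded intermediate cert and the root cert into
    # one file, then point requests' verify parameter at it.
    with open('chain.pem', 'w') as out:
        with open('intermediate.pem') as f:  # the missing intermediary
            out.write(f.read())
        out.write(root_pem)

    response = requests.get('https://example.com', verify='chain.pem')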

Another (simpler, but not always possible) way to do this is to download the whole chain from SSL Labs: right in front of the "Additional Certificates (if supplied)" section there's the "Download server chain" button. Click it, save the chain in a .pem file, and when calling requests's get method, provide the file path to the verify parameter.

Albert
  • Well… do you at all care about verifying that you're connecting to the host you think you're connecting to? Or are you fine with possibly being man-in-the-middle attacked? – deceze Jan 19 '17 at 11:12
  • @deceze well, I reckon I do. But from what I've heard, some claim this may be acceptable for quick/throwaway applications/scripts, but it really should not go into production software. – Albert Jan 19 '17 at 11:19
  • 1) How much work is it to investigate why verification fails and to fix that issue? 2) How bad would it be if you got man-in-the-middled and scraped the wrong site? Weigh those two things against each other… – deceze Jan 19 '17 at 11:21

3 Answers

The correct answer here is "it depends".

You've given us very little information to go on, so I'm going to make some assumptions and list them below (if any of them do not match, then you should reconsider your choice):

  1. You are connecting to the same website every time in your cron job
  2. You know the website fairly well and are certain that the certificate-related errors are benign
  3. You are not sending sensitive data (such as login credentials) to the website in order to scrape it

If that is the situation (which I am guessing it is) then it should be generally harmless. That said, whether or not it is "safe" depends on your definition of that word in the context of two computers talking to each other over the internet.

As others have said, Requests does not attempt to render HTML, parse XML, or execute JavaScript. Because it is simply retrieving your data, the biggest risk you run is receiving data that cannot be verified as having come from the server you thought it was coming from. If, however, you're using Requests in combination with something that does do the above, there are myriad potential attacks that a malicious man in the middle could use against you.

There are also options that mean you don't have to forgo verification. For example, if the server uses a self-signed certificate, you could get the certificate in PEM format, save it to a file and provide the path to that file to the verify argument instead. Requests would then be able to validate the certificate for you.
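
For example, a minimal sketch of that approach (the URL and the file path are hypothetical placeholders):

    import requests

    # Path to the server's self-signed certificate, saved in PEM format.
    response = requests.get('https://self-signed.example.com',
                            verify='/path/to/server-cert.pem')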

So, as I said, it depends.


Update based on Albert's replies

So what appears to be happening is that the website in question sends only the leaf certificate, which is valid. This website is relying on browser behaviour that currently works like so:

The browser connects to the website and notes that the site does not send its full certificate chain. It then goes and retrieves the intermediaries, validates them, and completes the connection. Requests, however, uses OpenSSL for validation, and OpenSSL does not contain any such behaviour. Since the validation logic is almost entirely in OpenSSL, Requests has no way to emulate a browser in this case.

Further, security tooling (e.g., SSL Labs) has started counting this configuration against a website's security ranking. It is increasingly the opinion that websites should send the entire chain. If you encounter a website that doesn't, contacting them and informing them of that is the best course forward.

If the website refuses to update their certificate chain, then Requests' users can retrieve the PEM-encoded intermediary certificates and stick them in a .pem file which they then provide to the verify parameter. Requests presently only includes root certificates in its trust store (as every browser does). It will never ship intermediary certificates because there are just too many. So including the intermediaries in a bundle with the root certificate(s) will allow you to verify the website's certificate: OpenSSL will then have a PEM-encoded file containing each link in the chain and will be able to verify up to the root certificate.

Ian Stapleton Cordasco
  • well, your assumptions are, let's say, 75% correct. The first and the third one are spot on. The site I'm connecting to contains a list of items and their prices; the idea behind the script I'm writing is simple - it looks for foods with certain ids and checks whether their prices drop (which means sales), then generates an email with all the necessary info and sends it to my email address. As far as I can see, the error arises due to issues with the intermediate chain certificates. – Albert Jan 21 '17 at 10:36
  • So I have to somehow obtain the missing certs, merge them into a file and set `verify='path_to_the_file'`. But I still don't quite understand where to look for the root cert. – Albert Jan 21 '17 at 10:37
  • @Albert so the website's leaf is valid but it is not presently supplying the intermediaries? That's an SSL configuration problem on their site that needs to be addressed. If you can get the whole chain and add it to your trust store, that will clear up the issues for now. That said, browsers and security professionals are starting to scold websites configured as you describe. It's not hygienic security practice on their part. – Ian Stapleton Cordasco Jan 21 '17 at 13:35
  • Thank you a lot for breaking this down for me. Your answer helped a lot. I've updated my question, where I described in detail how I solved the issue. If you have any objections or remarks, please comment! – Albert Jan 22 '17 at 15:20

This is probably a question more appropriate for https://security.stackexchange.com/.

Effectively, it makes the connection only slightly better than using HTTP instead of HTTPS. So almost all of the risks of HTTP would apply; the one difference is that, without verification of the server's certificate, an attacker would have to actively do something rather than just passively eavesdrop.

Basically, it would be possible to see both the sent and received data via a man-in-the-middle attack, or even via someone who had compromised that site at some point and stolen its certificate. If you are storing cookies for that site, those cookies will be revealed (i.e. if it were facebook.com, a session token could be stolen), and if you are logging in with a username and password, that could be stolen too.

What do you do with that data once you retrieve it? Are you downloading any executable code? Are you downloading something (say, images you store on a web server) where a skilled attacker (even by doing something like modifying the DNS settings on your router) could force you to download a file ("news.php") and store it on your web server, where it could become executable (a .php script instead of a web page)?

Matthew1471

From the documentation:

Requests can also ignore verifying the SSL certificate if you set verify to False.

    requests.get('https://kennethreitz.com', verify=False)
    <Response [200]>
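
Note that with verify=False, urllib3 (which Requests uses under the hood) emits an InsecureRequestWarning on every request. A minimal sketch of silencing it, if you have consciously accepted the risk:

    import requests
    import urllib3

    # Suppress the InsecureRequestWarning triggered by unverified requests.
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

    response = requests.get('https://kennethreitz.com', verify=False)
    print(response.status_code)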

It is 'safe' if you aren't sending sensitive information in your request.

You can't put a virus in the HTML itself (as far as I know); JavaScript can be a vulnerability, so it's a good thing Python doesn't execute it.

So, all in all, you should be safe.

Will
  • "You can't put a virus in the HTML itself" Sure you can. The trick is getting it to be used as something other than HTML. – JAB Jan 19 '17 at 22:04