90

I need to write a script that connects to a bunch of sites on our corporate intranet over HTTPS and verifies that their SSL certificates are valid; that they are not expired, that they are issued for the correct address, etc. We use our own internal corporate Certificate Authority for these sites, so we have the public key of the CA to verify the certificates against.

Python by default just accepts and uses SSL certificates when using HTTPS, so even if a certificate is invalid, Python libraries such as urllib2 and Twisted will just happily use the certificate.

How do I verify a certificate in Python?

Braiam
Eli Courtwright
  • 10
    Your comment about Twisted is incorrect: Twisted uses pyopenssl, not Python's built-in SSL support. While it doesn't validate HTTPS certificates by default in its HTTP client, you can use the "contextFactory" argument to getPage and downloadPage to construct a validating context factory. By contrast, to my knowledge there's no way that the built-in "ssl" module can be convinced to do certificate validation. – Glyph Jul 06 '09 at 14:56
  • 4
    With the SSL module in Python 2.6 and later, you can write your own certificate validator. Not optimal, but doable. – Heikki Toivonen Sep 17 '09 at 22:58
  • 3
    The situation changed, Python now by default validates certificates. I have added a new answer below. – Dr. Jan-Philip Gehrcke Feb 04 '15 at 15:53
  • The situation also changed for Twisted (somewhat before it did for Python, in fact); If you use [`treq`](https://treq.readthedocs.org/) or [`twisted.web.client.Agent`](https://twistedmatrix.com/documents/14.0.0/api/twisted.web.client.Agent.html) since version 14.0, Twisted verifies certificates by default. – Glyph Apr 23 '15 at 00:18

11 Answers

31

I have added a distribution to the Python Package Index which makes the match_hostname() function from the Python 3.2 ssl package available on previous versions of Python.

http://pypi.python.org/pypi/backports.ssl_match_hostname/

You can install it with:

pip install backports.ssl_match_hostname

Or you can make it a dependency listed in your project's setup.py. Either way, it can be used like this:

from backports.ssl_match_hostname import match_hostname, CertificateError
...
sslsock = ssl.wrap_socket(sock, ssl_version=ssl.PROTOCOL_SSLv3,
                      cert_reqs=ssl.CERT_REQUIRED, ca_certs=...)
try:
    match_hostname(sslsock.getpeercert(), hostname)
except CertificateError, ce:
    ...
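
On current Python 3 the same check should not need a backport: `ssl.create_default_context()` returns a context with `verify_mode` set to `CERT_REQUIRED` and `check_hostname` enabled, so the hostname is matched during the handshake. The following is only a minimal sketch under that assumption. `make_verifying_context` and `fetch_peer_cert` are illustrative names, and `ca_file` stands for a PEM file containing your internal CA certificate.

```python
import socket
import ssl


def make_verifying_context(ca_file=None):
    """Build a client context that checks both the chain and the hostname.

    ca_file is an assumed path to a PEM file with the internal CA; when it
    is None, the system's default trust store is used instead.
    """
    return ssl.create_default_context(cafile=ca_file)


def fetch_peer_cert(hostname, port=443, ca_file=None, timeout=10.0):
    """Connect, let the handshake validate the certificate, return it as a dict."""
    context = make_verifying_context(ca_file)
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        # server_hostname is what the certificate is matched against.
        with context.wrap_socket(sock, server_hostname=hostname) as tls_sock:
            return tls_sock.getpeercert()
```

With this sketch, a call such as `fetch_peer_cert('internal.stuff', ca_file='myFineCA.crt')` should raise `ssl.SSLCertVerificationError` when the certificate is untrusted, expired, or issued for a different name.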
Brandon Rhodes
  • 2
    I'm missing something... can you please fill in the blanks above or provide a complete example (for a site like Google)? – smholloway Jun 13 '13 at 22:54
  • The example will look different depending on which library you are using to access Google, since different libraries put the SSL socket different places, and it is the SSL socket that needs its `getpeercert()` method called so the output can be passed to `match_hostname()`. – Brandon Rhodes Jun 17 '13 at 11:17
  • 13
    I'm embarrassed on Python's behalf that anyone has to use this. Python's built-in SSL HTTPS libraries not verifying certificates out of the box by default is completely insane, and it's painful to imagine how many insecure systems are out there now as a result. – Glenn Maynard Mar 19 '14 at 20:49
  • 1
    @Glenn - Also see [New SSL module doesn't seem to verify hostname against commonName in certificate](http://bugs.python.org/issue1589). – jww Jul 07 '14 at 05:52
26

You can use Twisted to verify certificates. The main API is CertificateOptions, which can be provided as the contextFactory argument to various functions such as listenSSL and startTLS.

Unfortunately, neither Python nor Twisted comes with the pile of CA certificates required to actually do HTTPS validation, nor the HTTPS validation logic. Due to a limitation in PyOpenSSL, you can't do it completely correctly just yet, but thanks to the fact that almost all certificates include a subject commonName, you can get close enough.

Here is a naive sample implementation of a verifying Twisted HTTPS client which ignores wildcards and subjectAltName extensions, and uses the certificate-authority certificates present in the 'ca-certificates' package in most Ubuntu distributions. Try it with your favorite valid and invalid certificate sites :).

import os
import glob
from OpenSSL.SSL import Context, TLSv1_METHOD, VERIFY_PEER, VERIFY_FAIL_IF_NO_PEER_CERT, OP_NO_SSLv2
from OpenSSL.crypto import load_certificate, FILETYPE_PEM
from twisted.python.urlpath import URLPath
from twisted.internet.ssl import ContextFactory
from twisted.internet import reactor
from twisted.web.client import getPage
certificateAuthorityMap = {}
for certFileName in glob.glob("/etc/ssl/certs/*.pem"):
    # There might be some dead symlinks in there, so let's make sure it's real.
    if os.path.exists(certFileName):
        data = open(certFileName).read()
        x509 = load_certificate(FILETYPE_PEM, data)
        digest = x509.digest('sha1')
        # Now, de-duplicate in case the same cert has multiple names.
        certificateAuthorityMap[digest] = x509
class HTTPSVerifyingContextFactory(ContextFactory):
    def __init__(self, hostname):
        self.hostname = hostname
    isClient = True
    def getContext(self):
        ctx = Context(TLSv1_METHOD)
        store = ctx.get_cert_store()
        for value in certificateAuthorityMap.values():
            store.add_cert(value)
        ctx.set_verify(VERIFY_PEER | VERIFY_FAIL_IF_NO_PEER_CERT, self.verifyHostname)
        ctx.set_options(OP_NO_SSLv2)
        return ctx
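    def verifyHostname(self, connection, x509, errno, depth, preverifyOK):
        # OpenSSL calls this once per certificate in the chain; only the
        # peer's own certificate (depth 0) is expected to carry the hostname.
        if preverifyOK and depth == 0:
            if self.hostname != x509.get_subject().commonName:
                return False
        return preverifyOK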
def secureGet(url):
    return getPage(url, HTTPSVerifyingContextFactory(URLPath.fromString(url).netloc))
def done(result):
    print 'Done!', len(result)
secureGet("https://google.com/").addCallback(done)
reactor.run()
sigmaris
Glyph
  • can you make it non-blocking? – sean riley Jul 06 '09 at 17:36
  • Thanks; I have one note now that I've read and understood this: verify callbacks should return True when there's no error and False when there is. Your code basically returns an error when the commonName is not localhost. I'm not sure whether that's what you intended, though it would make sense to do this in some cases. I just figured I'd leave a comment about this for the benefit of future readers of this answer. – Eli Courtwright Jul 06 '09 at 19:55
  • "self.hostname" in that case is not "localhost"; note the `URLPath(url).netloc`: that means the host part of the URL passed in to secureGet. In other words, it's checking that the commonName of the subject is the same as the one being requested by the caller. – Glyph Jul 09 '09 at 10:31
  • I've been running a version of this test code and have used Firefox, wget, and Chrome to hit a test HTTPS Server. In my test runs though, I'm seeing that the callback verifyHostname is being called 3-4 times every connection. Why is it not just running once? – themaestro Jul 19 '10 at 18:22
  • 2
    URLPath(blah).netloc *is* always localhost: URLPath.__init__ takes individual url components, you're passing an entire url as "scheme" and getting the default netloc of 'localhost' to go with it. You probably meant to use URLPath.fromString(url).netloc. Unfortunately that exposes the check in verifyHostName being backwards: it starts rejecting `https://www.google.com/` because one of the subjects is 'www.google.com', causing the function to return False. It probably meant to return True (accepted) if the names match, and False if they do not? – mzz Sep 30 '10 at 00:02
  • @mzz: sigmaris's edit seems to fix the problem with `verifyHostname()`. – jfs Dec 14 '11 at 00:33
  • @mzz - thanks for spotting that, and thanks to sigmaris for the bugfix. – Glyph Feb 25 '12 at 21:39
26

PycURL does this beautifully.

Below is a short example. It raises a pycurl.error if anything is wrong; the exception carries a tuple of an error code and a human-readable message.

import pycurl

curl = pycurl.Curl()
curl.setopt(pycurl.CAINFO, "myFineCA.crt")
curl.setopt(pycurl.SSL_VERIFYPEER, 1)
curl.setopt(pycurl.SSL_VERIFYHOST, 2)
curl.setopt(pycurl.URL, "https://internal.stuff/")

curl.perform()

You will probably want to configure more options, like where to store the results, etc. But no need to clutter the example with non-essentials.

Examples of the errors that might be raised:

(60, 'Peer certificate cannot be authenticated with known CA certificates')
(51, "common name 'CN=something.else.stuff,O=Example Corp,C=SE' does not match 'internal.stuff'")

Some links that I found useful are the libcurl-docs for setopt and getinfo.

smholloway
plundra
22

As of releases 2.7.9 and 3.4.3, Python attempts to perform certificate validation by default.

This was proposed in PEP 476, which is worth a read: https://www.python.org/dev/peps/pep-0476/

The changes affect all relevant stdlib modules (urllib/urllib2, http, httplib).

Relevant documentation:

https://docs.python.org/2/library/httplib.html#httplib.HTTPSConnection

This class now performs all the necessary certificate and hostname checks by default. To revert to the previous, unverified, behavior ssl._create_unverified_context() can be passed to the context parameter.

https://docs.python.org/3/library/http.client.html#http.client.HTTPSConnection

Changed in version 3.4.3: This class now performs all the necessary certificate and hostname checks by default. To revert to the previous, unverified, behavior ssl._create_unverified_context() can be passed to the context parameter.

Note that the new built-in verification is based on the system-provided certificate database. By contrast, the requests package ships its own certificate bundle. The pros and cons of both approaches are discussed in the Trust database section of PEP 476.
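
For the original use case (checking intranet sites against an internal CA), the default behaviour can be pointed at a specific CA file by passing a context to `urlopen`. The following is a rough Python 3 sketch rather than a definitive implementation. `check_site` and `describe_failure` are hypothetical helper names, and `ca_file` is an assumed path to the CA certificate in PEM format.

```python
import ssl
import urllib.error
import urllib.request


def describe_failure(exc):
    """Turn an exception from urlopen() into a short, readable reason."""
    # urlopen() normally wraps lower-level errors in URLError.reason.
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(reason, ssl.SSLCertVerificationError):
        return "certificate verification failed: %s" % (reason,)
    return "connection failed: %s" % (reason,)


def check_site(url, ca_file=None, timeout=10.0):
    """Return (True, None) if the certificate was accepted, else (False, reason)."""
    context = ssl.create_default_context(cafile=ca_file)
    try:
        with urllib.request.urlopen(url, context=context, timeout=timeout):
            return True, None
    except urllib.error.HTTPError:
        # The server answered over a verified TLS connection, so the
        # certificate itself was accepted even though the status is an error.
        return True, None
    except (urllib.error.URLError, OSError) as exc:
        return False, describe_failure(exc)
```

A script could then loop over its list of intranet URLs, call `check_site(url, ca_file='myFineCA.crt')` for each, and report the ones that return `False`.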

Dr. Jan-Philip Gehrcke
  • any solutions to ensure verifications of certificate for previous version of python ? One can not always upgrade the version of python. – vaab Apr 13 '15 at 02:58
  • it doesn't validate revoked certificates. E.g. revoked.badssl.com – Raz Mar 08 '18 at 16:34
  • Is it compulsory to use `HTTPSConnection` class? I was using [`SSLSocket`](https://docs.python.org/3/library/ssl.html#ssl.SSLSocket). How can I do validation with `SSLSocket`? Do I have to explicitly validate using `pyopenssl` as explained [here](http://www.yothenberg.com/validate-x509-certificate-in-python/)? – MsA Jun 11 '18 at 17:16
15

Or simply make your life easier by using the requests library:

import requests
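# verify=True checks the server against the default CA bundle; pass a CA file path instead to use your own CA.
# cert= supplies a client-side certificate and plays no part in verifying the server.
requests.get('https://somesite.com', cert='/path/server.crt', verify=True)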

A few more words about its usage.

laffuste
ufo
14

Here's an example script which demonstrates certificate validation:

import httplib
import re
import socket
import sys
import urllib2
import ssl

class InvalidCertificateException(httplib.HTTPException, urllib2.URLError):
    def __init__(self, host, cert, reason):
        httplib.HTTPException.__init__(self)
        self.host = host
        self.cert = cert
        self.reason = reason

    def __str__(self):
        return ('Host %s returned an invalid certificate (%s) %s\n' %
                (self.host, self.reason, self.cert))

class CertValidatingHTTPSConnection(httplib.HTTPConnection):
    default_port = httplib.HTTPS_PORT

    def __init__(self, host, port=None, key_file=None, cert_file=None,
                             ca_certs=None, strict=None, **kwargs):
        httplib.HTTPConnection.__init__(self, host, port, strict, **kwargs)
        self.key_file = key_file
        self.cert_file = cert_file
        self.ca_certs = ca_certs
        if self.ca_certs:
            self.cert_reqs = ssl.CERT_REQUIRED
        else:
            self.cert_reqs = ssl.CERT_NONE

    def _GetValidHostsForCert(self, cert):
        if 'subjectAltName' in cert:
            return [x[1] for x in cert['subjectAltName']
                         if x[0].lower() == 'dns']
        else:
            return [x[0][1] for x in cert['subject']
                            if x[0][0].lower() == 'commonname']

    def _ValidateCertificateHostname(self, cert, hostname):
        hosts = self._GetValidHostsForCert(cert)
        for host in hosts:
            host_re = host.replace('.', '\.').replace('*', '[^.]*')
            if re.search('^%s$' % (host_re,), hostname, re.I):
                return True
        return False

    def connect(self):
        sock = socket.create_connection((self.host, self.port))
        self.sock = ssl.wrap_socket(sock, keyfile=self.key_file,
                                          certfile=self.cert_file,
                                          cert_reqs=self.cert_reqs,
                                          ca_certs=self.ca_certs)
        if self.cert_reqs & ssl.CERT_REQUIRED:
            cert = self.sock.getpeercert()
            hostname = self.host.split(':', 0)[0]
            if not self._ValidateCertificateHostname(cert, hostname):
                raise InvalidCertificateException(hostname, cert,
                                                  'hostname mismatch')


class VerifiedHTTPSHandler(urllib2.HTTPSHandler):
    def __init__(self, **kwargs):
        urllib2.AbstractHTTPHandler.__init__(self)
        self._connection_args = kwargs

    def https_open(self, req):
        def http_class_wrapper(host, **kwargs):
            full_kwargs = dict(self._connection_args)
            full_kwargs.update(kwargs)
            return CertValidatingHTTPSConnection(host, **full_kwargs)

        try:
            return self.do_open(http_class_wrapper, req)
        except urllib2.URLError, e:
            if type(e.reason) == ssl.SSLError and e.reason.args[0] == 1:
                raise InvalidCertificateException(req.host, '',
                                                  e.reason.args[1])
            raise

    https_request = urllib2.HTTPSHandler.do_request_

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print "usage: python %s CA_CERT URL" % sys.argv[0]
        exit(2)

    handler = VerifiedHTTPSHandler(ca_certs = sys.argv[1])
    opener = urllib2.build_opener(handler)
    print opener.open(sys.argv[2]).read()
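
The hostname check in the script above can be tried in isolation. Below is a rough Python 3 port of that logic only, operating on the dictionary format returned by `getpeercert()`. The function names are illustrative, and the wildcard handling mirrors the script (a `*` matches within one label), which is looser than what RFC 6125 allows.

```python
import re


def valid_hosts_for_cert(cert):
    """Return the DNS names in subjectAltName, else the subject commonName(s)."""
    if 'subjectAltName' in cert:
        return [value for kind, value in cert['subjectAltName']
                if kind.lower() == 'dns']
    return [value
            for rdn in cert.get('subject', ())
            for key, value in rdn
            if key.lower() == 'commonname']


def hostname_matches(cert, hostname):
    """Return True if hostname matches one of the names in the certificate."""
    for pattern in valid_hosts_for_cert(cert):
        # Escape the name, then let each '*' match anything except a dot.
        regex = re.escape(pattern).replace(r'\*', '[^.]*')
        if re.fullmatch(regex, hostname, re.IGNORECASE):
            return True
    return False
```

Keeping the matching separate from the connection code makes it possible to exercise it with hand-written certificate dictionaries before pointing it at real servers.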
A B
Eli Courtwright
  • @tonfa: Good catch; I ended up adding hostname checking as well, and I've edited my answer to include the code I used. – Eli Courtwright Oct 07 '10 at 17:01
  • I can't reach the original link (i.e. 'this page'). Has it moved? – Matt Ball Sep 09 '11 at 17:37
  • @Matt: I guess so, but FWIW the original link isn't necessary, since my test program is a complete, self-contained, working example. I linked to the page which helped me write that code since it seemed like the decent thing to provide attribution. But since it doesn't exist anymore, I'll edit my post to remove the link, thanks for pointing this out. – Eli Courtwright Sep 12 '11 at 14:06
  • This doesn't work with additional handlers like proxy handlers because of the manual socket connection in `CertValidatingHTTPSConnection.connect`. See [this pull request](https://github.com/wbond/sublime_package_control/pull/116) for details (and a fix). – schlamar Jun 26 '12 at 06:06
  • 2
    [Here](https://gist.github.com/2993700) is a cleaned up and working solution with `backports.ssl_match_hostname`. – schlamar Jun 26 '12 at 06:21
8

M2Crypto can do the validation. You can also use M2Crypto with Twisted if you like. The Chandler desktop client uses Twisted for networking and M2Crypto for SSL, including certificate validation.

Based on Glyph's comment, it seems that M2Crypto does better certificate verification by default than what you can currently do with pyOpenSSL, because M2Crypto also checks the subjectAltName field.

I've also blogged on how to get the certificates Mozilla Firefox ships with in Python and usable with Python SSL solutions.

Sibren
Heikki Toivonen
4

The following code gives you all the standard SSL validation checks (e.g. date validity, CA certificate chain ...) and replaces the built-in hostname check with a pluggable verification step, where you can verify the hostname yourself or perform other additional certificate checks.

from httplib import HTTPSConnection
import ssl


def create_custom_HTTPSConnection(host):

    def verify_cert(cert, host):
        # Write your code here
        # You can certainly base yourself on ssl.match_hostname
        # Raise ssl.CertificateError if verification fails
        print 'Host:', host
        print 'Peer cert:', cert

    class CustomHTTPSConnection(HTTPSConnection, object):
        def connect(self):
            super(CustomHTTPSConnection, self).connect()
            cert = self.sock.getpeercert()
            verify_cert(cert, host)

    context = ssl.create_default_context()
    context.check_hostname = False
    return CustomHTTPSConnection(host=host, context=context)


if __name__ == '__main__':
    # try expired.badssl.com or self-signed.badssl.com !
    conn = create_custom_HTTPSConnection('badssl.com')
    conn.request('GET', '/')
    conn.getresponse().read()
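
As one possible body for the pluggable step, here is a Python 3 sketch that rejects certificates close to expiry. It assumes the `cert` argument is the dictionary returned by `getpeercert()`. The helper names and the 30-day threshold are illustrative choices, not part of the answer above.

```python
import ssl
import time

SECONDS_PER_DAY = 86400


def seconds_until_expiry(cert, now=None):
    """Return the number of seconds left before the certificate's notAfter date."""
    if now is None:
        now = time.time()
    # cert_time_to_seconds parses the "Jan  5 09:34:43 2018 GMT" style string.
    return ssl.cert_time_to_seconds(cert['notAfter']) - now


def verify_not_expiring_soon(cert, min_days=30, now=None):
    """Raise ssl.CertificateError if fewer than min_days of validity remain."""
    remaining = seconds_until_expiry(cert, now)
    if remaining < min_days * SECONDS_PER_DAY:
        raise ssl.CertificateError(
            'certificate expires in %.1f days' % (remaining / SECONDS_PER_DAY))
```

A check like this runs after the standard chain and date validation has already passed, so it only adds an early warning on top of it.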
Carl D'Halluin
4

Jython DOES carry out certificate verification by default, so standard library modules such as httplib.HTTPSConnection will verify certificates under Jython and raise exceptions on failures such as mismatched identities or expired certificates.

In fact, you have to do some extra work to get Jython to behave like CPython, i.e. to NOT verify certificates.

I have written a blog post on how to disable certificate checking on jython, because it can be useful in testing phases, etc.

Installing an all-trusting security provider on java and jython.
http://jython.xhaus.com/installing-an-all-trusting-security-provider-on-java-and-jython/

Alan Kennedy
-1

pyOpenSSL is an interface to the OpenSSL library. It should provide everything you need.

DisplacedAussie
-1

I was having the same problem but wanted to minimize 3rd party dependencies (because this one-off script was to be executed by many users). My solution was to wrap a curl call and make sure that the exit code was 0. Worked like a charm.
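
A minimal sketch of that approach, assuming a `curl` binary is available on the PATH. The function names are hypothetical, and `ca_file` is an assumed path to the internal CA certificate passed through curl's `--cacert` option.

```python
import os
import subprocess


def build_curl_command(url, ca_file=None):
    """Build a curl command line that discards the body and stays quiet."""
    cmd = ['curl', '--silent', '--show-error', '--output', os.devnull]
    if ca_file:
        # Verify the server against this CA instead of curl's default bundle.
        cmd += ['--cacert', ca_file]
    cmd.append(url)
    return cmd


def curl_accepts(url, ca_file=None):
    """Return True if curl exits with status 0 for the given URL."""
    result = subprocess.run(build_curl_command(url, ca_file))
    return result.returncode == 0
```

Note that a zero exit status here only means curl completed the transfer; any non-zero status (certificate problems, but also DNS or connection failures) is treated as a failure.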

Ztyx
  • I'd say https://stackoverflow.com/a/1921551/1228491 using pycurl is a much better solution then. – Marian Oct 03 '18 at 21:20