The following page does not open using urllib in Python:

https://efactssc-public.flcourts.org/casedocuments/2019/1464/2019-1464_brief_137452_supp20initial20brief2dmerits.pdf

As shown below, I've tried it in Python 2 and Python 3, and have tried using the SSL monkey-patch fix described here. Any other suggestions?

Python 2 Code and Error

import urllib
urllib.urlopen('https://efactssc-public.flcourts.org/casedocuments/2019/1464/2019-1464_brief_137452_supp20initial20brief2dmerits.pdf')

Here is the Python 2 error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\urllib.py", line 87, in urlopen
    return opener.open(url)
  File "C:\Python27\lib\urllib.py", line 215, in open
    return getattr(self, name)(url)
  File "C:\Python27\lib\urllib.py", line 445, in open_https
    h.endheaders(data)
  File "C:\Python27\lib\httplib.py", line 1065, in endheaders
    self._send_output(message_body)
  File "C:\Python27\lib\httplib.py", line 892, in _send_output
    self.send(msg)
  File "C:\Python27\lib\httplib.py", line 854, in send
    self.connect()
  File "C:\Python27\lib\httplib.py", line 1290, in connect
    server_hostname=server_hostname)
  File "C:\Python27\lib\ssl.py", line 369, in wrap_socket
    _context=self)
  File "C:\Python27\lib\ssl.py", line 599, in __init__
    self.do_handshake()
  File "C:\Python27\lib\ssl.py", line 828, in do_handshake
    self._sslobj.do_handshake()
IOError: [Errno socket error] EOF occurred in violation of protocol (_ssl.c:727)

Python 3 Code and Error

I got a similar error running the following code in Python 3:

import urllib.request
urllib.request.urlopen('https://efactssc-public.flcourts.org/casedocuments/2019/1464/2019-1464_brief_137452_supp20initial20brief2dmerits.pdf')

Error:

Traceback (most recent call last):
  File "c:\Python38\lib\urllib\request.py", line 1319, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "c:\Python38\lib\http\client.py", line 1230, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "c:\Python38\lib\http\client.py", line 1276, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "c:\Python38\lib\http\client.py", line 1225, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "c:\Python38\lib\http\client.py", line 1004, in _send_output
    self.send(msg)
  File "c:\Python38\lib\http\client.py", line 944, in send
    self.connect()
  File "c:\Python38\lib\http\client.py", line 1399, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "c:\Python38\lib\ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "c:\Python38\lib\ssl.py", line 1040, in _create
    self.do_handshake()
  File "c:\Python38\lib\ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
OSError: [Errno 0] Error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python38\lib\urllib\request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "c:\Python38\lib\urllib\request.py", line 525, in open
    response = self._open(req, data)
  File "c:\Python38\lib\urllib\request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "c:\Python38\lib\urllib\request.py", line 502, in _call_chain
    result = func(*args)
  File "c:\Python38\lib\urllib\request.py", line 1362, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "c:\Python38\lib\urllib\request.py", line 1322, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 0] Error>

SSL Fix Does Not Work

A similar question suggested monkey-patching SSL with the code below, but that doesn't work in this case. The code below raises the same error (Python 2):

import ssl
ssl._create_default_https_context = ssl._create_unverified_context
import urllib
urllib.urlopen('https://efactssc-public.flcourts.org/casedocuments/2019/1464/2019-1464_brief_137452_supp20initial20brief2dmerits.pdf')
speedplane
  • Same problem as in the other question: hopelessly broken and outdated server with no support for modern ciphers - see [SSLLabs report](https://www.ssllabs.com/ssltest/analyze.html?d=efactssc-public.flcourts.org). With older versions of Python/OpenSSL you might work around the problem with the proposed answer but with newer versions this will not work anymore since the necessary cipher is not compiled in. – Steffen Ullrich Jun 02 '20 at 20:29
  • @SteffenUllrich The servers might be old and janky, but given the fact that it works properly in curl and every browser, I would also expect it to work with Python. Is there any way to submit a bug report on this? Would it go to Python or one of the library maintainers? Also, I don't think this issue should be marked as a duplicate. The error message I receive is different from the other ones that you cite. Maybe change your comment to an answer. – speedplane Jun 08 '20 at 23:37
  • While the error message might be slightly different in syntax, it essentially boils down to handshake problems. And looking at the site, the reason is the same - no usable cipher, since the server only implements insecure and weak ciphers. And while browsers have a highly tolerant behavior in many regards (including broken TLS setups), I think this behavior is needlessly encouraging admins to not fix their servers. Don't expect such overly tolerant behavior from non-browsers - not only Python will fail with such a broken site. – Steffen Ullrich Jun 09 '20 at 04:58
  • *"given the fact that it works properly in curl..."* - I don't know what version you are using, but it does not work for me with `curl 7.52.1 (x86_64-pc-linux-gnu) libcurl/7.52.1 OpenSSL/1.0.2l zlib/1.2.8`: **"curl: (35) Unknown SSL protocol error in connection to efactssc-public.flcourts.org:443"**. And this is already a pretty old version of curl and OpenSSL. – Steffen Ullrich Jun 09 '20 at 05:00
  • @SteffenUllrich It works for me on Windows/Cygwin: `curl 7.55.1 (Windows) libcurl/7.55.1 WinSSL`. Maybe Windows supports these older standards? Curious... if you're running on Linux, can you open the PDF in a web browser? – speedplane Jun 09 '20 at 23:58
  • Also... if the admin is a government (as is the case in the example PDF), there isn't much you can do to encourage/discourage behavior. The fact that Python cannot access this government data is pretty sad. – speedplane Jun 10 '20 at 00:02
  • *"if the admin is a government (as is the case in the example PDF), there isn't much you can do to encourage/discourage behavior."* - not sure about this. There are often regulations of how accessible information must be and regulations about the security of the infrastructure and this broken setup violates both. – Steffen Ullrich Jun 10 '20 at 05:06
  • *"WinSSL. Maybe Windows supports these older standards ..."* - might be. OpenSSL has disabled 3DES (which is the strongest of all the ciphers supported by the server) for some years already since it is too weak. *"...can you open the PDF in a web browser?..."* - Firefox and Chrome come with their own TLS stacks, which still support this cipher. But Chrome clearly marks the connection as "Not secure" due to the weak cipher, and Firefox reports this problem too. – Steffen Ullrich Jun 10 '20 at 05:11
  • *> "if the admin is a government ... there isn't much you can do to encourage/discourage behavior." - not sure about this.* Ha... and how exactly would one go about asking Florida to update their records website? Should you hire a lobbyist to get new legislation passed? Or hire a lawyer and sue the government? That'll only take a decade and cost millions, but yes, I suppose it's possible. – speedplane Jun 24 '20 at 02:25
  • You could instrument your browser with `pyppeteer`. If you are lucky, this takes only a couple of lines of code. – Wolfgang Kuehn Nov 12 '20 at 15:51
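
Following up on the cipher discussion in the comments, here is a minimal Python 3 sketch of the workaround that might apply on older OpenSSL builds. It restricts an `ssl.SSLContext` to a single 3DES suite and passes that context to `urllib.request.urlopen`. The helper names `make_legacy_context` and `fetch`, the 30-second timeout, and the choice of `DES-CBC3-SHA` (OpenSSL's name for the RSA 3DES-CBC suite) are assumptions made for illustration; the comments only say that 3DES is the strongest cipher the server offers, not which exact suite it accepts. On builds where the requested cipher is not available, `set_ciphers` raises `ssl.SSLError` and the helper returns `None`.

```python
import ssl
import urllib.request


def make_legacy_context(cipher_string="DES-CBC3-SHA"):
    """Return an SSLContext limited to cipher_string.

    Returns None when the local OpenSSL build cannot select any cipher
    from cipher_string (for example, because 3DES is not compiled in).
    The default cipher name is an assumption, not taken from the server.
    """
    # Certificate and hostname verification stay enabled.
    context = ssl.create_default_context()
    try:
        context.set_ciphers(cipher_string)
    except ssl.SSLError:
        # OpenSSL reports that no cipher could be selected.
        return None
    return context


def fetch(url, context):
    """Download url with the given SSLContext and return the body as bytes."""
    with urllib.request.urlopen(url, context=context, timeout=30) as response:
        return response.read()
```

If `make_legacy_context()` returns a context, `fetch(url, context)` can be tried against the PDF URL from the question. The handshake may still fail if the local build also refuses the protocol version the server speaks, and a `None` result matches the comment that newer builds no longer include the necessary cipher.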
