
I'm using the newest Kubuntu with Python 2.7.6. I'm trying to download an HTTPS page with the code below:

import urllib2

hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'pl-PL,pl;q=0.8',
       'Connection': 'keep-alive'}

# main_page_url is the page's URL, set elsewhere in the script
req = urllib2.Request(main_page_url, headers=hdr)

try:
    page = urllib2.urlopen(req)
except urllib2.HTTPError, e:
    print e.fp.read()

content = page.read()
print content

However, I get the following error:

Traceback (most recent call last):
  File "test.py", line 33, in <module>
    page = urllib2.urlopen(req)
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1222, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 1] _ssl.c:510: error:14077438:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert internal error>

How can I solve this?

SOLVED!

I used https://www.ssllabs.com, suggested by @SteffenUllrich, to check the server. It turned out that the server uses TLS 1.2, so I updated Python to 2.7.10 and modified my code to:

import ssl
import urllib2

# Note: PROTOCOL_TLSv1 pins the connection to TLS 1.0; as the comment on
# the accepted answer points out, the real fix was likely the SNI support
# added in Python 2.7.9.
context = ssl.SSLContext(ssl.PROTOCOL_TLSv1)

hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'pl-PL,pl;q=0.8',
       'Connection': 'keep-alive'}

req = urllib2.Request(main_page_url, headers=hdr)

try:
    page = urllib2.urlopen(req, context=context)
except urllib2.HTTPError, e:
    print e.fp.read()

content = page.read()
print content

Now it downloads the page.
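For reference, on Python 2.7.9+ there is no need to pin TLS 1.0: ssl.create_default_context() keeps certificate verification, negotiates the best TLS version both sides support, and sends SNI. A minimal sketch, assuming Python 2.7.9+ and using https://www.ssllabs.com only as an example URL:

import ssl
import urllib2

# create_default_context() (Python 2.7.9+) verifies certificates,
# negotiates the highest shared TLS version, and sends SNI.
context = ssl.create_default_context()
page = urllib2.urlopen('https://www.ssllabs.com', context=context)
print page.read()[:200]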

  • Your script works for me with Python 2.7.10 and `https://facebook.com`. What URL are you trying? Does it happen with one URL or with multiple HTTPS ones? – MartyIX Nov 28 '15 at 15:01
  • @MartinVseticka: It works with Facebook for me too, so it's probably an issue with the page. What now? – yak Nov 28 '15 at 15:20
  • No, it is not. But it's harder without being able to reproduce the error, so the chances that somebody will answer your question are lower. Anyway, just check whether the same happens with curl (or any other tool). My guess is that the issue is on the OpenSSL side rather than on the Python side. – MartyIX Nov 28 '15 at 15:25

3 Answers


I'm using the newest Kubuntu with Python 2.7.6

The latest Kubuntu (15.10) uses 2.7.10, as far as I know. But assuming you use 2.7.6, which ships with 14.04 LTS:

It works with Facebook for me too, so it's probably an issue with the page. What now?

Then it depends on the site. A typical problem with this version of Python is missing support for Server Name Indication (SNI), which was only added in Python 2.7.9. Since lots of sites require SNI today (for example, everything using Cloudflare Free SSL), I guess this is the problem.

But there are also other possibilities, like multiple trust paths, which is only fixed with OpenSSL 1.0.2, or simply missing intermediate certificates, etc. More information and possible workarounds can only be given if you provide the URL, or if you analyze the situation yourself based on this information and the analysis from SSLLabs. A quick way to check your client's capabilities is sketched below.
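As a first step, a small diagnostic sketch (nothing about your specific site assumed) that prints the Python and OpenSSL versions and whether the ssl module advertises SNI support:

import ssl
import sys

print sys.version                      # urllib2 gained SNI support in 2.7.9
print ssl.OPENSSL_VERSION              # multiple trust paths need OpenSSL 1.0.2
print getattr(ssl, 'HAS_SNI', False)   # False if the ssl module lacks SNI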

Steffen Ullrich
  • Yes, it's Kubuntu 14.04, and my OpenSSL is OpenSSL 1.0.1f 6 Jan 2014 – yak Nov 28 '15 at 15:33
  • Thank you so much. I used the SSLLabs page you posted and checked the version of TLS used by the page. It turned out it's TLS 1.2. I modified the code; I will edit my first post and add the modified code and the explanation. Thank you! – yak Nov 28 '15 at 16:02
  • @yak: since TLS 1.2 is also supported by Python 2.7.6 in (K)ubuntu 14.04, my guess is that the upgrade to Python 2.7.10 simply fixed the SNI issue, and that's why it worked. What counts is that it works, though. – Steffen Ullrich Nov 28 '15 at 17:02

With an old version of Python (2.7.3), I used

requests.get(download_url, headers=headers, timeout=10, stream=True)

and got the following warning and exception:

You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
SSLError(SSLError(1, '_ssl.c:504: error:14077438:SSL routines:SSL23_GET_SERVER_HELLO:tlsv1 alert internal error')


I just followed the advice there (the "Certificate verification in Python 2" section of the urllib3 docs linked above) and ran

pip install urllib3[secure]

and the problem was solved.
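For context, a minimal sketch of what the extras enable (an assumption based on the linked urllib3 docs, with https://www.ssllabs.com only as an example URL): urllib3[secure] pulls in pyOpenSSL, which urllib3 can inject to get SNI and modern TLS on old Python 2.x builds:

import urllib3
import urllib3.contrib.pyopenssl

# Monkey-patch urllib3 to use pyOpenSSL, which adds SNI and modern TLS
# support on old Python 2.x builds.
urllib3.contrib.pyopenssl.inject_into_urllib3()

http = urllib3.PoolManager()
r = http.request('GET', 'https://www.ssllabs.com')
print r.status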

rongdong.bai

The above answer is only partially correct; you can add a fix to solve this issue:

Code:

import os
import ssl

def allow_unverified_content():
    """
    A 'fix' for Python SSL CERTIFICATE_VERIFY_FAILED (mainly Python 2.7):
    disable certificate verification globally, unless PYTHONHTTPSVERIFY
    is set in the environment.
    """
    if (not os.environ.get('PYTHONHTTPSVERIFY', '') and
            getattr(ssl, '_create_unverified_context', None)):
        ssl._create_default_https_context = ssl._create_unverified_context

Call it with no arguments:

allow_unverified_content()
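For illustration, a hypothetical usage sketch (not from the original answer; note that this disables certificate verification process-wide, so only use it for content you trust):

import urllib2

allow_unverified_content()
# After the call, HTTPS requests skip certificate verification
page = urllib2.urlopen('https://www.ssllabs.com')
print page.read()[:200]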
Mike Q