
I'm trying to open a web page's source code using Python, but I keep getting an SSL CERTIFICATE_VERIFY_FAILED error. I can post the full error, but it's quite long and I figured this would suffice. I've been trying to figure this out for hours, but the solutions I came across either didn't work or just didn't make sense to me.

I've imported urllib.request and urllib.parse. My code for fetching the page source is below:

import urllib.request

url = 'https://www.ohio.edu/engineering/about/people/'
# Request object for the web page
source_request = urllib.request.Request(url)
# Opens the web page (this is the call that raises the error below)
source_open = urllib.request.urlopen(source_request)
# Reads the whole response body as bytes
source_code = source_open.read()
# Decodes the bytes into a string (UTF-8 by default)
source_string = source_code.decode()
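Since read() returns raw bytes and a bare decode() assumes UTF-8, a slightly more careful variant picks the charset from the response's Content-Type header. This is a minimal sketch, assuming the server declares a charset and falling back to UTF-8 otherwise; fetch_text and charset_from_content_type are hypothetical helper names, and certificate verification behaves exactly as in the original code, so it does not by itself fix the error.

```python
import urllib.request
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    # Parse a Content-Type value such as "text/html; charset=ISO-8859-1"
    # and return its charset (lowercased), or the assumed default if none is declared.
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or default


def fetch_text(url):
    # Hypothetical helper: open the page, read the body as bytes,
    # then decode it with the charset the server declared.
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        content_type = response.headers.get("Content-Type", "")
    return raw.decode(charset_from_content_type(content_type))
```

The header parsing is delegated to the standard email.message.Message class, the same base class that urllib uses for response headers.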

I'm using Python 3 as well, if that's important. Any help would be appreciated, thanks!

EDIT:

Full Error

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1400, in connect
    server_hostname=server_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 401, in wrap_socket
    _context=self, _session=session)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 808, in __init__
    self.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 1061, in do_handshake
    self._sslobj.do_handshake()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/ssl.py", line 683, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "hw3.py", line 53, in <module>
    main()
  File "hw3.py", line 37, in main
    source_open = urllib.request.urlopen('https://www.ohio.edu/engineering/about/people/')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1320, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:749)>
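The second traceback shows urlopen catching the original ssl.SSLError and re-raising it wrapped in a urllib.error.URLError; the underlying exception is kept in the error's reason attribute. A minimal sketch of telling a certificate-verification failure apart from other network errors is below; is_cert_verify_failure and try_open are hypothetical names, and matching on the CERTIFICATE_VERIFY_FAILED text is an assumption based on the message shown above.

```python
import ssl
import urllib.error
import urllib.request


def is_cert_verify_failure(err):
    # URLError stores the underlying exception in .reason; in the traceback above
    # it is an ssl.SSLError whose message contains CERTIFICATE_VERIFY_FAILED.
    reason = getattr(err, "reason", None)
    return isinstance(reason, ssl.SSLError) and "CERTIFICATE_VERIFY_FAILED" in str(reason)


def try_open(url):
    # Returns the response bytes, or None if the server certificate could not be
    # verified; any other URLError (including HTTPError) is re-raised unchanged.
    try:
        with urllib.request.urlopen(url) as response:
            return response.read()
    except urllib.error.URLError as err:
        if is_cert_verify_failure(err):
            print("Certificate verification failed for", url)
            return None
        raise
```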

So I'm fairly certain the problem has to do with the site being secure (HTTPS) rather than unsecure. Is there any way to bypass this?
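What fails here is not HTTPS as such but the check that the server's certificate chains to a trusted CA. The difference between the default context and the one produced by the private helper ssl._create_unverified_context() can be inspected directly. A small sketch using only the standard ssl module; the claim that urlopen uses an equivalent of the default context when none is passed reflects Python 3.4.3+ behaviour and is stated here as background, not as something the thread confirms.

```python
import ssl

# Roughly what urlopen uses for HTTPS when no context is passed:
# certificates must chain to a trusted CA and the hostname must match.
verified = ssl.create_default_context()

# The private helper behind the workaround: both checks are switched off.
unverified = ssl._create_unverified_context()

for name, ctx in (("default", verified), ("unverified", unverified)):
    print(name, "verify_mode:", ctx.verify_mode, "check_hostname:", ctx.check_hostname)
```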

EDIT:

I found a solution. The new code is below:

import ssl
import urllib.request

# WARNING: this context disables certificate and hostname verification
context = ssl._create_unverified_context()
source_open = urllib.request.urlopen('https://www.ohio.edu/engineering/about/people/', context=context)
source_code = source_open.read()
source_string = source_code.decode()
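The unverified context makes the request succeed by skipping authentication of the server altogether. A sketch of the alternative raised in the comments keeps verification on and points Python at a CA bundle it can trust. It assumes the third-party certifi package may be installed (falling back to OpenSSL's default locations if it is not), and make_verified_context and fetch_verified are hypothetical names. The python.org macOS installers also ship an Install Certificates.command script intended for this situation.

```python
import ssl
import urllib.request

try:
    # certifi ships Mozilla's CA bundle; treated as optional here
    import certifi
    CA_BUNDLE = certifi.where()
except ImportError:
    CA_BUNDLE = None  # fall back to OpenSSL's default CA locations


def make_verified_context(cafile=CA_BUNDLE):
    # create_default_context keeps CERT_REQUIRED and hostname checking enabled;
    # cafile (if given) supplies the trusted root certificates from that bundle.
    return ssl.create_default_context(cafile=cafile)


def fetch_verified(url):
    # Hypothetical helper: same request as the workaround, but the server is authenticated.
    with urllib.request.urlopen(url, context=make_verified_context()) as response:
        return response.read().decode()
```

Usage would mirror the workaround, e.g. source_string = fetch_verified('https://www.ohio.edu/engineering/about/people/'); if the CA is still missing, this raises the same URLError instead of silently trusting the server.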
ShadowRanger
  • Please ALWAYS include the full error. – Max Feb 15 '17 at 01:09
  • Your Python is good. The error message implies that it received an SSL certificate, but was unable to verify it. Either the certificate is indeed bad (it's not) or your system doesn't include a root certificate authority which validates the chain provided by the certificate (that's my guess). Ohio.edu uses the USERTRUST CA, which isn't installed on Mac by default. (Browsers use their own CAs, different from OpenSSL and Python.) I don't have instructions for loading a new CA on a Mac, but that's my hint for you. – pbuck Feb 15 '17 at 02:06
  • Basically, your SSL library doesn't have access to the certificate that signed the certificate of the site you are trying to access. You can try disabling verification; check the accepted answer at http://stackoverflow.com/questions/27835619/ssl-certificate-verify-failed-error for more info. – Doon Feb 15 '17 at 02:07
  • I figured it out guys, thanks for the help – KalBaratheon Feb 15 '17 at 02:59
  • @KalBaratheon: Note: Your solution means that you're not actually authenticating the website as valid in any way, meaning anyone who can man-in-the-middle your web connection can serve you bogus data and you won't know. For a toy, you probably don't care, but an unverified SSL context is removing the whole purpose of SSL (if you can't validate the server, then the encryption doesn't really matter, since you could be exchanging secret data with the malicious actor directly). You might read [this question](http://stackoverflow.com/q/24675167/364696) on properly configuring CAs on OSX. – ShadowRanger Feb 15 '17 at 04:10
  • @ShadowRanger Thanks, I'm just pulling info from the site, and as of right now it seems to be working fine. But I'll definitely keep that advice in mind, and look more into the proper way to handle this when I have time. Thanks again. – KalBaratheon Feb 15 '17 at 16:05
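The comments from pbuck and Doon point at the CA store that Python's OpenSSL consults, which is separate from the one browsers use. A small diagnostic sketch using the standard ssl.get_default_verify_paths() prints where this OpenSSL build looks for trusted root certificates and whether those paths exist. Reading a missing cafile as the cause of the failure above is an inference, not something the thread confirms; describe_verify_paths is a hypothetical name.

```python
import os
import ssl


def describe_verify_paths():
    # Where this Python's OpenSSL looks for trusted root certificates by default.
    paths = ssl.get_default_verify_paths()
    report = {}
    for label in ("cafile", "capath", "openssl_cafile", "openssl_capath"):
        value = getattr(paths, label)
        # cafile/capath are None when the resolved path does not exist on disk
        report[label] = (value, bool(value) and os.path.exists(value))
    return report


for label, (path, exists) in describe_verify_paths().items():
    print(f"{label}: {path} (exists: {exists})")
```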

0 Answers