Scraping Crunchbase.com using Python

Question

I'm trying to scrape data from crunchbase.com using the Python requests library.

Whenever I try to get the page source with the requests or urllib library, I get the SSL error below.

import requests
requests.get('https://www.crunchbase.com/#/home/index')

Traceback (most recent call last):
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\urllib3\connectionpool.py", line 599, in urlopen
    body=body, headers=headers)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\urllib3\connectionpool.py", line 345, in _make_request
    self._validate_conn(conn)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\urllib3\connectionpool.py", line 843, in _validate_conn
    conn.connect()
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\urllib3\connection.py", line 326, in connect
    ssl_context=context)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\urllib3\util\ssl_.py", line 325, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "C:\Program Files (x86)\Python36-32\lib\ssl.py", line 401, in wrap_socket
    _context=self, _session=session)
  File "C:\Program Files (x86)\Python36-32\lib\ssl.py", line 808, in __init__
    self.do_handshake()
  File "C:\Program Files (x86)\Python36-32\lib\ssl.py", line 1061, in do_handshake
    self._sslobj.do_handshake()
  File "C:\Program Files (x86)\Python36-32\lib\ssl.py", line 683, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:748)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    requests.get('https://www.crunchbase.com/#/home/index')
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\sessions.py", line 502, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\sessions.py", line 612, in send
    r = adapter.send(request, **kwargs)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\requests\adapters.py", line 440, in send
    timeout=timeout
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\urllib3\connectionpool.py", line 624, in urlopen
    except (BaseSSLError, CertificateError) as e:
NameError: name 'CertificateError' is not defined

I've also tried the options the requests library offers, such as verify=False and pointing verify at a cacert.pem file, in multiple ways.

I also tried scraping the same page with the selenium library, but it returns only metadata, which doesn't contain any useful information.

Selenium also reports an insecure connection while scraping.

I'm new to scraping with Python. The API only provides limited information, so any help would be appreciated.

score 0 · Answer 1 · answered Jul 21 '17 at 07:58

I think there's an issue with the page's SSL certificate: it appears to have expired, so attempts to verify it fail.

You may want to check out this link: SSL error with Python requests despite up-to-date dependencies
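
One way to check whether the certificate is really what fails, independently of requests, is to attempt the TLS handshake with the standard library and read the reason the verifier gives. The following is only a minimal sketch, assuming Python 3.7 or newer (where ssl.SSLCertVerificationError exists); diagnose_tls is a hypothetical helper name, not part of any library.

```python
import socket
import ssl


def diagnose_tls(host, port=443, timeout=10.0):
    """Return a short description of the TLS verification result for host.

    Hypothetical helper for illustration; assumes Python 3.7+.
    """
    # The default context verifies the certificate chain and the hostname.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
                return "verified, certificate valid until " + cert["notAfter"]
    except ssl.SSLCertVerificationError as exc:
        # verify_message carries the verifier's reason for rejecting the
        # certificate (for example an expired or self-signed certificate).
        return "verification failed: " + exc.verify_message
    except (ssl.SSLError, OSError) as exc:
        # Other TLS errors, DNS failures, refused connections, timeouts.
        return "connection problem: {}".format(exc)


if __name__ == "__main__":
    print(diagnose_tls("www.crunchbase.com"))
```

If this reports a verification failure, the message should indicate whether the cause is an expired certificate or something else, such as a missing CA bundle on the local machine.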

Khanh Nguyen
