How to use Python urllib.request for Web Scraping 2018

Question

I wrote a simple script from a video tutorial:

import bs4 as bs
import urllib.request

source = urllib.request.urlopen('https://pythonprogramming.net/parsememcparseface/').read()

soup = bs.BeautifulSoup(source, 'lxml')

print(source)

And it returns this error when I run the program:

Traceback (most recent call last):
  File "/Users/UntouchedDruid4/Projects/Web_Scraper/app.py", line 4, in <module>
    source = urllib.request.urlopen('https://pythonprogramming.net/parsememcparseface/').read()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 526, in open
    response = self._open(req, data)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 544, in _open
    '_open', req)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
    result = func(*args)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 1320, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)>

And I have no idea what that means. Please help.

I have never encountered this exact error before but [this](https://stackoverflow.com/questions/27835619/urllib-and-ssl-certificate-verify-failed-error) seems relevant — Xantium, May 04 '18 at 17:16

score 0 · Answer 1 · answered May 04 '18 at 17:08

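Use urllib2 or requests to fetch the page, then re.search or BeautifulSoup to scrape it, whichever you prefer. Note that urllib2 exists only in Python 2 (Python 3 merged it into urllib.request), so the example below is Python 2 code: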

import urllib2
from bs4 import BeautifulSoup
import re

# Download the page and keep the raw HTML
read = urllib2.urlopen('https://pythonprogramming.net/parsememcparseface/').read()

Using re.search:

# Non-greedy match, so only the first <title> element is captured
f = re.search(r'<title>(.*?)</title>', read)
title = f.group(1)
print "Title of the site is: " + title
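In Python 3, which the question uses, urlopen(...).read() returns bytes, and a str pattern cannot be matched against bytes. The sketch below shows one way to handle that by decoding first. It assumes the page is UTF-8 encoded; extract_title is a hypothetical helper name, and the sample bytes stand in for a real download.

```python
import re


def extract_title(raw, encoding="utf-8"):
    """Return the text of the first <title> element in raw HTML bytes, or None."""
    # Decode first: a str pattern cannot be matched against bytes in Python 3
    html = raw.decode(encoding, errors="replace")
    # Non-greedy, case-insensitive, and DOTALL so a title spanning lines still matches
    match = re.search(r"<title\b[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


# Stand-in bytes for what urllib.request.urlopen(url).read() would return
sample = b"<html><head><title> Example Page </title></head><body></body></html>"
print("Title of the site is:", extract_title(sample))
```

Returning None when nothing matches avoids the AttributeError that calling .group(1) on a failed search would raise.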

Using BeautifulSoup:

soup = BeautifulSoup(read, 'html.parser')
print soup.title  # prints the whole <title> element, tags included

This is only an example for the title.
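The certificate error in the question is separate from the choice of library. On python.org builds for macOS, Python 3.6 ships its own OpenSSL and does not read the system certificate store, so verification can fail until a CA bundle is installed (the installer includes an "Install Certificates.command" script for this). Here is a minimal Python 3 sketch, assuming that is the cause and that the third-party certifi package may be available; build_ssl_context and fetch are hypothetical helper names.

```python
import ssl
import urllib.request


def build_ssl_context():
    """Return a verifying SSL context, preferring certifi's CA bundle if installed."""
    try:
        import certifi  # third-party; assumed installed with `pip install certifi`
    except ImportError:
        # Fall back to whatever CA store the interpreter can locate on its own
        return ssl.create_default_context()
    return ssl.create_default_context(cafile=certifi.where())


def fetch(url, timeout=10):
    """Download url and return the body as bytes, with certificate checks enabled."""
    context = build_ssl_context()
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        return response.read()


# Usage (needs network access):
# source = fetch('https://pythonprogramming.net/parsememcparseface/')
```

Verification stays on in this sketch. Passing an unverified context would also make the error go away, but it removes the protection HTTPS is meant to provide.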
