I am trying to scrape a website, but BeautifulSoup raises an error. I am not sure what is causing it: bs4 and html5lib are both installed. Does anyone have an idea?

Python Code

from bs4 import BeautifulSoup 
import requests
url = 'http://www.transtats.bts.gov/Data_Elements.aspx?Data=1r'
response = requests.get(url)
html = response.content
soup = BeautifulSoup(html,"html5lib")
print (soup.prettify())

Python Error

runfile('C:/WebsiteGrab.py', wdir='somepath')
Traceback (most recent call last):

  File "<ipython-input-1-fc28ecb678ac>", line 1, in <module>
    runfile('C:/Users/bartogre/Desktop/WebsiteGrab.py', wdir='C:/Users/bartogre/Desktop')

  File "C:\Program Files (x86)\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
    execfile(filename, namespace)

  File "C:\Program Files (x86)\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Users/bartogre/Desktop/WebsiteGrab.py", line 12, in <module>
    soup = BeautifulSoup(html,"html5lib")

  File "C:\Program Files (x86)\Anaconda3\lib\site-packages\bs4\__init__.py", line 165, in __init__
    % ",".join(features))

FeatureNotFound: Couldn't find a tree builder with the features you requested: html5lib. Do you need to install a parser library?
polonius11

1 Answer

According to the 'FeatureNotFound' message, Beautiful Soup cannot find the html5lib parser. Try removing every folder (library) associated with html5lib under C:\Python(version)\Lib\site-packages\ (do not delete anything in the bs4 folder). This lets you test whether the problem appeared after installing html5lib.

Test without html5lib:

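from bs4 import BeautifulSoup
import urllib.request

url = "https://www.crummy.com/software/BeautifulSoup/bs4/doc/"

# Fetch the page using only the standard library
response = urllib.request.urlopen(url)
# Name the built-in parser explicitly so html5lib is not involved
soup = BeautifulSoup(response, "html.parser")
print(soup.prettify())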

If the code above works, then the problem lies with the html5lib installation. Try a different parser if the built-in one does not suit you.
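As a rough sketch of a less destructive check, you could ask the interpreter that runs your script whether it can import html5lib at all, and fall back to the built-in parser when it cannot. The helper names `available_parsers`, `pick_parser` and `make_soup` are made up for this example and are not part of bs4. I am also assuming that an importable html5lib module means bs4 can use it, which is normally true but not guaranteed.

```python
import importlib.util
import sys


def available_parsers(candidates=("html5lib", "lxml")):
    """Return the candidate parser libraries this interpreter can import."""
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]


def pick_parser(preferred="html5lib"):
    """Return `preferred` if it is importable, else the built-in 'html.parser'."""
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return "html.parser"


def make_soup(html, preferred="html5lib"):
    """Parse `html` with the preferred parser, falling back to html.parser."""
    # Imported lazily so the two checks above also run where bs4 is missing.
    from bs4 import BeautifulSoup
    return BeautifulSoup(html, pick_parser(preferred))


if __name__ == "__main__":
    # Packages must be installed for this interpreter, not for another Python.
    print("interpreter:", sys.executable)
    print("importable parsers:", available_parsers())
    print("parser chosen:", pick_parser())
```

If html5lib is missing from the list, it was probably installed for a different Python than the one Spyder is running, and the interpreter path that is printed tells you which one needs the package.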

An_Bk
  • Hello - I ran your code and I got the following error message: URLError: – polonius11 Oct 29 '16 at 17:47
  • This was the answer I found while searching the forum: http://stackoverflow.com/questions/27835619/ssl-certificate-verify-failed-error - do you agree? – polonius11 Oct 29 '16 at 17:49
  • That's because I had urllib imported. You can run this instead: – An_Bk Oct 29 '16 at 20:16
  • from bs4 import BeautifulSoup; import requests; url = "https://www.crummy.com/software/BeautifulSoup/bs4/doc/"; response = requests.get(url); html = response.content; soup = BeautifulSoup(html); print(soup.prettify()) – An_Bk Oct 29 '16 at 20:19