
I've come across the following error about html5lib when trying to read an HTML table into a pandas DataFrame.

Here is the code:

!pip install html5lib
!pip install lxml
!pip install beautifulsoup4

import html5lib
import lxml
from bs4 import BeautifulSoup
import pandas as pd

table_list = pd.read_html("http://www.psmsl.org/data/obtaining/")

This is the error:

ImportError                               Traceback (most recent call last)
<ipython-input-68-e24654a0a301> in <module>()
----> 1 table_list = pd.read_html("http://www.psmsl.org/data/obtaining/")

/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding, decimal, converters, na_values, keep_default_na)
    913                   thousands=thousands, attrs=attrs, encoding=encoding,
    914                   decimal=decimal, converters=converters, na_values=na_values,
--> 915                   keep_default_na=keep_default_na)

/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in _parse(flavor, io, match, attrs, encoding, **kwargs)
    737     retained = None
    738     for flav in flavor:
--> 739         parser = _parser_dispatch(flav)
    740         p = parser(io, compiled_match, attrs, encoding)
    741 

/home/sage/sage-8.0/local/lib/python2.7/site-packages/pandas/io/html.pyc in _parser_dispatch(flavor)
    680     if flavor in ('bs4', 'html5lib'):
    681         if not _HAS_HTML5LIB:
--> 682             raise ImportError("html5lib not found, please install it")
    683         if not _HAS_BS4:
    684             raise ImportError(

ImportError: html5lib not found, please install it

Any help would be much appreciated. Thanks

J. Serra

3 Answers


The error message says that html5lib is not installed. Run:

pip install html5lib

in your terminal.


If you are installing from a Jupyter notebook (as you did with the ! prefix), restart the kernel afterwards so that the newly installed packages are picked up.
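
If restarting the kernel does not help, one possible cause (an assumption on my part, though the traceback does show a Sage Python 2.7 install) is that `!pip` installs into a different Python than the one the notebook kernel is running. A minimal sketch that targets the kernel's own interpreter instead; the helper names `pip_install_command`, `is_importable` and `ensure_installed` are made up for illustration:

```python
import importlib
import subprocess
import sys


def pip_install_command(package):
    # Build a pip command bound to the interpreter that is running this code,
    # so the package lands in the same environment the kernel imports from.
    return [sys.executable, "-m", "pip", "install", package]


def is_importable(module_name):
    # True if the running interpreter can import the module.
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def ensure_installed(package, module_name=None):
    # Install the package only if its module cannot be imported yet.
    # module_name defaults to the package name (true for html5lib and lxml,
    # but not for beautifulsoup4, whose module is bs4).
    if not is_importable(module_name or package):
        subprocess.check_call(pip_install_command(package))
```

Calling `ensure_installed("html5lib")` in a notebook cell and then restarting the kernel should make the import visible to pandas, provided the mismatch above really is the problem. Printing `sys.executable` shows which interpreter the kernel uses.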

TYZ

I had this exact error show up while trying to read a saved .htm file using Spyder IDE.

This code raised the html5lib error:

import pandas as pd
df = pd.read_html("F:\xxxx\xxxxx\xxxxx\aaaa.htm")

I knew I had html5lib installed and working correctly because I had other scripts that worked.

It turned out that the file path needed to be a raw string literal (an r in front of the opening quote), so that the backslashes in the Windows path are not interpreted as escape sequences.

This code works for me:

import pandas as pd
df = pd.read_html(r"F:\xxxx\xxxxx\xxxxx\aaaa.htm")
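
To illustrate why the r prefix matters, here is a small sketch with a hypothetical path (`C:\temp\new.htm` is not from my setup, it is just chosen because it contains `\t` and `\n`):

```python
# A regular string literal: "\t" becomes a tab and "\n" a newline,
# so the string no longer names the intended file.
plain = "C:\temp\new.htm"

# A raw string literal keeps every backslash as a literal character.
raw = r"C:\temp\new.htm"

# Doubling the backslashes in a regular literal gives the same result.
escaped = "C:\\temp\\new.htm"

print(repr(plain))
print(repr(raw))
print(raw == escaped)
```

Forward slashes also work for Windows paths in Python and avoid the issue altogether.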

I ran into this error when I gave the wrong path to the local file I was trying to open. So also be sure that you're pointing to the right place!
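
A quick way to rule this out is to check that the file exists before handing the path to pandas. This is only a sketch using the standard library; `checked_path` is a hypothetical helper name:

```python
from pathlib import Path


def checked_path(path):
    # Fail early with a clear message if the path does not point to a file,
    # instead of letting the HTML parser raise a less obvious error.
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError("No such file: {}".format(p))
    return str(p)
```

It could then be used as `pd.read_html(checked_path(r"F:\xxxx\xxxxx\xxxxx\aaaa.htm"))`.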

Yanofsky