Current code:
import requests
import pandas as pd
url = 'https://docs.anaconda.com/anaconda/user-guide/getting-started/'
html = requests.get(url, verify=False).content
df_list = pd.read_html(html, flavor='bs4')
df = df_list[0]
I'm trying to extract HTML tables from a page using the pandas.read_html() function with the 'flavor' argument set to 'bs4' or 'html5lib'. I get the error: ImportError: html5lib not found, please install it.
C:\Users\...\Miniconda3\lib\site-packages\urllib3\connectionpool.py:1004: InsecureRequestWarning: Unverified HTTPS request is being made to host 'docs.anaconda.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning,
Traceback (most recent call last):
File "C:\Users\...\Documents\...\data_scrape.py", line 11, in <module>
df_list = pd.read_html(html, flavor='bs4')
File "C:\Users\...\Miniconda3\lib\site-packages\pandas\io\html.py", line 1100, in read_html
displayed_only=displayed_only,
File "C:\Users\...\Miniconda3\lib\site-packages\pandas\io\html.py", line 891, in _parse
parser = _parser_dispatch(flav)
File "C:\Users\...\Miniconda3\lib\site-packages\pandas\io\html.py", line 840, in _parser_dispatch
raise ImportError("html5lib not found, please install it")
ImportError: html5lib not found, please install it
But I do have bs4 and html5lib installed in the env, as the output of the conda list command shows:
conda list
# packages in environment at C:\Users\...\Miniconda3\envs\web_scrape:
#
# Name              Version       Build                   Channel
beautifulsoup4      4.9.1         py38h32f6830_0          conda-forge
bs4                 4.9.1         0                       conda-forge
ca-certificates     2020.6.20     hecda079_0              conda-forge
certifi             2020.6.20     py38h32f6830_0          conda-forge
html5lib            1.1           pyh9f0ad1d_0            conda-forge
intel-openmp        2020.1        216
libblas             3.8.0         16_mkl                  conda-forge
libcblas            3.8.0         16_mkl                  conda-forge
libiconv            1.15          vc14h29686d3_5 [vc14]   anaconda
liblapack           3.8.0         16_mkl                  conda-forge
libxml2             2.9.10        h464c3ec_1              anaconda
libxslt             1.1.34        he774522_0              anaconda
lxml                4.5.2         py38he3d0fc9_0          conda-forge
mkl                 2020.1        216
numpy               1.18.5        py38h72c728b_0          conda-forge
openssl             1.1.1g        he774522_0              conda-forge
pandas              1.0.5         py38he6e81aa_0          conda-forge
pip                 20.1.1        py_1                    conda-forge
python              3.8.3         cpython_h5fd99cc_0      conda-forge
python-dateutil     2.8.1         py_0                    conda-forge
python_abi          3.8           1_cp38                  conda-forge
pytz                2020.1        pyh9f0ad1d_0            conda-forge
setuptools          49.2.0        py38h32f6830_0          conda-forge
six                 1.15.0        pyh9f0ad1d_0            conda-forge
soupsieve           2.0.1         py38h32f6830_0          conda-forge
sqlite              3.32.3        he774522_1              conda-forge
vc                  14.1          h869be7e_1              conda-forge
vs2015_runtime      14.16.27012   h30e32a0_2              conda-forge
webencodings        0.5.1         py_1                    conda-forge
wheel               0.34.2        py_1                    conda-forge
wincertstore        0.2           py38_1003               conda-forge
I don't know why the packages aren't being recognized by the pandas function. There are multiple other posts that deal with the same problem, but none of the solutions have worked for me.
For example, posts such as: Python: ImportError: lxml not found, please install it
The answers there suggest using pip3 to install the packages. When I run that command for html5lib, I get the following output:
pip3 install html5lib
Requirement already satisfied: html5lib in c:\users\...\miniconda3\envs\web_scrape\lib\site-packages (1.1)
Requirement already satisfied: six>=1.9 in c:\users\...\miniconda3\envs\web_scrape\lib\site-packages (from html5lib) (1.15.0)
Requirement already satisfied: webencodings in c:\users\...\miniconda3\envs\web_scrape\lib\site-packages (from html5lib) (0.5.1)
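One thing that might narrow this down: the traceback paths point at C:\Users\...\Miniconda3\lib\site-packages, while conda list and pip3 both report C:\Users\...\Miniconda3\envs\web_scrape. Below is a minimal diagnostic sketch, assuming the standard-library importlib.util.find_spec is a reasonable way to check whether a package is visible. The helper name find_parser_modules is hypothetical and not part of pandas. Run with the same command that runs data_scrape.py, it prints the interpreter in use and where (if anywhere) each parser package would be imported from:

```python
import importlib.util
import sys


def find_parser_modules(names=("bs4", "html5lib", "lxml")):
    """Map each module name to the location it would load from, or None if not importable."""
    found = {}
    for name in names:
        spec = importlib.util.find_spec(name)
        if spec is None:
            # The running interpreter cannot see this package at all.
            found[name] = None
        else:
            # origin can be None for namespace packages, so fall back to a label.
            found[name] = spec.origin or "(no file origin)"
    return found


if __name__ == "__main__":
    # The interpreter actually executing this script (base install vs. a conda env).
    print("interpreter:", sys.executable)
    for name, origin in find_parser_modules().items():
        print(f"{name}: {origin or 'NOT FOUND'}")
```

If the interpreter path printed here is not inside envs\web_scrape, the script is probably being run outside the environment where the packages were installed.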
Any help or references to a similar problem are appreciated!
Thank you!