Im trying to grab values from worldometer.info (similar to post Python: No tables found matching pattern '.+') The code Im using is below:
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.worldometers.info/coronavirus/#countries'
header = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9","X-Requested-With": "XMLHttpRequest"}
r = requests.get(url, headers=header)
# fix HTML multiple tbody
soup = BeautifulSoup(r.text, "html.parser")
for body in soup("tbody"):
body.unwrap()
print(soup)
df = pd.read_html(str(soup), index_col=1, thousands=r',', flavor="bs4")[0]
df = df.replace(regex=[r'\+', r'\,'], value='')
df = df.fillna('0')
df = df.to_json(orient='index')
print(df)
And the output is the html of the page and then when pandas processes it I have the error:
Traceback (most recent call last):
File "./covid19_status.py", line 37, in <module>
df = pd.read_html(str(soup), index_col=1, thousands=r',', flavor="bs4")[0]
File "/usr/local/lib64/python3.6/site-packages/pandas/util/_decorators.py", line 296, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 1101, in read_html
displayed_only=displayed_only,
File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 917, in _parse
raise retained
File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 898, in _parse
tables = p.parse_tables()
File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 217, in parse_tables
tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 563, in _parse_tables
raise ValueError(f"No tables found matching pattern {repr(match.pattern)}")
ValueError: No tables found matching pattern '.+'
Could someone tell me how to resolve this problem? I've tried using the regular expressions from the similar article but could not get it to work and is not included in this code (Im very green with python).
Thanks in advance!