I'm trying to grab values from worldometers.info (similar to the post "Python: No tables found matching pattern '.+'"). The code I'm using is below:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.worldometers.info/coronavirus/#countries'
header = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9","X-Requested-With": "XMLHttpRequest"}

r = requests.get(url, headers=header)

# fix HTML multiple tbody
soup = BeautifulSoup(r.text, "html.parser")
for body in soup("tbody"):
    body.unwrap()

print(soup)

df = pd.read_html(str(soup), index_col=1, thousands=r',', flavor="bs4")[0]
df = df.replace(regex=[r'\+', r'\,'], value='')

df = df.fillna('0')
df = df.to_json(orient='index')

print(df)

The output is the HTML of the page, and then when pandas processes it I get this error:

Traceback (most recent call last):
  File "./covid19_status.py", line 37, in <module>
    df = pd.read_html(str(soup), index_col=1, thousands=r',', flavor="bs4")[0]
  File "/usr/local/lib64/python3.6/site-packages/pandas/util/_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 1101, in read_html
    displayed_only=displayed_only,
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 917, in _parse
    raise retained
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 898, in _parse
    tables = p.parse_tables()
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 217, in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
  File "/usr/local/lib64/python3.6/site-packages/pandas/io/html.py", line 563, in _parse_tables
    raise ValueError(f"No tables found matching pattern {repr(match.pattern)}")
ValueError: No tables found matching pattern '.+'

Could someone tell me how to resolve this problem? I tried the regular expression from the similar post but could not get it to work, so it is not included in this code (I'm very green with Python).

Thanks in advance!

user63339

1 Answer

You can follow the code provided in the answer to this question. Here is the full code:

import pandas as pd
import requests
import re

url = 'https://www.worldometers.info/coronavirus/#countries'
header = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9","X-Requested-With": "XMLHttpRequest"}

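# .text gives the page HTML as a str; re.sub below needs a str, not the Response object
r = requests.get(url, headers=header).text

# Uppercase every tag (the workaround from the answer referenced above)
r = re.sub(r'<.*?>', lambda g: g.group(0).upper(), r)

# read_html returns a list of DataFrames, one per table it finds
dfs = pd.read_html(r)

# Write the first table to CSV without the index column
dfs[0].to_csv('D:\\Worldometer.csv', index=False)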
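
To see what the `re.sub` step does on its own, here is a minimal sketch run against a hand-written fragment rather than the live page. The fragment and the helper name `uppercase_tags` are made up for illustration; only the pattern and the lambda come from the code above.

```python
import re


def uppercase_tags(html: str) -> str:
    """Uppercase everything between '<' and '>' on a single line (hypothetical helper)."""
    return re.sub(r'<.*?>', lambda g: g.group(0).upper(), html)


# Hand-written sample, not real Worldometer markup.
sample = '<table id="t1"><tr><td>1,234</td></tr></table>'
result = uppercase_tags(sample)
print(result)
```

Text between tags is left alone, but attribute names and values inside a tag are uppercased too, so `id="t1"` becomes `ID="T1"`. That may matter if you later want to pick a table by its attributes.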


Sushil
  • Thank you very much! Yes, this is very helpful. I actually got this to work earlier, but I'm looking just for the output of the 3rd and 4th columns to add as metrics. Would this be very difficult? ++10 – user63339 Oct 15 '20 at 10:18
  • As metrics? What do u mean by that? Could u be more clear? BTW, if my answer has helped you, pls accept it as the best answer by clicking on the tick mark below the upvote button. – Sushil Oct 15 '20 at 10:19
  • With that same code I am still getting an error: `Traceback (most recent call last): File "./covid19_status.py", line 31, in <module> r = re.sub(r'<.*?>', lambda g: g.group(0).upper(), r) File "/usr/lib64/python3.6/re.py", line 191, in sub return _compile(pattern, flags).sub(repl, string, count) TypeError: expected string or bytes-like object` Also, I wanted to use BeautifulSoup and pandas to format it for another application. – user63339 Oct 15 '20 at 10:39
  • I did click the uptick arrow but it didn't advance the number??? – user63339 Oct 15 '20 at 10:41
  • And for the error that u get -- change `r = re.sub(r'<.*?>', lambda g: g.group(0).upper(), r)` to `r = re.sub(r'<.*?>', lambda g: g.group(0).upper(), str(r))` – Sushil Oct 15 '20 at 10:50
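
A note on the `TypeError` discussed in the comments: it is what `re.sub` raises when it is handed something that is not a string, which would happen if `r` were the object returned by `requests.get(...)` rather than its `.text`. The sketch below reproduces that offline. `FakeResponse` is a hypothetical stand-in written for this example (it is not part of requests); it assumes, as requests does, that the object's string form is a short `<Response [200]>` marker rather than the page body.

```python
import re


class FakeResponse:
    """Hypothetical stand-in for a requests response, so the sketch runs offline."""

    def __init__(self, text, status_code=200):
        self.text = text
        self.status_code = status_code

    def __repr__(self):
        # Assumption: mirrors the short repr of a real response object.
        return f"<Response [{self.status_code}]>"


def uppercase_tags(html):
    # Same pattern and replacement as in the answer above.
    return re.sub(r'<.*?>', lambda g: g.group(0).upper(), html)


resp = FakeResponse("<table><tr><td>1</td></tr></table>")

# Passing the object itself raises the TypeError quoted in the comments.
try:
    uppercase_tags(resp)
    error = None
except TypeError as exc:
    error = exc

from_str = uppercase_tags(str(resp))    # no error, but the table markup is gone
from_text = uppercase_tags(resp.text)   # the actual HTML, with tags uppercased

print(repr(error))
print(from_str)
print(from_text)
```

Under these assumptions, wrapping the object in `str()` silences the exception but leaves nothing for `pd.read_html` to parse, so taking `.text` (as the answer's code already does) looks like the safer change.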