1

I am using pandas to read the tables from an HTML page. Here is my code:

import pandas as pd 
import geopandas as gpd
import requests

url=requests.get("https://www.worldometers.info/coronavirus/")
dataframe=pd.read_html(url.text)
print(dataframe)    

I got a ValueError saying No tables found matching pattern '.+'. Here is the full traceback:

 C:/Users/mayank/AppData/Local/Programs/Python/Python38-32/python.exe e:/skills/mayankvscode/projects/coronavirus_worldometer/corona_meter.py
Traceback (most recent call last):
  File "e:/skills/mayankvscode/projects/coronavirus_worldometer/corona_meter.py", line 6, in <module>
    dataframe=pd.read_html(url.text)
  File "C:\Users\mayank\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\html.py", line 1085, in read_html
    return _parse(
  File "C:\Users\mayank\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\html.py", line 915, in _parse
    raise retained
  File "C:\Users\mayank\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\html.py", line 895, in _parse
    tables = p.parse_tables()
  File "C:\Users\mayank\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\html.py", line 213, in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
  File "C:\Users\mayank\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\html.py", line 561, in _parse_tables
    raise ValueError(f"No tables found matching pattern {repr(match.pattern)}")
ValueError: No tables found matching pattern '.+'
mayank choudhary

4 Answers

8

Maybe there are better solutions, but this is what worked for me: convert all <...> tags to uppercase (maybe a bug in pandas?):

import re

import pandas as pd
import requests

html_source = requests.get("https://www.worldometers.info/coronavirus/").text
# Uppercase everything inside each <...> tag (tag names, attributes and values)
html_source = re.sub(r'<.*?>', lambda g: g.group(0).upper(), html_source)

dataframe = pd.read_html(html_source)
print(dataframe)

Prints:

[      #  Country,Other  TotalCases NewCases  TotalDeaths NewDeaths  TotalRecovered  ... TotalTests  Tests/ 1M pop  Population          Continent  1 Caseevery X ppl  1 Deathevery X ppl  1 Testevery X ppl
0   NaN  North America     2333333   +5,613     138581.0      +642        966689.0  ...        NaN            NaN         NaN      North America                NaN                 NaN                NaN
1   NaN  South America     1219723     +695      52857.0       +12        563854.0  ...        NaN            NaN         NaN      South America                NaN                 NaN                NaN
2   NaN         Europe     2134390   +9,695     180463.0      +279       1117440.0  ...        NaN            NaN         NaN             Europe                NaN                 NaN                NaN
3   NaN           Asia     1440771  +16,501      36529.0      +232        871289.0  ...        NaN            NaN         NaN               Asia                NaN                 NaN                NaN
4   NaN         Africa      206520   +1,000       5578.0       +10         93197.0  ...        NaN            NaN         NaN             Africa                NaN                 NaN                NaN
..   ..            ...         ...      ...          ...       ...             ...  ...        ...            ...         ...                ...                ...                 ...                ...
226 NaN         Total:     1440771  +16,501      36529.0      +232        871289.0  ...        NaN            NaN         NaN               Asia                NaN                 NaN                NaN
227 NaN         Total:      206520   +1,000       5578.0       +10         93197.0  ...        NaN            NaN         NaN             Africa                NaN                 NaN                NaN
228 NaN         Total:        8887       +9        124.0       NaN          8332.0  ...        NaN            NaN         NaN  Australia/Oceania                NaN                 NaN                NaN
229 NaN         Total:         721      NaN         15.0       NaN           651.0  ...        NaN            NaN         NaN                NaN                NaN                 NaN                NaN
230 NaN         Total:     7344345  +33,513     414147.0    +1,175       3621452.0  ...        NaN            NaN         NaN                All                NaN                 NaN                NaN

[231 rows x 19 columns],       #  Country,Other  TotalCases  NewCases  TotalDeaths NewDeaths  TotalRecovered  ... TotalTests  Tests/ 1M pop  Population          Continent  1 Caseevery X ppl  1 Deathevery X ppl  1 Testevery X ppl
0   NaN           Asia     1424270   +31,788      36297.0      +658        864232.0  ...        NaN            NaN         NaN               Asia                NaN                 NaN                NaN
1   NaN  North America     2327720   +24,384     137939.0    +1,554        963408.0  ...        NaN            NaN         NaN      North America                NaN                 NaN                NaN
2   NaN  South America     1219028   +42,752      52845.0    +1,518        563823.0  ...        NaN            NaN         NaN      South America                NaN                 NaN                NaN
3   NaN         Europe     2124695   +14,928     180184.0      +824       1105422.0  ...        NaN            NaN         NaN             Europe                NaN                 NaN                NaN
4   NaN         Africa      205520    +6,530       5568.0      +178         92914.0  ...        NaN            NaN         NaN             Africa                NaN                 NaN                NaN
..   ..            ...         ...       ...          ...       ...             ...  ...        ...            ...         ...                ...                ...                 ...                ...
226 NaN         Total:     2124695   +14,928     180184.0      +824       1105422.0  ...        NaN            NaN         NaN             Europe                NaN                 NaN                NaN
227 NaN         Total:      205520    +6,530       5568.0      +178         92914.0  ...        NaN            NaN         NaN             Africa                NaN                 NaN                NaN
228 NaN         Total:        8878        +3        124.0       NaN          8308.0  ...        NaN            NaN         NaN  Australia/Oceania                NaN                 NaN                NaN
229 NaN         Total:         721       NaN         15.0       NaN           651.0  ...        NaN            NaN         NaN                NaN                NaN                 NaN                NaN
230 NaN         Total:     7310832  +120,385     412972.0    +4,732       3598758.0  ...        NaN            NaN         NaN                All                NaN                 NaN                NaN

[231 rows x 19 columns],       #  Country,Other  TotalCases  NewCases  TotalDeaths NewDeaths  TotalRecovered  ... TotalTests  Tests/ 1M pop  Population          Continent  1 Caseevery X ppl  1 Deathevery X ppl  1 Testevery X ppl
0   NaN           Asia     1392482   +31,253      35639.0      +614        842914.0  ...        NaN            NaN         NaN               Asia                NaN                 NaN                NaN
1   NaN  North America     2303336   +24,856     136385.0      +848        943277.0  ...        NaN            NaN         NaN      North America                NaN                 NaN                NaN
2   NaN  South America     1176276   +29,806      51327.0    +1,105        559961.0  ...        NaN            NaN         NaN      South America                NaN                 NaN                NaN
3   NaN         Europe     2109767   +14,920     179360.0      +417       1088915.0  ...        NaN            NaN         NaN             Europe                NaN                 NaN                NaN
4   NaN         Africa      198990    +6,872       5390.0      +173         88212.0  ...        NaN            NaN         NaN             Africa                NaN                 NaN                NaN
..   ..            ...         ...       ...          ...       ...             ...  ...        ...            ...         ...                ...                ...                 ...                ...
226 NaN         Total:     2109767   +14,920     179360.0      +417       1088915.0  ...        NaN            NaN         NaN             Europe                NaN                 NaN                NaN
227 NaN         Total:      198990    +6,872       5390.0      +173         88212.0  ...        NaN            NaN         NaN             Africa                NaN                 NaN                NaN
228 NaN         Total:        8875        +5        124.0       NaN          8294.0  ...        NaN            NaN         NaN  Australia/Oceania                NaN                 NaN                NaN
229 NaN         Total:         721       NaN         15.0       NaN           651.0  ...        NaN            NaN         NaN                NaN                NaN                 NaN                NaN
230 NaN         Total:     7190447  +107,712     408240.0    +3,157       3532224.0  ...        NaN            NaN         NaN                All                NaN                 NaN                NaN

[231 rows x 19 columns]]
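
One possible explanation for why the uppercase trick helps, offered as an assumption rather than a confirmed diagnosis: the regex uppercases attribute values as well as tag names, so an inline style="display:none" turns into STYLE="DISPLAY:NONE". If the hidden-element filter in read_html looks for the lowercase text, the table is no longer skipped. The sketch below only demonstrates the string transformation on a made-up snippet (the sample markup and the helper name uppercase_tags are hypothetical); it does not call pandas.

```python
import re


def uppercase_tags(html):
    """Uppercase everything between '<' and '>' (tag names, attribute names and values)."""
    return re.sub(r"<.*?>", lambda m: m.group(0).upper(), html)


# Hypothetical markup, not taken from the real page.
sample = '<table style="display:none"><tr><td>Asia</td></tr></table>'
converted = uppercase_tags(sample)

print(converted)
# The lowercase marker no longer appears in the markup; the cell text is untouched.
print("display:none" in converted)
print("Asia" in converted)
```

Note that this also uppercases every other attribute value (ids, classes, URLs inside tags), so it is a blunt workaround rather than a targeted fix.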
Andrej Kesely
  • But my original code works in Spyder, without using the re package. Why is that? @Andrej Kesely – mayank choudhary Jun 10 '20 at 12:03
  • @mayankchoudhary I tried different parsers (`flavor=` in the `.read_html()` method), but to no avail. That's why I suspect a bug in pandas (but I could be wrong; I've seen it only on this site, `www.worldometers.info`) – Andrej Kesely Jun 10 '20 at 12:11
2

The only thing that solved this was to use this answer from TutorialLink and add displayed_only=False to read_html:

df = pd.read_html(str(table), displayed_only=False)[0]
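
To see the effect of displayed_only in isolation, here is a minimal sketch on a made-up HTML fragment (the table contents and the helper name read_tables are hypothetical, and it assumes an HTML parser such as lxml is installed for pandas). It assumes the table itself carries an inline display:none style, which is what a later answer reports for this kind of page.

```python
from io import StringIO

import pandas as pd

# Hypothetical fragment: the table itself is hidden with an inline style.
HIDDEN_TABLE_HTML = """
<table id="demo" style="display:none">
  <tr><th>Country</th><th>TotalCases</th></tr>
  <tr><td>Asia</td><td>1440771</td></tr>
</table>
"""


def read_tables(html, displayed_only):
    """Return the DataFrames pandas parses from html, or [] if it reports no tables."""
    try:
        return pd.read_html(StringIO(html), displayed_only=displayed_only)
    except ValueError:
        # pandas raises ValueError when no matching table is found
        return []


default_result = read_tables(HIDDEN_TABLE_HTML, displayed_only=True)
all_result = read_tables(HIDDEN_TABLE_HTML, displayed_only=False)

print(len(default_result), len(all_result))
```

With the default displayed_only=True the hidden table should be skipped, while displayed_only=False should parse it like any other table.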
Henry Ecker
2

Using Selenium you can scrape the table from the worldometers COVID data into a pandas DataFrame with the following locator strategy:

Code Block:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

options = Options()
options.add_argument("start-maximized")
s = Service('C:\\BrowserDrivers\\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
driver.get("https://www.worldometers.info/coronavirus/")
data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#main_table_countries_today"))).get_attribute("outerHTML")
df  = pd.read_html(data)
print(df)
driver.quit()

Console Output:

[         # Country,Other  TotalCases  NewCases  ...  Deaths/1M pop   TotalTests  Tests/ 1M pop    Population
0      NaN         World   264359298  632349.0  ...          673.3          NaN            NaN           NaN
1      1.0           USA    49662381   89259.0  ...         2415.0  756671013.0      2267182.0  3.337495e+08
2      2.0         India    34609741    3200.0  ...          336.0  643510926.0       459914.0  1.399198e+09
3      3.0        Brazil    22118782   12910.0  ...         2865.0   63776166.0       297051.0  2.146975e+08
4      4.0            UK    10329074   53945.0  ...         2124.0  364875273.0      5335159.0  6.839070e+07
..     ...           ...         ...       ...  ...            ...          ...            ...           ...
221  221.0         Samoa           3       NaN  ...            NaN          NaN            NaN  2.002800e+05
222  222.0  Saint Helena           2       NaN  ...            NaN          NaN            NaN  6.103000e+03
223  223.0    Micronesia           1       NaN  ...            NaN          NaN            NaN  1.167290e+05
224  224.0         Tonga           1       NaN  ...            NaN          NaN            NaN  1.073890e+05
225    NaN        Total:   264359298  632349.0  ...          673.3          NaN            NaN           NaN

[226 rows x 15 columns]]
undetected Selenium
0

I noticed that when using Selenium's WebDriverWait with the expected condition "presence_of_element_located", the table in my HTML still had the style display: none, and pandas kept raising:

No tables found matching pattern '.+'

Following @Andrej Kesely's answer, using "re.sub" worked for me. After seeing @spirastarez's answer, I found that passing "displayed_only=False" also worked:

pd.read_html(table_html, displayed_only=False)

Then @undetected Selenium's answer made me realize the actual issue. After changing my expected condition from "presence_of_element_located" to "visibility_of_element_located", pandas was able to parse the table without either the re.sub or the displayed_only=False workaround.

Working solution for those using Selenium and Pandas below:

table = WebDriverWait(browser, 30).until(EC.visibility_of_element_located((By.ID, 'table_1')))
table_html = table.get_attribute('outerHTML')
df = pd.read_html(table_html)
OBAA