
First, I want to mention that I'm running Python 3.8.

I've been getting ValueError: No tables found matching pattern '.+' from two different websites now, not just 'https://www.worldometers.info/coronavirus/'.

here is my code on the first:

import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from IPython.display import display, HTML
from datetime import date

today = date.today()
caption = 'Coronavirus Data {}'.format(today)

#HTML To DataFrame

url = 'https://www.worldometers.info/coronavirus/'
r = requests.get(url)
html_doc = r.text
soup = BeautifulSoup(html_doc, 'html.parser')
yesterday = soup.find('div', attrs={'id':'nav-yesterday'})
dfs = pd.read_html(yesterday.prettify(), flavor = 'bs4')[0]
print(dfs)

And the second:

import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup
from IPython.display import display, HTML

url = 'https://www.stadiumsofprofootball.com/comparisons/'
r = requests.get(url)
text = r.text
soup = BeautifulSoup(text, 'html.parser')
table = soup.find_all('table')[0]
df = pd.read_html(soup.prettify())

I have verified through .find_all('table') that both tables I am looking to scrape are in fact wrapped in "<table>...</table>" tags, so I'm not sure what the issue is.

I'm only making a separate topic on this because I know there are existing topics about the worldometers coronavirus website, but none of those solutions are working for me, and I'm seeing the same error on another website as well.

I'm not sure if it's a BeautifulSoup error or what is going on.

1 Answer


Uppercasing the HTML tags before handing the markup to pandas should solve your problem:

re.sub(r'<.*?>', lambda g: g.group(0).upper(), <variable to change>)
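To see what that substitution actually does, here's a minimal offline sketch on a toy HTML snippet (the table content is made up for illustration); the resulting string is what gets passed to pd.read_html:

```python
import re

# Toy HTML snippet (made up for illustration)
html = '<table><tr><td>1</td></tr></table>'

# Uppercase everything between < and >, i.e. the tag markup;
# text outside the tags (the cell contents) is left untouched
upper = re.sub(r'<.*?>', lambda g: g.group(0).upper(), html)

print(upper)  # <TABLE><TR><TD>1</TD></TR></TABLE>
```

Note that anything inside the angle brackets is uppercased, including attribute names and values, while the cell text itself keeps its original case.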

Here's the full code:

import pandas as pd
import requests
from bs4 import BeautifulSoup
import re

r = requests.get('https://www.worldometers.info/coronavirus/').text

soup = BeautifulSoup(r,"lxml")
yesterday = str(soup.find('div', attrs={'id':'nav-yesterday'}))
# Converted to a string because re.sub needs a string, not a bs4 Tag

yesterday = re.sub(r'<.*?>', lambda g: g.group(0).upper(), yesterday)
# Uppercase everything between < and >, i.e. the tag markup

dfs = pd.read_html(yesterday)[0]
print(dfs)

And same for another:

import pandas as pd
import requests
from bs4 import BeautifulSoup
import re

r = requests.get("https://www.stadiumsofprofootball.com/comparisons/").text
soup = BeautifulSoup(r,"lxml")
table = str(soup.find('table'))
# You don't need find_all if you only want the first matching element

table = re.sub(r'<.*?>', lambda g: g.group(0).upper(), table)

df = pd.read_html(table)[0]
# read_html returns a list of DataFrames, so take the first one
print(df)

For more information visit here.