I am looking at this web site.

https://coronavirus.jhu.edu/map.html

It seems like the class is either 'flex-fluid list-item-content overflow-hidden ' (with presumably a space at the end) or 'external-html'

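import requests
import pandas as pd
from bs4 import BeautifulSoup

# Fetch the page and parse the HTML exactly as the server returns it
r = requests.get("https://coronavirus.jhu.edu/map.html")
soup = BeautifulSoup(r.content, "html.parser")  # name the parser explicitly

# First attempt: anchors with the class "external-html"
mydivs = soup.find_all("a", class_="external-html")
print(mydivs)
df = pd.DataFrame(mydivs)
df

# Second attempt: divs carrying all three classes.
# A CSS selector avoids depending on the exact class string (trailing space included).
mydivs = soup.select("div.flex-fluid.list-item-content.overflow-hidden")
print(mydivs)
df = pd.DataFrame(mydivs)
df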

When I run the sample code above, nothing is returned to 'mydivs'; I just get an empty bs4.element.ResultSet. I also checked for tables on this site and found none, so I think all the numbers under 'Cases by Country/Region/Sovereignty' must be contained in div elements. Basically, I'd like to get all the numbers organized nicely in a data frame. What am I doing wrong?

I'm also using this link as a reference.

https://www.codegrepper.com/code-examples/python/beautifulsoup+find+all+div+class

ASH
  • The data is injected dynamically using JavaScript. There's an API they're using, but it looks like you get a 403 if you try to hit it. You could try Selenium or Pyppeteer. – ggorlen Mar 28 '21 at 04:20
  • I don't need that data. I just thought I missed something, and I wanted to see if I understand the concept. So, BeautifulSoup is helpful for grabbing static HTML and Selenium is useful for grabbing dynamically created content, like content generated by JavaScript. Is that what it boils down to? – ASH Mar 28 '21 at 13:22
  • Pretty much, yep – ggorlen Mar 28 '21 at 16:27
  • Does this answer your question? [Web-scraping JavaScript page with Python](https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python) – ggorlen Mar 28 '21 at 16:27
  • I bookmarked that page! Thanks!! – ASH Mar 28 '21 at 20:25
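
The static-versus-dynamic distinction from the comments can be checked without a browser. The sketch below is not the JHU page: `SAMPLE_HTML`, `ClassCollector`, and `classes_in_static_html` are made-up names for illustration, and it uses only the standard library's `html.parser` instead of BeautifulSoup, so it runs with no extra installs. It collects the class names present in the raw markup and shows that a class assigned by a script never appears, which is presumably why `find_all` comes back empty on a page rendered by JavaScript.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collect every CSS class name found on tags in the parsed markup."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def classes_in_static_html(html):
    """Return the set of class names that exist before any script runs."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return collector.classes


# Hypothetical page: the interesting element is created by JavaScript.
SAMPLE_HTML = """
<html>
  <body>
    <div id="app" class="shell"></div>
    <script>
      // A browser runs this and adds the element; an HTML parser does not.
      var el = document.createElement("div");
      el.className = "external-html";
      document.getElementById("app").appendChild(el);
    </script>
  </body>
</html>
"""

found = classes_in_static_html(SAMPLE_HTML)
print("shell" in found)          # class written directly in the markup
print("external-html" in found)  # class that only exists after the script runs
```

Running the same kind of check against `r.text` from `requests.get(...)` is a quick way to tell whether a selector is wrong or the element simply is not in the downloaded HTML. If it is not, a tool that executes JavaScript, such as Selenium, is needed.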
