0

I'm trying to gather some data from a table on a web page with Python and Beautiful Soup. When I make a selection from the page, however, I'm getting different results than I get in the browser. Specifically, the tables are missing completely. Here's a screenshot of the table in the inspector of Firefox dev tools:

Screenshot of web page and inspector

And here's the output that I get from Beautiful Soup:

Screenshot of IDE with output

I've tried using urllib instead of requests, and I've tried different HTML parsers (html.parser and lxml). All give the same results. Any advice on what might be happening here, and how I might get around it to access the data from the table?

import requests
from bs4 import BeautifulSoup
# The 'html5lib' parser below needs the html5lib package installed; no import is required.

knox = requests.get("https://covid.knoxcountytn.gov/case-count.html")
knox_soup = BeautifulSoup(knox.text, 'html5lib')
knox_confirmed = knox_soup.find('div', id='covid_cases').prettify()

print(knox_confirmed)
LuosRestil
  • 2
    Please [edit] your question and include your code as text (`code`) instead of an image (`img`), so we can check and verify it manually – αԋɱҽԃ αмєяιcαη Apr 24 '20 at 21:52
  • 1
    Chances are that the table is populated by JavaScript, which makes further AJAX calls to get the table content. That JavaScript isn't (and can't be) executed when you retrieve the page with `requests`. You'll probably have to use a browser-automation tool such as Selenium, which can execute JavaScript, so you should be able to collect the table. Good luck! – DisappointedByUnaccountableMod Apr 24 '20 at 21:54
  • And yes don’t put images of code/text into a question - paste the text. – DisappointedByUnaccountableMod Apr 24 '20 at 21:54

2 Answers

1

Try disabling JavaScript when you visit https://covid.knoxcountytn.gov/case-count.html and you will see no table. As @barny said, the table is generated with JavaScript, so you can't parse it with BeautifulSoup alone (at least not easily; see "How to call JavaScript function using BeautifulSoup and Python").
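
If you'd rather check this from Python than from the browser, here is a rough sketch using only the standard library. It counts the <table> tags inside a given element of the raw HTML. The class name `TableCounter`, the function `count_tables`, and both HTML snippets are made up for illustration; they are not the real page source. It also assumes reasonably well-formed markup, since unclosed tags would throw off the depth tracking.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Count <table> tags nested inside the element with a given id."""

    # Void elements never get a closing tag, so they must not affect depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0   # nesting depth inside the target element (0 = outside)
        self.tables = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID:
            return
        if self.depth:
            self.depth += 1
            if tag == "table":
                self.tables += 1
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID:
            self.depth -= 1


def count_tables(html, element_id):
    parser = TableCounter(element_id)
    parser.feed(html)
    parser.close()
    return parser.tables


# Hypothetical stand-in for the raw response body: the div is an empty
# shell that a script would fill in later.
static_html = """
<html><body>
  <div id="covid_cases"></div>
  <script src="load_cases.js"></script>
</body></html>
"""

# Hypothetical stand-in for what the browser shows after the script has run.
rendered_html = """
<html><body>
  <div id="covid_cases"><table><tr><td>placeholder</td></tr></table></div>
</body></html>
"""

print(count_tables(static_html, "covid_cases"))    # 0
print(count_tables(rendered_html, "covid_cases"))  # 1
```

Feeding it `knox.text` from the question should tell you whether the table is in the response at all. A count of zero there, while the inspector shows a table, points to client-side rendering.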

Xetolone
  • 1
    The post you linked was exactly what I needed! Using requests-html to render the page's JavaScript got me all the info I needed. Thanks! – LuosRestil Apr 25 '20 at 00:39
0

The page content is loaded via JavaScript, and requests does not execute JavaScript, so it can't render the page for you. You can use Selenium or requests_html instead.

For now, I've tracked down where the data is fetched from by checking the XHR traffic the page makes.

So we can use pandas.read_csv() as follows:

import pandas as pd

df = pd.read_csv("https://covid.knoxcountytn.gov/includes/covid_cases.csv")

print(df)
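
If pandas isn't an option, roughly the same thing can be sketched with the standard library. The helper names `parse_csv` and `fetch_rows` are my own, the UTF-8 decoding is an assumption about the file, and the sample text is made-up placeholder data, not the real file's columns.

```python
import csv
import io
from urllib.request import urlopen

# URL taken from the answer above.
CSV_URL = "https://covid.knoxcountytn.gov/includes/covid_cases.csv"


def parse_csv(text):
    """Turn CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_rows(url=CSV_URL, timeout=10):
    """Download the CSV and parse it. Needs network access."""
    with urlopen(url, timeout=timeout) as resp:
        # Assumes UTF-8; "utf-8-sig" also strips a byte-order mark if present.
        return parse_csv(resp.read().decode("utf-8-sig"))


# Offline check with placeholder data (hypothetical column names).
sample = "label,value\nexample_a,1\nexample_b,2\n"
rows = parse_csv(sample)
print(rows[0]["label"], rows[1]["value"])
```

Note that every value comes back as a string here, whereas pandas infers numeric types for you, so read_csv() is still the more convenient route when it's available.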

  • That's a brilliant solution. I never even considered that I could fetch the data directly from the same source the website does. Thanks for the advice! – LuosRestil Apr 25 '20 at 00:45