I would like to scrape daily COVID-19 data from the Washington State Department of Health dashboard (https://www.doh.wa.gov/Emergencies/NovelCoronavirusOutbreak2020COVID19/DataDashboard) using Python.
The site has an embedded Power BI dashboard. Inspecting the page shows that it requests a specific view from a Power BI site (https://app.powerbigov.us/view?...). The view argument changes daily as the dashboard data is updated. I had been using a simple requests.get to query this address, but I cannot capture the changing view argument from the Department of Health site with that package alone, because the page is rendered with JavaScript.
I tried the following Selenium code (Ubuntu, Chromium), but despite waiting for the relevant iframe to render, it fails with the timeout shown after the code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

target_url = 'http://www.doh.wa.gov/Emergencies/NovelCoronavirusOutbreak2020COVID19/DataDashboard'

# Headless Chromium with a desktop user agent
chrome_options = Options()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--remote-debugging-port=9222')

driver = webdriver.Chrome(options=chrome_options)
driver.get(target_url)

# Wait up to 20 seconds for the dashboard iframe, then switch into it
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.ID, "CovidDashboardFrame")))
TimeoutException: Message: timeout: Timed out receiving message from renderer: 300.000 (Session info: headless chrome=83.0.4103.61)
Without the frame switch, a blank page is returned. I have tested my setup with another site (www.google.com) and can retrieve its source code, so the problem appears to be specific to this site.
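For clarity, the extraction step that would follow a successful page load can be sketched as below. This is only a minimal sketch under assumptions: it assumes the rendered HTML contains an iframe with id CovidDashboardFrame whose src carries the view argument. The names IframeSrcFinder, extract_view_query, and SAMPLE_HTML are hypothetical, and the sample URL and its token parameter are placeholders, not the real Power BI address.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs


class IframeSrcFinder(HTMLParser):
    """Record the src of the first iframe whose id matches frame_id."""

    def __init__(self, frame_id):
        super().__init__()
        self.frame_id = frame_id
        self.src = None

    def handle_starttag(self, tag, attrs):
        # Ignore everything except the first matching iframe.
        if tag != "iframe" or self.src is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("id") == self.frame_id:
            self.src = attr_map.get("src")


def extract_view_query(html, frame_id="CovidDashboardFrame"):
    """Return (src, parsed query parameters), or None if no such iframe exists."""
    finder = IframeSrcFinder(frame_id)
    finder.feed(html)
    if finder.src is None:
        return None
    return finder.src, parse_qs(urlsplit(finder.src).query)


# Placeholder markup; the real page and URL will differ.
SAMPLE_HTML = (
    '<div><iframe id="CovidDashboardFrame" '
    'src="https://example.invalid/view?token=abc123"></iframe></div>'
)

if __name__ == "__main__":
    print(extract_view_query(SAMPLE_HTML))
```

With Selenium, the html argument would presumably be driver.page_source, read before switching into the frame, since the iframe element belongs to the parent document.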
Thank you very much for your help.