I want to scrape the total number of deaths from the Johns Hopkins COVID-19 dashboard using Python, Selenium and ChromeDriver. The number of deaths can be found at the XPath //*[@id="ember1915"]/svg/g[2]/svg/text.

This is my script:

from selenium.webdriver import Chrome
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

with Chrome() as driver:
    driver.get('https://coronavirus.jhu.edu/map.html')
    driver.implicitly_wait(20)  # Poll for up to 20 s whenever an element is looked up.

    displayElement = driver.find_element_by_xpath('//*[@id="ember1915"]/svg/g[2]/svg/text')

It fails with the error “no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="ember1915"]/svg/g[2]/svg/text"}”.

This also happens for other sites I’m trying to scrape.

How can I fix this? What’s the reason for this error?

kadamb
Lucy

1 Answer

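The element with the total number of deaths (905,181 at the time of writing) on the Johns Hopkins COVID-19 dashboard is inside an <iframe>. Selenium only searches the document it is currently focused on, so a lookup from the top-level page raises "no such element" until you switch into that frame. IDs such as ember1915 are also typically generated at runtime and may change between page loads, so it is safer not to rely on them. You therefore have to: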

  • Use WebDriverWait to wait until the desired frame is available and switch to it.

  • Use WebDriverWait with visibility_of_element_located() to wait for the element, using either of the following locator strategies:

    • Using XPATH and get_attribute():

      driver.get('https://coronavirus.jhu.edu/map.html')
      WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[@title='Coronavirus COVID-19 Global Cases by Johns Hopkins CSSE']")))
      print(WebDriverWait(driver, 60).until(EC.visibility_of_element_located((By.XPATH, "//*[name()='svg']/*[name()='text' and text()='Global Deaths']//following::div[1]/*[name()='svg' and @class='responsive-text-group']//*[name()='g' and @class='responsive-text-label']/*[name()='svg']/*[name()='text']"))).get_attribute("innerHTML"))
      
    • Using XPATH and text attribute:

      driver.get('https://coronavirus.jhu.edu/map.html')
      WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[@title='Coronavirus COVID-19 Global Cases by Johns Hopkins CSSE']")))
      print(WebDriverWait(driver, 60).until(EC.visibility_of_element_located((By.XPATH, "//*[name()='svg']/*[name()='text' and text()='Global Deaths']//following::div[1]/*[name()='svg']//*[name()='g']/*[name()='svg']/*[name()='text']"))).text)
      
  • Console Output:

    905,181
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
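
Putting the two steps and the imports together, a minimal sketch could look like the following. The function names `fetch_global_deaths` and `parse_count` are hypothetical helpers introduced here for illustration, and the two XPath strings are copied from the snippets above, so they assume the dashboard still has the layout it had when this answer was written. The Selenium imports sit inside the function only so that the number-parsing helper can be used and tested without a browser.

```python
import re

# Locators copied from the answer above; they assume the 2020 page layout.
FRAME_XPATH = "//iframe[@title='Coronavirus COVID-19 Global Cases by Johns Hopkins CSSE']"
DEATHS_XPATH = (
    "//*[name()='svg']/*[name()='text' and text()='Global Deaths']"
    "//following::div[1]/*[name()='svg']//*[name()='g']"
    "/*[name()='svg']/*[name()='text']"
)


def parse_count(text):
    """Turn a dashboard label such as '905,181' into the integer 905181."""
    digits = re.sub(r"[^\d]", "", text)
    if not digits:
        raise ValueError(f"no digits found in {text!r}")
    return int(digits)


def fetch_global_deaths(driver, timeout=60):
    """Return the 'Global Deaths' figure as an int (needs a live browser)."""
    # Imported lazily so parse_count() stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver.get("https://coronavirus.jhu.edu/map.html")

    # Step 1: wait for the dashboard iframe and switch the driver into it.
    WebDriverWait(driver, 20).until(
        EC.frame_to_be_available_and_switch_to_it((By.XPATH, FRAME_XPATH))
    )

    # Step 2: wait until the SVG text node is visible, then read its text.
    element = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.XPATH, DEATHS_XPATH))
    )
    text = element.text

    # Return focus to the top-level document before handing back the result.
    driver.switch_to.default_content()
    return parse_count(text)
```

It would be called as `fetch_global_deaths(driver)` inside the `with Chrome() as driver:` block from the question. If either wait times out, Selenium raises `TimeoutException`, which usually means the iframe title or the page structure has changed and the locators need updating.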
    

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python
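
Since the question mentions the same error on other sites, a small diagnostic may help to check whether a frame is the cause there as well. The sketch below is only an illustration: `frames_containing` is a hypothetical helper name, it inspects top-level <iframe> elements only (no nested frames, no shadow DOM), and it does not wait for content to render, so it should be run after the page has finished loading. The string "tag name" is the value behind Selenium's `By.TAG_NAME`.

```python
def frames_containing(driver, by, value):
    """Return the indexes of top-level <iframe> elements where the locator matches.

    An empty list suggests the problem is not a first-level iframe
    (it may be timing, a nested frame, or a changed locator instead).
    """
    hits = []
    driver.switch_to.default_content()
    # "tag name" is the string constant behind By.TAG_NAME.
    frames = driver.find_elements("tag name", "iframe")
    for index, frame in enumerate(frames):
        # Always start from the top-level document before entering a frame.
        driver.switch_to.default_content()
        driver.switch_to.frame(frame)
        if driver.find_elements(by, value):
            hits.append(index)
    # Leave the driver focused on the top-level document again.
    driver.switch_to.default_content()
    return hits
```

For example, `frames_containing(driver, "xpath", "//*[name()='text' and text()='Global Deaths']")` would list the frames in which that label is present once the dashboard has loaded.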


undetected Selenium
  • How can you be sure that it will return the text or inner HTML without selecting an iframe? I would like to know how it can do this without even selecting the shadow element. – Dev Sep 10 '20 at 17:02
  • @Dev Nice catch, just slipped out while copying the code, corrected it now. – undetected Selenium Sep 10 '20 at 17:40