
I'm trying to scrape data from a website. I only need the inner HTML of a tbody element, which I want to convert to JSON for easier handling and so I can save it to a file later. So far I've only managed to read one element at a time using find_element(By.XPATH) from Selenium. Is there a way to read the whole inner HTML of the tbody and then parse it to JSON? requests won't work because the table is inside an iframe.
The tbody belongs to the scrolling table titled "Tình hình dịch cả nước" ("Epidemic situation nationwide") on the website in the code below. I only want the table body, without the title and, if possible, without the table header.
The code for reading an element:

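import time
from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Firefox()
browser.get("https://covid19.gov.vn/")
time.sleep(3)  # crude wait for the page and its iframe to load
# The table is inside an iframe, so switch into it before looking up elements
browser.switch_to.frame(browser.find_element(By.XPATH, '/html/body/div[1]/div[2]/div[3]/div/iframe'))
value = browser.find_element(By.XPATH, '/html/body/div[2]/div[1]/div/div[2]/div[1]/span[4]')
print(value.text)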
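
For reference, a minimal sketch of the approach being asked about, not a tested solution for this particular site. It assumes the tbody's inner HTML has already been obtained as a string (with Selenium that would be `get_attribute("innerHTML")` on the tbody element, whose locator is not shown here). The class `TableBodyParser`, the function `tbody_to_json`, the column names, and the sample HTML are all made up for illustration.

```python
import json
from html.parser import HTMLParser


class TableBodyParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def tbody_to_json(inner_html, columns):
    """Turn tbody inner HTML into a JSON array of objects keyed by `columns`."""
    parser = TableBodyParser()
    parser.feed(inner_html)
    parser.close()
    records = [dict(zip(columns, row)) for row in parser.rows]
    # ensure_ascii=False keeps Vietnamese names readable in the output
    return json.dumps(records, ensure_ascii=False)


# Hypothetical sample standing in for the real tbody inner HTML
sample_html = (
    "<tr><td>Hà Nội</td><td>10</td></tr>"
    "<tr><td>Đà Nẵng</td><td>5</td></tr>"
)
print(tbody_to_json(sample_html, ["name", "cases"]))
```

Because a tbody does not include the table's title or thead, parsing only its inner HTML leaves out the title and header row by construction. Cell values come back as strings, so numeric columns would still need converting.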
Nimantha
Dư Huy
  • There is a good answer at https://stackoverflow.com/questions/38917958/convert-html-into-csv . It uses BeautifulSoup to extract data from HTML, which is probably what you want to do. – Amitai Irron Dec 24 '21 at 15:49
  • @AmitaiIrron Thank you for your suggestion, but I did find a much shorter alternative, which I will edit into my original post. – Dư Huy Dec 24 '21 at 16:40

1 Answer


Just call the same endpoint the page itself calls; it returns JSON, so there is no HTML to parse.

import requests
import pandas as pd

# Fetch the JSON the page itself loads and decode it
r = requests.get('https://static.pipezero.com/covid/data.json').json()
location_json = r['locations']  # the part of the payload used for the table
df = pd.DataFrame(location_json)
print(df)
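
Since the question also mentions saving the data to a file, here is a small sketch of that step under stated assumptions. The helper `save_locations` is hypothetical, and `sample_payload` is a made-up stand-in for the decoded response, with illustrative field names rather than the endpoint's real ones. Only the `locations` key comes from the code above.

```python
import json
from pathlib import Path


def save_locations(payload, path):
    """Write the 'locations' list of a decoded payload to a JSON file.

    Returns the number of records written.
    """
    locations = payload["locations"]  # same key the code above reads
    # ensure_ascii=False keeps Vietnamese names readable in the file
    text = json.dumps(locations, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")
    return len(locations)


# Made-up sample standing in for requests.get(...).json();
# the field names are illustrative, not taken from the real endpoint.
sample_payload = {
    "locations": [
        {"name": "Hà Nội", "cases": 10},
        {"name": "Đà Nẵng", "cases": 5},
    ]
}
```

With the real response, this would be called as `save_locations(r, "locations.json")`, where the file name is whatever suits you. If you are already working with the DataFrame, pandas' own `df.to_json(...)` is an alternative.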
QHarr