Hello, I need to transform the Covid hospitalisation JSON data from the government webpage https://onemocneni-aktualne.mzcr.cz/covid-19#panel3-hospitalization into a table, for example a .csv file.

I inspected the webpage and identified the table in the HTML code shown below.

I used the following Python code and got the output below:

import bs4 as bs
import urllib.request
import json

source = urllib.request.urlopen("https://onemocneni-aktualne.mzcr.cz/covid-19#panel3-hospitalization")
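# Name the parser explicitly so the result does not depend on which parsers are installed
soup = bs.BeautifulSoup(source, "html.parser")
js_test = soup.find("div", id="js-hospitalization-table-data")

# Parse the JSON string stored in the div's data-table attribute
jsonData = json.loads(js_test.attrs["data-table"])
print(jsonData['body'])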

Thank you.

Jara
  • What's a "data-table"? – martineau Dec 21 '20 at 18:10
  • I was thinking of a .csv file ... – Jara Dec 21 '20 at 18:20
  • From the output you're getting, it looks like `data-table` is in JSON format, so you would need to convert data in that format into CSV — which may or may not be possible because the latter doesn't support nested data structures while the other does. – martineau Dec 21 '20 at 18:31
  • I tried to import this JSON data into XLS or Power BI, but without any success. So my idea was to extract the text behind "body": and transform the data in [ xxx ] into a .csv file. Any idea? Thank you – Jara Dec 21 '20 at 18:59
  • Sorry, I don't know how to use beautifulsoup, but if you can get just the value of `data-table` from it or what it's returning, then I might be able to help convert it into CSV format. – martineau Dec 21 '20 at 19:36
  • ufff. here it is: – Jara Dec 21 '20 at 21:16
  • import bs4 as bs import urllib.request import json source = urllib.request.urlopen("https://onemocneni-aktualne.mzcr.cz/covid-19#panel3-hospitalization") soup = bs.BeautifulSoup(source) js_test = soup.find("div", id="js-hospitalization-table-data") jsonData = json.loads(js_test.attrs["data-table"]) #Convert to JSON Object. print (jsonData['body']) – Jara Dec 21 '20 at 21:17
  • Please add code to your question. – martineau Dec 21 '20 at 21:17
  • pls. see above or rather below – Jara Dec 21 '20 at 21:36
  • Sorry, as I said, I'm not familiar with beautifulsoup (and don't even have it installed). – martineau Dec 21 '20 at 21:40
  • OK, thank you. You helped a lot to move further anyway. – Jara Dec 21 '20 at 21:43

1 Answer

The data you want is stored as a JSON string in the data-table attribute of the div. Parse it with the built-in json module to get a Python dictionary (dict), then read the rows under the body key.

import json
import bs4 as bs
import urllib.request

source = urllib.request.urlopen(
    "https://onemocneni-aktualne.mzcr.cz/covid-19#panel3-hospitalization"
)
soup = bs.BeautifulSoup(source, "html.parser")

json_data = json.loads(
    soup.find("div", id="js-hospitalization-table-data")["data-table"]
)

print(type(json_data))
print(*json_data["body"])

Output (partial):

<class 'dict'>
['01.03.2020', 0, 0, 0, 0, 0] ['02.03.2020', 0, 0, 0, 0, 0] ... ['20.12.2020', 4398, 588, 0.1337, 34796, 0.7152]
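
Since each entry under body is a flat list, the rows can be written to CSV with the standard csv module. The following is only a minimal sketch: the three sample rows are copied from the partial output above and stand in for json_data["body"], and the column names and the rows_to_csv helper are hypothetical placeholders, because the real column labels are not shown here.

```python
import csv
import io

# Sample rows copied from the partial output above.
# In the real script, use: body = json_data["body"]
body = [
    ["01.03.2020", 0, 0, 0, 0, 0],
    ["02.03.2020", 0, 0, 0, 0, 0],
    ["20.12.2020", 4398, 588, 0.1337, 34796, 0.7152],
]

# Hypothetical column names (assumption): replace them with the real labels.
columns = ["date", "value_1", "value_2", "value_3", "value_4", "value_5"]


def rows_to_csv(rows, header):
    """Return the header and rows as a single CSV-formatted string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(body, columns)
print(csv_text)
```

To save a file instead, write csv_text to a file opened with open("hospitalization.csv", "w", newline="", encoding="utf-8") (the file name is just an example). If pandas is installed, pandas.DataFrame(body, columns=columns) should give a DataFrame built from the same rows.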
MendelG
  • Thank you very much. However, I have not found out how to transform this dictionary into a table or dataframe. Could you please add this for me? Thank you – Jara Dec 21 '20 at 21:44
  • @Jara That's a different question. Please see [How to convert JSON File to Dataframe](https://stackoverflow.com/questions/41168558/python-how-to-convert-json-file-to-dataframe). If you are still stuck, consider asking a new question here on Stack Overflow. – MendelG Dec 21 '20 at 22:02