How to transform nested json with dynamic keys and arays to pandas dataframe?
- Static keys:
data
,label
,units
,date
,val
,num
(can be hardcoded) - Dynamic keys/arrays:
data_1_a
,data_1000_xyz
,name_1a
,name_1b
,name_10000_xyz
,A
,B
(cannot be hardcoded as they are up to 10000 names / data sub categories)
For solutions I tried please see below useful links.
input json:
{
"id": 1,
"data": {
"data_1_a": {
"name_1a": {
"label": "label_1",
"units": {
"A": [{"date": 2020, "val": 1}]}}
},
"data_1000_xyz": {
"name_1b": {
"label": "null",
"units": {
"B": [{"date": 2019, "val": 2},
{"date": 2020, "val": 3}]},
},
"name_10000_xyz": {
"label": "null",
"units": {
"A": [
{"date": 2018, "val": 4, "num": "str"},
{"date": 2019, "val": 5},
{"date": 2020, "val": 6, "num": "str"},
]
},
},
},
},
}
required output df:
+---+--------------+----------------+---------+-------+------+-----+------+
|id |level_1 |level_2 |level_3 |level_4| date | val | num |
+---+--------------+----------------+---------+-------+------+-----+------+
|1 |data_1_a | name_1a | unit | A | 2020 | 1 | null |
|1 |data_1000_xyz | name_1b | unit | B | 2019 | 2 | null |
|1 |data_1000_xyz | name_1b | unit | B | 2020 | 3 | null |
|1 |data_1000_xyz | name_10000_xyz | unit | A | 2018 | 4 | str |
|1 |data_1000_xyz | name_10000_xyz | unit | A | 2019 | 5 | null |
|1 |data_1000_xyz | name_10000_xyz | unit | A | 2020 | 6 | str |
+-------------------------------------------------------------------------+
Usefull links: