So I have data in a huge, messy JSON file. After some formatting I've got it down to one dict per line, which I can read in.
My goal is to output one row of a DataFrame for each starting dict (some columns are repeated from dict to dict, and some dicts add new columns).
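To make the "one dict per line, columns vary" setup concrete, here is a minimal sketch using only the standard library. The sample lines and the helper names `read_records` and `align_columns` are made up for illustration; they are not from my real data.

```python
import io
import json

# Hypothetical sample standing in for the one-dict-per-line file.
sample = io.StringIO(
    '{"name": "Simon", "salary": 25000}\n'
    '{"name": "Anna", "pets": 2}\n'
)

def read_records(handle):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in handle if line.strip()]

def align_columns(records):
    """Give every record the same keys, in first-seen order, filling gaps with None."""
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    rows = [{col: record.get(col) for col in columns} for record in records]
    return columns, rows

records = read_records(sample)
columns, rows = align_columns(records)
```

With pandas installed, `pandas.DataFrame(records)` does this alignment itself and fills the missing cells with NaN, so the hard part is only the nesting inside each dict.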
Example of dict format:
dic = {'name': 'Simon',
       'salary': 25000,
       'children': ['Sally', 'Stuart', 'Paul'],
       'assets': [{'houses': ['50 cool drive', '60 swag lane']},
                  {'vehicles': {'cars': ['bmw', 'kia'],
                                'boats': ...}}]}
Example output:
    name  salary             children         houses     vehicles      cars  boats
0  Simon   25000  Sally, Stuart, Paul  50 cool dr...  cars, boats  bmw, kia    ...
How do I account for the structure changing from dict to list to dict again, and so on?
I've tried something like:
def run(thing):
    if not isinstance(thing, (dict, list)):
        # leaf value: try to store it as a column
        df[thing] = df.get(thing)
    else:
        # nested dict or list: recurse
        run(thing)
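For reference, this is roughly the recursion I think I am after, written as a sketch under my own assumptions: a dict-valued key becomes a column listing its child keys (as with 'vehicles' in the example output), a list of plain values is joined into one string, and a list that contains dicts or lists gets no column of its own. The function name `flatten` is just a placeholder, and 'boats' stays elided, so Python's `...` (Ellipsis) stands in for it.

```python
def flatten(node, row=None):
    """Collapse nested dicts/lists into one flat {column: value} dict."""
    if row is None:
        row = {}
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, dict):
                # Assumption: a dict-valued column lists its child keys.
                row[key] = ", ".join(value)
                flatten(value, row)
            elif isinstance(value, list) and any(
                isinstance(v, (dict, list)) for v in value
            ):
                # A list holding containers gets no column; just descend.
                flatten(value, row)
            elif isinstance(value, list):
                row[key] = ", ".join(map(str, value))
            else:
                row[key] = value
    elif isinstance(node, list):
        # Plain values sitting directly in such a list are skipped here.
        for item in node:
            flatten(item, row)
    return row

dic = {'name': 'Simon',
       'salary': 25000,
       'children': ['Sally', 'Stuart', 'Paul'],
       'assets': [{'houses': ['50 cool drive', '60 swag lane']},
                  {'vehicles': {'cars': ['bmw', 'kia'],
                                'boats': ...}}]}  # 'boats' left elided

row = flatten(dic)
```

One known weakness of this sketch: if the same key appears at two different depths, the later value silently overwrites the earlier one.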
Also, how do I account for a list within a list, which has no 'column name' that I can append to the df?
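One idea for the list-within-a-list case, as a sketch only: reuse the nearest enclosing key as the column name and join every leaf found underneath it. The record below and the names `leaves` and `name_nested_lists` are hypothetical, and a dict buried inside such a list would simply be stringified here.

```python
def leaves(value):
    """Yield the non-list leaves of arbitrarily nested lists, depth first."""
    if isinstance(value, list):
        for item in value:
            yield from leaves(item)
    else:
        yield value

def name_nested_lists(record):
    """Join the leaves of each list-valued field under the key that encloses it."""
    row = {}
    for key, value in record.items():
        if isinstance(value, list):
            row[key] = ", ".join(str(leaf) for leaf in leaves(value))
        else:
            row[key] = value
    return row

# Hypothetical record: 'children' holds lists inside a list.
record = {"name": "Simon", "children": [["Sally", "Stuart"], ["Paul"]]}
row = name_nested_lists(record)
```

This loses the grouping of the inner lists, which may or may not matter for the final dataframe.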
I can get everything I want by looping repeatedly and handling everything case by case, but is there a more Pythonic way to do this?
Thanks