
So I have data in a huge, messy JSON file; after some formatting I've got it down to one dict per line that I can read in.
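
A minimal sketch for reading such a file, assuming each line holds exactly one dict. Lines that are valid JSON go through `json.loads`; because the example dict in the question is written with Python-style single quotes, the sketch falls back to `ast.literal_eval` for lines like that. The function name `load_records` and the file name `data.jsonl` are hypothetical, and `io.StringIO` stands in for the real file.

```python
import ast
import io
import json


def load_records(lines):
    """Parse one dict per line, skipping blank lines.

    Tries JSON first and falls back to ast.literal_eval for lines
    written as Python literals (single quotes, True/None, etc.).
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            records.append(ast.literal_eval(line))
    return records


# Stand-in for open("data.jsonl") (hypothetical file name).
sample = io.StringIO(
    '{"name": "Simon", "salary": 25000}\n'
    "{'name': 'Sally', 'children': []}\n"
)
records = load_records(sample)
print(records)
```

For a file that is strictly valid JSON Lines, `pandas.read_json(path, lines=True)` also reads one record per row, but it leaves nested fields as dicts and lists inside cells.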

My goal is to output one dataframe row for each input dict (some columns recur from dict to dict, and some dicts add new columns).
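
Assuming each record has already been flattened into a single-level dict, the varying columns can be left to pandas: passing a list of dicts to `pd.DataFrame` creates one row per dict, uses the union of all keys as columns, and fills missing entries with NaN. This sketch assumes pandas is installed; the values are made up for illustration.

```python
import pandas as pd

# Illustrative, already-flat records (made-up values); the second one
# introduces a "boats" key that the first one lacks.
rows = [
    {"name": "Simon", "salary": 25000},
    {"name": "Sally", "salary": 30000, "boats": "dinghy"},
]

# pandas uses the union of all keys as columns and fills gaps with NaN.
df = pd.DataFrame(rows)
print(df)
```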

Example of dict format:

    dic = {'name': 'Simon',
           'salary': 25000,
           'children': ['Sally', 'Stuart', 'Paul'],
           'assets': [{'houses': ['50 cool drive', '60 swag lane']},
                      {'vehicles': {'cars': ['bmw', 'kia'],
                                    'boats': ...}}]}

Example output:

    name     salary  children               houses          vehicles      cars      boats
0   Simon    25000   Sally, Stuart, Paul    50 cool dr...   cars, boats   bmw, kia  ...

How do I account for the structure changing from dict to list to dict again, etc.?
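
One possible sketch that turns the example dict into the example output row. The rules are inferred from that output and are assumptions, not a general standard: scalars are kept, lists of scalars are joined with `", "`, a nested dict gets a column listing its keys and then has its own items flattened into the same row, and a list of dicts (like `assets`) is flattened item by item without a column of its own. `flatten` is a hypothetical helper name, and `"..."` stands in for the `boats` value that is elided in the question.

```python
def flatten(record, row=None):
    """Flatten one nested record into a single-level dict.

    Assumed rules, inferred from the example output:
    - scalars are kept as-is;
    - lists of scalars are joined into one comma-separated string;
    - a nested dict gets a column listing its keys, and its own
      items are flattened into the same row;
    - lists containing dicts are flattened dict by dict, with no
      column for the list itself (non-dict items there are skipped).
    The same key at two depths overwrites the earlier value.
    """
    if row is None:
        row = {}
    for key, value in record.items():
        if isinstance(value, dict):
            row[key] = ", ".join(map(str, value))  # e.g. "cars, boats"
            flatten(value, row)
        elif isinstance(value, list):
            if any(isinstance(item, dict) for item in value):
                for item in value:
                    if isinstance(item, dict):
                        flatten(item, row)
            else:
                row[key] = ", ".join(map(str, value))
        else:
            row[key] = value
    return row


dic = {
    "name": "Simon",
    "salary": 25000,
    "children": ["Sally", "Stuart", "Paul"],
    "assets": [
        {"houses": ["50 cool drive", "60 swag lane"]},
        # "..." stands in for the boats value elided in the question.
        {"vehicles": {"cars": ["bmw", "kia"], "boats": ["..."]}},
    ],
}

row = flatten(dic)
print(row)
```

Given a list of parsed dicts (called `records` here, a hypothetical name), `pd.DataFrame([flatten(d) for d in records])` would then produce one row per input dict, with new keys becoming new columns. Because keys are not namespaced, prefixing them with their parent path is one way to avoid collisions, at the cost of longer column names.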

I've tried something like:

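    def run(thing, prefix="", row=None):
        row = {} if row is None else row  # one flat dict per input record
        if isinstance(thing, (dict, list)):
            # dicts recurse by key; lists have no keys, so use the index
            items = thing.items() if isinstance(thing, dict) else enumerate(thing)
            for key, value in items:
                run(value, f"{prefix}.{key}" if prefix else str(key), row)
        else:  # scalar leaf: store under its dotted path, e.g. "assets.1.vehicles.cars.0"
            row[prefix] = thing
        return row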

Also, how do I account for a list within a list, which has no 'column name' that I can add to the df?
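
For a list nested inside a list, one option is to attribute every item to the nearest enclosing key and flatten the inner lists away before joining. A minimal sketch, with hypothetical helper names `iter_leaves` and `join_nested` and a made-up nested `cars` list:

```python
def iter_leaves(value):
    """Yield the non-list items of arbitrarily nested lists, depth-first."""
    if isinstance(value, list):
        for item in value:
            yield from iter_leaves(item)
    else:
        yield value


def join_nested(value, sep=", "):
    """Join every non-list item of a (possibly nested) list into one string."""
    return sep.join(str(leaf) for leaf in iter_leaves(value))


# The inner list has no key of its own, so its items are attributed to
# the nearest enclosing key ("cars" here). Made-up example data.
cars = ["bmw", ["kia", "audi"]]
print(join_nested(cars))  # bmw, kia, audi
```

If the position of each item matters, the alternative is to build column names from the indices instead (for example `cars.1.0` for `'kia'`), which keeps every value in its own column.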

I can get everything I want by looping repeatedly and handling each case individually, but isn't there a more Pythonic way to do this?
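
pandas also ships `pd.json_normalize` (top-level since pandas 1.0), which flattens nested dicts into dotted column names and takes the union of keys across records. It does not split lists, though, so fields like `children` or `assets` would stay as lists in a single cell. A minimal sketch, assuming pandas is installed, with made-up records:

```python
import pandas as pd

# Made-up records: the first has a nested dict, the second does not.
records = [
    {"name": "Simon", "vehicles": {"cars": ["bmw", "kia"]}},
    {"name": "Sally"},
]

# Nested dicts become dotted columns; list values are left untouched,
# and keys missing from a record become NaN.
df = pd.json_normalize(records, sep=".")
print(df.columns.tolist())
```

For a list of dicts such as `assets`, `json_normalize` can expand it through its `record_path` argument, but that yields one row per list item rather than one row per input dict, so a custom flattener followed by `pd.DataFrame` may be closer to the desired output.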

Thanks

liamod
  • https://stackoverflow.com/questions/1305532/convert-nested-python-dict-to-object?rq=1 Does this answer your question for getting a regular dictionary and then converting to dataframe? – Josh Zwiebel Apr 06 '20 at 14:20
  • Do you have an idea of the keys? Are they fixed? – sammywemmy Apr 06 '20 at 14:22
  • Most of the keys are the same for each dict, but sometimes there are new ones (more or fewer), so it's hard to say. – liamod Apr 06 '20 at 14:26
  • You can try this https://stackoverflow.com/questions/60984799/normalize-a-complex-nested-json-file/60985664#60985664 to flatten the JSON and/or, as suggested by @sammywemmy, use `jmespath` to help with nested lists. – Raphaele Adjerad Apr 07 '20 at 06:41
  • I would accept this as the answer; it's an elegant solution to my problem, thanks! – liamod Apr 08 '20 at 11:24
