I'm trying to flatten deeply nested JSON files.

I have 22 JSON files that I want to gather into one pandas DataFrame. I managed to flatten them to the second level with json_normalize, but I can't parse them any further. Some of the files have more than 5 levels of nesting.

I want to extract the _id, the actType, and all the text data located at the different levels of "children". An example JSON file follows. I'd really appreciate your help!

{
    "_id": "test1",
    "actType": "FINDING",
    "entries": [{
            "text": "U Ergebnis:",
            "isDocumentationNode": false,
            "children": [{
                    "text": "U3: Standartext",
                    "isDocumentationNode": true,
                    "children": []
                }, {
                    "text": "Brückner durchgeführt o.p.B.",
                    "isDocumentationNode": true,
                    "children": []
                }, {
                    "text": "Normale körperliche und altersgerecht Entwicklung",
                    "isDocumentationNode": true,
                    "children": [{
                            "text": "J1/2",
                            "isDocumentationNode": false,
                            "children": [{
                                    "text": "Schule:",
                                    "isDocumentationNode": true,
                                    "children": [{
                                            "text": "Ziel Abitur",
                                            "isDocumentationNode": true,
                                            "children": [{
                                                    "text": "läuft",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }, {
                                                    "text": "gefährdet",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }, {
                                                    "text": "läuft",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }, {
                                                    "text": "gefährdet",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]

}
import pandas as pd

# load file
df = pd.read_json('test.json')

# display(df)
     _id  actType                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   entries
0  test1  FINDING  {'text': 'U Ergebnis:', 'isDocumentationNode': False, 'children': [{'text': 'U3: Standartext', 'isDocumentationNode': True, 'children': []}, {'text': 'Brückner durchgeführt o.p.B.', 'isDocumentationNode': True, 'children': []}, {'text': 'Normale körperliche und altersgerecht Entwicklung', 'isDocumentationNode': True, 'children': [{'text': 'J1/2', 'isDocumentationNode': False, 'children': [{'text': 'Schule:', 'isDocumentationNode': True, 'children': [{'text': 'Ziel Abitur', 'isDocumentationNode': True, 'children': [{'text': 'läuft', 'isDocumentationNode': True, 'children': []}, {'text': 'gefährdet', 'isDocumentationNode': True, 'children': []}, {'text': 'läuft', 'isDocumentationNode': True, 'children': []}, {'text': 'gefährdet', 'isDocumentationNode': True, 'children': []}]}]}]}]}]}
  • This results in a nested dict in the 'entries' column, but I need a flat, wide dataframe, with all keys as columns.
Trenton McKinney
Janalytics

1 Answer

  • Use the flatten_json function described in the SO question "How to flatten a nested JSON recursively, with flatten_json?".
    • The function recursively flattens nested JSON, so each file becomes one wide row with one column per leaf value.
    • Copy the flatten_json function from the linked SO question.
  • Use pandas.DataFrame.rename to rename columns as needed.
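For readers without the linked question at hand, here is a minimal sketch of what such a recursive flattener might look like. It is an assumption-based stand-in, not a copy of the linked function, which may differ in details (for example, it may support excluding keys). The `sep` parameter and the `sample` data are illustrative only.

```python
def flatten_json(nested, sep="_"):
    """Flatten nested dicts/lists into one flat dict.

    Keys are built by joining the path to each leaf with `sep`;
    list elements contribute their index to the path.
    """
    out = {}

    def _flatten(value, path=()):
        if isinstance(value, dict):
            for key, child in value.items():
                _flatten(child, path + (str(key),))
        elif isinstance(value, list):
            for i, child in enumerate(value):
                _flatten(child, path + (str(i),))
        else:
            # leaf value: store it under the joined path
            out[sep.join(path)] = value

    _flatten(nested)
    return out


# small illustrative input shaped like the files in the question
sample = {
    "_id": "test1",
    "entries": [
        {"text": "U Ergebnis:", "children": [{"text": "J1/2", "children": []}]}
    ],
}
flat = flatten_json(sample)
```

Note that in this sketch empty lists such as "children": [] produce no key at all, because only leaf values are stored.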
import json
import pandas as pd

# list of files
files = ['test1.json', 'test2.json']

# list to collect one dataframe per file
df_list = []

# iterate through files
for file in files:
    with open(file, 'r', encoding='utf-8') as f:
        # parse the JSON file
        data = json.load(f)

    # flatten_json (copied from the linked SO question) returns a flat dict;
    # convert it to a one-row dataframe and add it to the list
    df_list.append(pd.DataFrame.from_dict(flatten_json(data), orient='index').T)

# concat all dataframes together
df = pd.concat(df_list, ignore_index=True)

# display(df)
     _id  actType entries_0_text entries_0_isDocumentationNode entries_0_children_0_text entries_0_children_0_isDocumentationNode     entries_0_children_1_text entries_0_children_1_isDocumentationNode                          entries_0_children_2_text entries_0_children_2_isDocumentationNode entries_0_children_2_children_0_text entries_0_children_2_children_0_isDocumentationNode entries_0_children_2_children_0_children_0_text entries_0_children_2_children_0_children_0_isDocumentationNode entries_0_children_2_children_0_children_0_children_0_text entries_0_children_2_children_0_children_0_children_0_isDocumentationNode entries_0_children_2_children_0_children_0_children_0_children_0_text entries_0_children_2_children_0_children_0_children_0_children_0_isDocumentationNode entries_0_children_2_children_0_children_0_children_0_children_1_text entries_0_children_2_children_0_children_0_children_0_children_1_isDocumentationNode entries_0_children_2_children_0_children_0_children_0_children_2_text entries_0_children_2_children_0_children_0_children_0_children_2_isDocumentationNode entries_0_children_2_children_0_children_0_children_0_children_3_text entries_0_children_2_children_0_children_0_children_0_children_3_isDocumentationNode
0  test1  FINDING    U Ergebnis:                         False           U3: Standartext                                     True  Brückner durchgeführt o.p.B.                                     True  Normale körperliche und altersgerecht Entwicklung                                     True                                 J1/2                                               False                                         Schule:                                                           True                                                Ziel Abitur                                                                      True                                                                 läuft                                                                                 True                                                             gefährdet                                                                                 True                                                                 läuft                                                                                 True                                                             gefährdet                                                                                 True
1  test2  FINDING    U Ergebnis:                         False           U3: Standartext                                     True  Brückner durchgeführt o.p.B.                                     True  Normale körperliche und altersgerecht Entwicklung                                     True                                 J1/2                                               False                                         Schule:                                                           True                                                Ziel Abitur                                                                      True                                                                 läuft                                                                                 True                                                             gefährdet                                                                                 True                                                                   NaN                                                                                  NaN                                                                   NaN                                                                                  NaN
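The wide layout above gets a new set of columns for every nesting path, so files with different shapes produce many NaN values. If only _id, actType, and the text values are needed, a long layout with one row per text node may be easier to work with. The following is a sketch of that alternative, not part of the approach above. The helper names `iter_texts` and `texts_to_rows` and the `depth` column are hypothetical, and the code assumes every node is a dict with "text" and "children" keys, as in the sample data.

```python
def iter_texts(nodes, depth=0):
    """Yield (depth, text) for each node, walking 'children' depth-first."""
    for node in nodes:
        yield depth, node.get("text")
        yield from iter_texts(node.get("children", []), depth + 1)


def texts_to_rows(doc):
    """Return one dict per text node, each carrying the document's _id and actType."""
    return [
        {"_id": doc.get("_id"), "actType": doc.get("actType"),
         "depth": depth, "text": text}
        for depth, text in iter_texts(doc.get("entries", []))
    ]


# shortened, illustrative document in the same shape as test1.json
doc = {
    "_id": "test1",
    "actType": "FINDING",
    "entries": [{
        "text": "U Ergebnis:",
        "children": [
            {"text": "U3: Standartext", "children": []},
            {"text": "Normale körperliche und altersgerecht Entwicklung",
             "children": [{"text": "J1/2", "children": []}]},
        ],
    }],
}
rows = texts_to_rows(doc)
```

Rows collected from all files can be passed to pd.DataFrame(rows), which gives a fixed set of four columns regardless of how deep each file is nested.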

Data

  • test1.json
{
    "_id": "test1",
    "actType": "FINDING",
    "entries": [{
            "text": "U Ergebnis:",
            "isDocumentationNode": false,
            "children": [{
                    "text": "U3: Standartext",
                    "isDocumentationNode": true,
                    "children": []
                }, {
                    "text": "Brückner durchgeführt o.p.B.",
                    "isDocumentationNode": true,
                    "children": []
                }, {
                    "text": "Normale körperliche und altersgerecht Entwicklung",
                    "isDocumentationNode": true,
                    "children": [{
                            "text": "J1/2",
                            "isDocumentationNode": false,
                            "children": [{
                                    "text": "Schule:",
                                    "isDocumentationNode": true,
                                    "children": [{
                                            "text": "Ziel Abitur",
                                            "isDocumentationNode": true,
                                            "children": [{
                                                    "text": "läuft",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }, {
                                                    "text": "gefährdet",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }, {
                                                    "text": "läuft",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }, {
                                                    "text": "gefährdet",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]

}

  • test2.json
{
    "_id": "test2",
    "actType": "FINDING",
    "entries": [{
            "text": "U Ergebnis:",
            "isDocumentationNode": false,
            "children": [{
                    "text": "U3: Standartext",
                    "isDocumentationNode": true,
                    "children": []
                }, {
                    "text": "Brückner durchgeführt o.p.B.",
                    "isDocumentationNode": true,
                    "children": []
                }, {
                    "text": "Normale körperliche und altersgerecht Entwicklung",
                    "isDocumentationNode": true,
                    "children": [{
                            "text": "J1/2",
                            "isDocumentationNode": false,
                            "children": [{
                                    "text": "Schule:",
                                    "isDocumentationNode": true,
                                    "children": [{
                                            "text": "Ziel Abitur",
                                            "isDocumentationNode": true,
                                            "children": [{
                                                    "text": "läuft",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }, {
                                                    "text": "gefährdet",
                                                    "isDocumentationNode": true,
                                                    "children": []
                                                }
                                            ]
                                        }
                                    ]
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]

}

Trenton McKinney