Use json_normalize to normalize json with nested arrays

Question

I would like to normalize the following JSON:

[
    {
        "studentId": 1,
        "studentName": "James",
        "schools": [
            {
                "schoolId": 1,
                "classRooms": [
                    {
                        "classRoomId": {
                            "id": 1,
                            "floor": 2
                        }
                    },
                    {
                        "classRoomId": {
                            "id": 3
                        }
                    },
                ],
                "teachers": [
                    {
                        "teacherId": 1,
                        "teacherName": "Tom"
                    },
                    {
                        "teacherId": 2,
                        "teacherName": "Sarah"
                    }
                ]
            },
            {
                "schoolId": 2,
                "classRooms": [
                    {
                        "classRoomId": {
                            "id": 4
                        }
                    }
                ],
                "teachers": [
                    {
                        "teacherId": 1,
                        "teacherName": "Tom"
                    },
                    {
                        "teacherId": 2,
                        "teacherName": "Sarah"
                    },
                    {
                        "teacherId": 3,
                        "teacherName": "Tara"
                    }
                ]
            }
        ]
    }
]

And I would like to get the following table in Python (tabular form):

studentId studentName schoolId classRoomId.id classRoomId.floor teacherId 
teacherName
1 James 1 1 2 1 Tom
1 James 1 1 2 2 Sarah
1 James 1 3   1 Tom
1 James 1 3   2 Sarah
1 James 2 4   1 Tom
1 James 2 4   2 Sarah
1 James 2 4   3 Tara

I've tried to use Pandas json_normalize function like this:

df1 = json_normalize(test1, ["schools","teachers"], ["studentId", "studentName",["schools","teachers"]])
df2 = json_normalize(test1, ["schools","classRooms"], ["studentId", "studentName",["schools","classRooms"]])
df = pd.concat([df1,df2],axis=1)

But that doesn't give me the structure I need.

It doesn't have to be in Pandas, any other library or code in Python would do. Any help is appreciate it. Thank you.

Does this question help? https://stackoverflow.com/questions/53643406/normalize-json-using-python — alexbclay, Aug 10 '19 at 01:36

Code Different · Accepted Answer · 2019-08-12T14:14:01.630

4

Because classRooms and teachers form two different subtrees of the JSON, you will have to parse them twice:

classrooms = pd.io.json.json_normalize(json_data, ['schools', 'classRooms'], meta=[
    'studentId',
    'studentName',
    ['schools', 'schoolId']
])

teachers = pd.io.json.json_normalize(json_data, ['schools', 'teachers'], meta=[
    'studentId',
    ['schools', 'schoolId']
])

# Merge and rearrange the columns in the order of your sample output
classrooms.merge(teachers, on=['schools.schoolId', 'studentId']) \
    [['studentId', 'studentName', 'schools.schoolId', 'classRoomId.id', 'classRoomId.floor', 'teacherId', 'teacherName']]

edited Aug 12 '19 at 14:14

answered Aug 10 '19 at 02:28

Code Different

90,614
16
144
163

Perfect, that gave me the exact result. Thank you very much. Now I just need to come up with an algorithm to do it more generically. It seems like there should be something out there that would do this in a generic way. – Esfandiar Aug 12 '19 at 14:12
2

@Esfandiar, did you ever come up with a solution? I would agree, seems like this must be well established territory. – ghukill Nov 08 '22 at 14:45

Use json_normalize to normalize json with nested arrays

1 Answers1

Linked