Flatten Json with Pandas (multiple lists)

Question

I want to use a json file to return a pandas dataframe that has all the data listed out in everyrow. The json file is as per below.

{
  "building_element_group": [
    {
      "basetype": "facade",
      "building_element": [
        {
          "type": "Unitised",
          "functional_unit": "m2",
          "quantity": 5.74,
          "element": [
            {
              "id": "13d22d3b-7fc6-4116-93ad-80c139e006dc",
              "type": "glazing",
              "quantity_unit": "m2",
              "quantity": 3.29,
              "material": [
                {
                  "type": "glass",
                  "impact_data_ID": "5726d14e-d36e-417d-afc4-c70793080186",
                  "quantity_unit": "m2/m2",
                  "quantity": 1
                }
              ]
            },
            {
              "id": "045d27e6-8397-4672-9f4a-6cbc5fe4e716",
              "type": "cladding",
              "quantity_unit": "m2",
              "quantity": 6.27,
              "material": [
                {
                  "type": "terracotta",
                  "impact_data_ID": "529d8876-6adb-449c-a12a-74c56aaadc4f",
                  "quantity_unit": "m/m2",
                  "quantity": 0.04
                },
                {
                  "type": "brick",
                  "impact_data_ID": "e28d29a9-38f8-4684-a6b1-0615ac7f66e5",
                  "quantity_unit": "m/m2",
                  "quantity": 0.06
                },
                {
                  "type": "GRC",
                  "impact_data_ID": "5043ffe6-9d2e-448e-83ed-f36f1f5decfc",
                  "quantity_unit": "m/m2",
                  "quantity": 0.025
                },
                {
                  "type": "Fiber cement",
                  "impact_data_ID": "53bbd2be-f9ac-4ee7-88f3-34df68ee5187",
                  "quantity_unit": "m/m2",
                  "quantity": 0.013
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

I have then loaded in the file, above and done the following:

test = pd.json_normalize(df['building_element_group'],
record_path= ['building_element', 'element', 'material'], 
meta = ['basetype', 
['building_element','quantity'], 
['building_element','type'], 
['building_element','element', 'quantity_unit'], 
['building_element','element', 'type']], 
errors='ignore', sep='-')

What I want to do, is to be able to show all json data in each row, so all the nested data. I have used the Meta to do this, but I have to manually enter in all the branches that I need. Is there a way to do this, so I dont need to manually do this?

Does this answer your question? [flattening nested Json in pandas data frame](https://stackoverflow.com/questions/52795561/flattening-nested-json-in-pandas-data-frame) — JonSG, Jan 25 '23 at 15:38

Parfait · Answer 1 · 2023-01-26T22:13:17.427

Consider wrangling with a list/dictionary comprehension that includes merging dictionaries at each level then pass result in DataFrame constructor:

import json
import pandas as pd

with open("BuildingElementMaterial.json") as f:
    data = json.load(f)

pd_data = [
   {
     **{f"group_{k}":v for k,v in g.items() if k != "building_element"},
     **{f"building_{k}":v for k,v in b.items() if k != "element"},
     **{f"element_{k}":v for k,v in e.items() if k != "material"},
     **{f"material_{k}":v for k,v in m.items()}
   }
   for g in data["building_element_group"]
   for b in g["building_element"]
   for e in b["element"]
   for m in e["material"]   
]

material_df = pd.DataFrame(pd_data)

Output

Dictionary

print(pd_data)
[
 {'building_functional_unit': 'm2',
  'building_quantity': 5.74,
  'building_type': 'Unitised',
  'element_id': '13d22d3b-7fc6-4116-93ad-80c139e006dc',
  'element_quantity': 3.29,
  'element_quantity_unit': 'm2',
  'element_type': 'glazing',
  'group_basetype': 'facade',
  'material_impact_data_ID': '5726d14e-d36e-417d-afc4-c70793080186',
  'material_quantity': 1,
  'material_quantity_unit': 'm2/m2',
  'material_type': 'glass'},
 {'building_functional_unit': 'm2',
  'building_quantity': 5.74,
  'building_type': 'Unitised',
  'element_id': '045d27e6-8397-4672-9f4a-6cbc5fe4e716',
  'element_quantity': 6.27,
  'element_quantity_unit': 'm2',
  'element_type': 'cladding',
  'group_basetype': 'facade',
  'material_impact_data_ID': '529d8876-6adb-449c-a12a-74c56aaadc4f',
  'material_quantity': 0.04,
  'material_quantity_unit': 'm/m2',
  'material_type': 'terracotta'},
 {'building_functional_unit': 'm2',
  'building_quantity': 5.74,
  'building_type': 'Unitised',
  'element_id': '045d27e6-8397-4672-9f4a-6cbc5fe4e716',
  'element_quantity': 6.27,
  'element_quantity_unit': 'm2',
  'element_type': 'cladding',
  'group_basetype': 'facade',
  'material_impact_data_ID': 'e28d29a9-38f8-4684-a6b1-0615ac7f66e5',
  'material_quantity': 0.06,
  'material_quantity_unit': 'm/m2',
  'material_type': 'brick'},
 {'building_functional_unit': 'm2',
  'building_quantity': 5.74,
  'building_type': 'Unitised',
  'element_id': '045d27e6-8397-4672-9f4a-6cbc5fe4e716',
  'element_quantity': 6.27,
  'element_quantity_unit': 'm2',
  'element_type': 'cladding',
  'group_basetype': 'facade',
  'material_impact_data_ID': '5043ffe6-9d2e-448e-83ed-f36f1f5decfc',
  'material_quantity': 0.025,
  'material_quantity_unit': 'm/m2',
  'material_type': 'GRC'},
 {'building_functional_unit': 'm2',
  'building_quantity': 5.74,
  'building_type': 'Unitised',
  'element_id': '045d27e6-8397-4672-9f4a-6cbc5fe4e716',
  'element_quantity': 6.27,
  'element_quantity_unit': 'm2',
  'element_type': 'cladding',
  'group_basetype': 'facade',
  'material_impact_data_ID': '53bbd2be-f9ac-4ee7-88f3-34df68ee5187',
  'material_quantity': 0.013,
  'material_quantity_unit': 'm/m2',
  'material_type': 'Fiber cement'}
]

DataFrame

print(material_df)

  group_basetype building_type building_functional_unit  ...               material_impact_data_ID material_quantity_unit material_quantity
0         facade      Unitised                       m2  ...  5726d14e-d36e-417d-afc4-c70793080186                  m2/m2             1.000
1         facade      Unitised                       m2  ...  529d8876-6adb-449c-a12a-74c56aaadc4f                   m/m2             0.040
2         facade      Unitised                       m2  ...  e28d29a9-38f8-4684-a6b1-0615ac7f66e5                   m/m2             0.060
3         facade      Unitised                       m2  ...  5043ffe6-9d2e-448e-83ed-f36f1f5decfc                   m/m2             0.025
4         facade      Unitised                       m2  ...  53bbd2be-f9ac-4ee7-88f3-34df68ee5187                   m/m2             0.013

Thankyou! this works very well. Would explaining to me the logic behind what is happening. Also, how would I be able be able to add in "basetype": "facade", to my data frame? — sam sweeney, Jan 26 '23 at 08:48
I briefly mentioned [comprehensions](https://docs.python.org/3/tutorial/datastructures.html#list-comprehensions) and [dictionary merging](https://stackoverflow.com/a/26853961/1422451) where we reorient the nested object from json. This is a bit dense but flattens for two-dimensions (row by col) for `DataFrame` constructor. See [edit](https://stackoverflow.com/posts/75240279/revisions) adding a level for facade. — Parfait, Jan 26 '23 at 22:13

Flatten Json with Pandas (multiple lists)

1 Answers1