How to create a multi index dataframe from a nested dictionary?

Question

I have a nested dictionary, whose first level keys are [0, 1, 2...] and the corresponding values of each key are of the form:

{
    "geometry": {
        "type": "Point",
        "coordinates": [75.4516454, 27.2520587]
    },
    "type": "Feature",
    "properties": {
        "state": "Rajasthan",
        "code": "BDHL",
        "name": "Badhal",
        "zone": "NWR",
        "address": "Kishangarh Renwal, Rajasthan"
    }
}

I want to make a pandas dataframe of the form:

        Geometry           Type                    Properties
   Type      Coordinates           State     Code    Name    Zone    Address
0  Point     [..., ...]   Features Rajasthan BDHL    ...     ...     ...
1
2

I am not able to understand the examples over the net about multi indexing/nested dataframe/pivoting. None of them seem to take the first level keys as the primary index in the required dataframe.

How do I get from the data I have, to making it into this formatted dataframe?

See if you require any further clarification from the answers below. If no more question, let us know any of the answers best fit for your needs by accepting the answer you choose and upvote any answer you find helpful. Thanks! — SeaBean, Oct 05 '21 at 05:41

Pascal G. Bernard · Answer 1 · 2021-10-03T19:20:00.533

I would suggest to create columns as "geometry_type", "geometry_coord", etc.. in order to differentiate theses columns from the column that you would name "type". In other words, using your first key as a prefix, and the subkey as the name, and hence creating a new name. And after, just parse and fill your Dataframe like that:

import json
j = json.loads("your_json.json")

df = pd.DataFrame(columns=["geometry_type", "geometry_coord", ... ])

for k, v in j.items():
    if k == "geometry":
        df = df.append({
            "geometry_type": v.get("type"),
            "geometry_coord": v.get("coordinates")
        }, ignore_index=True)
    ...

Your output could then looks like this :

    geometry_type               geometry_coord    ...
0   [75.4516454, 27.2520587]    NaN               ...

PS : And if you really want to go for your initial option, you could check here : Giving a column multiple indexes/headers

score 0 · Answer 2 · answered Oct 03 '21 at 19:48

I suppose you have a list of nested dictionaries.

Use json_normalize to read json data and split current column index into 2 part using str.partition:

import pandas as pd
import json

data = json.load(open('data.json'))
df = pd.json_normalize(data)
df.columns = df.columns.str.partition('.', expand=True).droplevel(level=1)

Output:

>>> df.columns
MultiIndex([(      'type',            ''),
            (  'geometry',        'type'),
            (  'geometry', 'coordinates'),
            ('properties',       'state'),
            ('properties',        'code'),
            ('properties',        'name'),
            ('properties',        'zone'),
            ('properties',     'address')],
           )

>>> df
      type geometry                           properties                     
               type               coordinates      state  code    name zone                        address   
0  Feature    Point  [75.4516454, 27.2520587]  Rajasthan  BDHL  Badhal  NWR   Kishangarh Renwal, Rajasthan

SeaBean · Answer 3 · 2021-10-03T19:56:00.543

You can use pd.json_normalize() to normalize the nested dictionary into a dataframe df.

Then, split the column names with dots into multi-index with Index.str.split on df.columns with parameter expand=True, as follows:

Step 1: Normalize nested dict into a dataframe

j = {
    "geometry": {
        "type": "Point",
        "coordinates": [75.4516454, 27.2520587]
    },
    "type": "Feature",
    "properties": {
        "state": "Rajasthan",
        "code": "BDHL",
        "name": "Badhal",
        "zone": "NWR",
        "address": "Kishangarh Renwal, Rajasthan"
    }
} 

df = pd.json_normalize(j)

Step 1 Result:

print(df)

      type geometry.type      geometry.coordinates properties.state properties.code properties.name properties.zone            properties.address
0  Feature         Point  [75.4516454, 27.2520587]        Rajasthan            BDHL          Badhal             NWR  Kishangarh Renwal, Rajasthan

Step 2: Create Multi-index column labels

df.columns = df.columns.str.split('.', expand=True)

Step 2 (Final) Result:

print(df)

      type geometry                           properties                                                 
       NaN     type               coordinates      state  code    name zone                       address
0  Feature    Point  [75.4516454, 27.2520587]  Rajasthan  BDHL  Badhal  NWR  Kishangarh Renwal, Rajasthan

How to create a multi index dataframe from a nested dictionary?

3 Answers3