0

I have the following nested dictionary with lists inside:

[{
    "id": 467,
    "status": 2,
    "leavePeriod": {
        "owner": {
            "employeeNumber": "2620",
            "firstName": "fn_467",
            "lastName": "ln_467"
        },
        "ownerId": 46,
        "leaves": [
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-04-06T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-04-06T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-04-07T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-04-07T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-04-11T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-04-11T00:00:00"
            }
        ]
    }
},
{
    "id": 477,
    "status": 2,
    "leavePeriod": {
        "owner": {
            "employeeNumber": "2522",
            "firstName": "fn_477",
            "lastName": "lm_477"
        },
        "ownerId": 41,
        "leaves": [
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-03-13T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-03-13T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-14T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-14T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-15T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-15T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-16T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-16T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-17T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-17T00:00:00"
            }
        ]
    }
}]

How can I flatten it and convert it to a dataframe using this library?

from pandas.io.json import json_normalize

I'd like this to be done in one step if possible, instead of flattening to a dataframe twice and then merging them Thanks in advance.

Ian Thompson
  • 2,914
  • 2
  • 18
  • 31
nimi1234
  • 178
  • 1
  • 11
  • How "flat" are you wanting it? [`pandas.json_normalize`](https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html) will return a single row with columns `Index(['id', 'status', 'leavePeriod.owner.employeeNumber', 'leavePeriod.owner.firstName', 'leavePeriod.owner.lastName', 'leavePeriod.ownerId', 'leavePeriod.leaves'], dtype='object')`. Please provide us with an example of your desired output. – Ian Thompson Mar 02 '23 at 16:25
  • I'd like it as flat as it can be. when I use json_normalize(data), I get a nested dictionary in the leavePeriod.leaves column. I'de like to flatten it as well – nimi1234 Mar 02 '23 at 16:50
  • @nimi1234 Does [this answer](https://stackoverflow.com/a/39906235/5721636) work for you? – Hamid Rasti Mar 02 '23 at 17:21

2 Answers2

1

Using pandas.json_normalize

Note: from pandas.io.json import json_normalize results in the same output on my machine but raises a FutureWarning.

Data

import pandas as pd  # 1.5.1


data = [{
    "id": 467,
    "status": 2,
    "leavePeriod": {
        "owner": {
            "employeeNumber": "2620",
            "firstName": "fn_467",
            "lastName": "ln_467"
        },
        "ownerId": 46,
        "leaves": [
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-04-06T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-04-06T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-04-07T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-04-07T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-04-11T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-04-11T00:00:00"
            }
        ]
    }
},
{
    "id": 477,
    "status": 2,
    "leavePeriod": {
        "owner": {
            "employeeNumber": "2522",
            "firstName": "fn_477",
            "lastName": "lm_477"
        },
        "ownerId": 41,
        "leaves": [
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-03-13T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-03-13T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-14T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-14T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-15T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-15T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-16T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-16T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-17T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-17T00:00:00"
            }
        ]
    }
}]

Option 1

# write out your meta data columns and record path
df = pd.json_normalize(
    data=data,
    meta=[
        "id",
        "status",
        ["leavePeriod", "owner", "employeeNumber"],
        ["leavePeriod", "owner", "firstName"],
        ["leavePeriod", "owner", "lastName"],
        ["leavePeriod", "ownerId"]
    ],
    record_path=["leavePeriod", "leaves"]
)

Or if you're lazy and don't wanna split all the meta columns up by hand...

df = pd.json_normalize(
    data=data,
    meta=[
        "id",
        "status",
        "leavePeriod.owner.employeeNumber".split("."),
        "leavePeriod.owner.firstName".split("."),
        "leavePeriod.owner.lastName".split("."),
        "leavePeriod.ownerId".split("."),
    ],
    record_path="leavePeriod.leaves".split(".")
)

df.sample(5)

json_normalize out 1 json_normalize out 2

Parameter Descriptions + Explanations

  • record_path: str or list of str, default None
    Path in each object to list of records. If not passed, data will be assumed to be an array of records.

I set record_path to "leavePeriod.leaves".split(".") because that was the (nested) key(s) in the dictionary that held another layer of records, i.e. array/list of dictionaries. Leaving it blank leads pandas to assume that data is an array of records and has no further nesting.

  • meta: list of paths (str or list of str), default None
    Fields to use as metadata for each record in resulting table.

If you set record_path without setting meta pandas will only normalize and extract the record_path. To keep the upper-level (i.e. not-so-nested) keys, we need to specify them as meta data.

Option 2

df = pd.json_normalize(data)

df

output1

To "flatten" the leavePeriod.leaves even further...

# explode the column and ignore index
df = df.explode(column="leavePeriod.leaves", ignore_index=True)

df.sample(5)  # taking a sample to highlight what happened

exploded output

# convert df["leavePeriod.leaves"] to a list and use `pd.json_normalize` again
leavePeriod = pd.json_normalize(df["leavePeriod.leaves"].to_list())

leavePeriod.sample(5)

leavePeriod output

# join `leavePeriod` with `df` 
# and drop old "leavePeriod.leaves" columns
df = (
    df.join(leavePeriod)
      .drop(columns="leavePeriod.leaves")
)

df.sample(5)

Note: you can't see all the columns in the image because it's too wide so I have two images

joined output 1 joined output 2

Ian Thompson
  • 2,914
  • 2
  • 18
  • 31
0

Just use pd.json_normalize(<put_json_here>).

paradocslover
  • 2,932
  • 3
  • 18
  • 44