I have a complicated nested JSON file. I need generic code that flattens this nested file and stores the result in a dataframe, using either PySpark or pandas. Is this achievable, and is there any generic code that works for any complicated nested JSON file?
1 Answer
I have added the JSON to the data variable. To import a JSON file instead, you can use
df = pd.read_json('data.json')
I have used json_normalize() to flatten the nested JSON data.
A deeply nested JSON structure can be converted to a dataframe by passing the record path and meta arguments to json_normalize, as shown below.
import pandas as pd

data = [
    {
        "company": "Google",
        "tagline": "Hello World",
        "management": {"CEO": "ABC"},
        "department": [
            {"name": "Gmail", "revenue (bn)": 123},
            {"name": "GCP", "revenue (bn)": 400},
            {"name": "Google drive", "revenue (bn)": 600},
        ],
    },
    {
        "company": "Microsoft",
        "tagline": "This is text",
        "management": {"CEO": "XYZ"},
        "department": [
            {"name": "Onedrive", "revenue (bn)": 13},
            {"name": "Azure", "revenue (bn)": 300},
            {"name": "Microsoft 365", "revenue (bn)": 300},
        ],
    },
]

df = pd.json_normalize(
    data, "department", ["company", "tagline", ["management", "CEO"]]
)
df
Output
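Since json_normalize puts the record-path columns first and then appends the meta columns in the order they were passed, the resulting dataframe should look roughly like this:

            name  revenue (bn)    company       tagline management.CEO
0          Gmail           123     Google   Hello World            ABC
1            GCP           400     Google   Hello World            ABC
2   Google drive           600     Google   Hello World            ABC
3       Onedrive            13  Microsoft  This is text            XYZ
4          Azure           300  Microsoft  This is text            XYZ
5  Microsoft 365           300  Microsoft  This is text            XYZ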

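On the "generic" part of the question: there is no single built-in call that flattens every possible JSON shape, but a small recursive helper on top of pd.json_normalize gets close. The sketch below is my own (the flatten_json name and the dotted column naming are assumptions, not part of the answer above): it flattens nested dicts into dotted columns, then keeps exploding any list-valued columns and re-flattening until nothing nested remains, assuming the file fits in memory and that list elements are dicts.

import pandas as pd

def flatten_json(data, sep="."):
    """Hypothetical generic flattener: dicts become dotted columns,
    lists of dicts are exploded into extra rows, repeatedly."""
    # json_normalize already flattens nested dicts into dotted column names
    df = pd.json_normalize(data, sep=sep)
    while True:
        # find columns that still hold lists (e.g. "department")
        list_cols = [
            c for c in df.columns
            if df[c].apply(lambda x: isinstance(x, list)).any()
        ]
        if not list_cols:
            break
        for col in list_cols:
            # one row per list element, then flatten each element's dict
            df = df.explode(col).reset_index(drop=True)
            expanded = pd.json_normalize(df[col]).add_prefix(col + sep)
            df = pd.concat([df.drop(columns=[col]), expanded], axis=1)
    return df

# e.g. with the data above, flatten_json(data) yields the columns
# company, tagline, management.CEO, department.name, department.revenue (bn)

Note that this produces prefixed column names (department.name) rather than the bare names (name) you get from the record_path form of json_normalize shown in the answer; which is preferable depends on how generic the input really is.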
Abhishek K