You can use `pd.json_normalize()`, which is pretty effective. This article explains it very well for different scenarios: https://towardsdatascience.com/all-pandas-json-normalize-you-should-know-for-flattening-json-13eae1dfb7dd

Here are some examples.
```python
import pandas as pd

some_dict = {
    'key1': 'value1',
    'key2': 'value2',
    'key3': 123,
}

df = pd.json_normalize(some_dict)
```
With multiple levels, if you don't want to flatten all of them, you can use the `max_level` parameter:

```python
pd.json_normalize(data, max_level=1)
```
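As a standalone sketch (the record below and its field names are made up for illustration), `max_level=1` flattens only the first level of nesting and leaves deeper dicts intact as cell values:

```python
import pandas as pd

# hypothetical nested record; names are illustrative only
data = {
    'id': 1,
    'info': {
        'name': 'Ann',
        'address': {'city': 'Oslo', 'zip': '0150'},  # deeper than max_level=1
    },
}

df = pd.json_normalize(data, max_level=1)
# 'info.address' stays a dict instead of being split into columns
print(df.columns.tolist())
```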
With nested lists you can use `record_path` to point at the list, and `meta` to specify a list of keys to include:
```python
json_object = {
    'key1': 'value1',
    'key2': 'value2',
    'key3': {
        'key3_1': 'value3',
        'key3_2': {
            'key3_2_1': {
                'admission': 'value4',
                'general': 'value5'
            },
            'key3_3': 'value6',
        }
    },
    'key4': [
        { 'key4_1': 'value7' },
        { 'key4_2': 'value8' },
        { 'key4_3': 'value9' }
    ],
}
```

you can do:

```python
pd.json_normalize(
    json_object,
    record_path=['key4'],
    meta=['key1', ['key3', 'key3_2', 'key3_3']],
)
```
If you have lists where not all keys are always present, you can use `errors='ignore'`:

```python
pd.json_normalize(
    json_object,
    record_path=['key4'],
    meta=['key1', ['key3', 'key3_2', 'key3_3']],
    errors='ignore'
)
```
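To see what that buys you, here is a minimal self-contained sketch (the records and field names are made up for illustration): a `meta` path that is missing from some records raises a `KeyError` unless `errors='ignore'` is set, in which case the missing values come back as `NaN`:

```python
import pandas as pd

# made-up records; the second one has no 'stats' key
records = [
    {'name': 'A', 'stats': {'score': 10}, 'tags': [{'t': 'x'}]},
    {'name': 'B', 'tags': [{'t': 'y'}]},
]

df = pd.json_normalize(
    records,
    record_path='tags',
    meta=['name', ['stats', 'score']],
    errors='ignore',  # without this, the missing 'stats' key raises a KeyError
)
print(df)
```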
By default, nested keys are joined with a dot (`.`); you can change this with the `sep` parameter:

```python
pd.json_normalize(
    json_object,
    record_path=['key4'],
    meta=['key1', ['key3', 'key3_2', 'key3_3']],
    errors='ignore',
    sep='-'
)
```
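A quick self-contained check of the separator behavior, using a trimmed-down version of the object above, shows `sep` applied to the `meta` column names as well:

```python
import pandas as pd

# trimmed-down version of the nested object above
obj = {
    'key1': 'value1',
    'key3': {'key3_2': {'key3_3': 'value6'}},
    'key4': [{'key4_1': 'value7'}],
}

df = pd.json_normalize(
    obj,
    record_path=['key4'],
    meta=['key1', ['key3', 'key3_2', 'key3_3']],
    sep='-',
)
# the nested meta path becomes a single 'key3-key3_2-key3_3' column
print(df.columns.tolist())
```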
It depends on where you get your JSON data from: a local file or a URL.

For a local file:

```python
import json
import pandas as pd

# load data using Python's JSON module
with open('data/simple.json', 'r') as f:
    data = json.loads(f.read())

# flatten the JSON data
pd.json_normalize(data)
```
For URLs:

```python
import json
import requests
import pandas as pd

URL = 'http://raw.githubusercontent.com/BindiChen/machine-learning/master/data-analysis/027-pandas-convert-json/data/simple.json'
data = json.loads(requests.get(URL).text)

# flatten the JSON data
pd.json_normalize(data)
```