I want to flatten JSON column in a Pandas DataFrame

Question

I have an input dataframe df which is as follows:

id  e
1   {"k1":"v1","k2":"v2"}
2   {"k1":"v3","k2":"v4"}
3   {"k1":"v5","k2":"v6"}

I want to "flatten" the column 'e' so that my resultant dataframe is:

id  e.k1    e.k2
1   v1  v2
2   v3  v4
3   v5  v6

How can I do this? I tried using json_normalize but did not have much success

pault · Answer 1 · 2018-04-14T11:55:03.617

16

Here is a way to use pandas.io.json.json_normalize():

from pandas.io.json import json_normalize
df = df.join(json_normalize(df["e"].tolist()).add_prefix("e.")).drop(["e"], axis=1)
print(df)
#  e.k1 e.k2
#0   v1   v2
#1   v3   v4
#2   v5   v6

However, if you're column is actually a str and not a dict, then you'd first have to map it using json.loads():

import json
df = df.join(json_normalize(df['e'].map(json.loads).tolist()).add_prefix('e.'))\
    .drop(['e'], axis=1)

edited Apr 14 '18 at 11:55

answered Apr 13 '18 at 18:29

pault

41,343
15
107
149

You may want to `df = df.reset_index()` before `df.join` to make sure the rows are matched correctly. – dominik Sep 12 '20 at 01:05
@pault how would you adopt this if some of the rows are blank for the column with json. Your solution works for me with all the rows have valid json but if I have any rows with a blank column I get an error : AttributeError: 'float' object has no attribute 'items' – Snapula Jan 18 '21 at 08:25

user3483203 · Answer 2 · 2018-04-13T19:11:10.557

If your column is not already a dictionary, you could use map(json.loads) and apply pd.Series:

s = df['e'].map(json.loads).apply(pd.Series).add_prefix('e.')

Or if it is already a dictionary, you can apply pd.Series directly:

s = df['e'].apply(pd.Series).add_prefix('e.')

Finally use pd.concat to join back the other columns:

>>> pd.concat([df.drop(['e'], axis=1), s], axis=1).set_index('id')    
id e.k1 e.k2
1    v1   v2
2    v3   v4
3    v5   v6

I want to flatten JSON column in a Pandas DataFrame

2 Answers2

Linked