8

I have an input dataframe df which is as follows:

id  e
1   {"k1":"v1","k2":"v2"}
2   {"k1":"v3","k2":"v4"}
3   {"k1":"v5","k2":"v6"}

I want to "flatten" the column 'e' so that my resultant dataframe is:

id  e.k1    e.k2
1   v1  v2
2   v3  v4
3   v5  v6

How can I do this? I tried using json_normalize but did not have much success

Symphony
  • 1,655
  • 4
  • 15
  • 22

2 Answers2

16

Here is a way to use pandas.io.json.json_normalize():

from pandas.io.json import json_normalize
df = df.join(json_normalize(df["e"].tolist()).add_prefix("e.")).drop(["e"], axis=1)
print(df)
#  e.k1 e.k2
#0   v1   v2
#1   v3   v4
#2   v5   v6

However, if you're column is actually a str and not a dict, then you'd first have to map it using json.loads():

import json
df = df.join(json_normalize(df['e'].map(json.loads).tolist()).add_prefix('e.'))\
    .drop(['e'], axis=1)
pault
  • 41,343
  • 15
  • 107
  • 149
  • You may want to `df = df.reset_index()` before `df.join` to make sure the rows are matched correctly. – dominik Sep 12 '20 at 01:05
  • @pault how would you adopt this if some of the rows are blank for the column with json. Your solution works for me with all the rows have valid json but if I have any rows with a blank column I get an error : AttributeError: 'float' object has no attribute 'items' – Snapula Jan 18 '21 at 08:25
5

If your column is not already a dictionary, you could use map(json.loads) and apply pd.Series:

s = df['e'].map(json.loads).apply(pd.Series).add_prefix('e.')

Or if it is already a dictionary, you can apply pd.Series directly:

s = df['e'].apply(pd.Series).add_prefix('e.')

Finally use pd.concat to join back the other columns:

>>> pd.concat([df.drop(['e'], axis=1), s], axis=1).set_index('id')    
id e.k1 e.k2
1    v1   v2
2    v3   v4
3    v5   v6
user3483203
  • 50,081
  • 9
  • 65
  • 94