Convert list of dicts of dict into DataFrame

Question

I have a list of dictionaries of dictionary looks like:

[{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4}, 
 {'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3}, 
 {'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2}, 
 {'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]

and the result should looks like:

     a    c    d    e    f
0    1    1    2    3    4
1    2    2    3    4    3
2    3    3    4    5    2
3    4    4    5    6    1

while the default pd.DataFrame(data) looks like:

     a    b                           f
0    1    {'c': 1, 'd': 2, 'e': 3}    4
1    2    {'c': 2, 'd': 3, 'e': 4}    3
2    3    {'c': 3, 'd': 4, 'e': 5}    2
3    4    {'c': 4, 'd': 5, 'e': 6}    1

How can I do this with pandas? Thanks.

Possible duplicate of [Python - How to convert JSON File to Dataframe](https://stackoverflow.com/questions/41168558/python-how-to-convert-json-file-to-dataframe) — Vikash Singh, May 03 '18 at 17:51
convert `{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4}` to `{'a': 1, 'c': 1, 'd': 2, 'e': 3, 'f': 4}` before loading into Pandas, you can try it first — Sphinx, May 03 '18 at 17:52
@VikashSingh I would say No, this is not a duplicate, because here we are starting with python dictionaries as apposed to a json file. — Aaron N. Brock, May 03 '18 at 18:18
okay, I did see some difference so I decided to write a different answer. Thanks for pointing it out :) — Vikash Singh, May 03 '18 at 18:20

score 2 · Answer 1 · answered May 03 '18 at 18:06

2

you need to convert json to flat data as such:

import pandas as pd
from pandas.io.json import json_normalize
data = [{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4}, 
        {'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3}, 
        {'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2}, 
        {'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]

df = pd.DataFrame.from_dict(json_normalize(data), orient='columns')
df

# output:
    a   b.c b.d b.e f
0   1   1   2   3   4
1   2   2   3   4   3
2   3   3   4   5   2
3   4   4   5   6   1

You can rename the columns once it's done..

answered May 03 '18 at 18:06

Vikash Singh

13,213
8
40
70

`json_normalize` is great, but it really needs to be less hidden. – Aaron N. Brock May 03 '18 at 18:12
glad you liked it @AaronN.Brock it's really useful :) Actually lots of features are very hidden and if not for SO we would be very very lost.. – Vikash Singh May 03 '18 at 18:13
Personally, I think this feature needs more spotlight, for instance you should be able to read directly from a `.json` file instead of having to load a json file into a dictionary then pass that into this function. – Aaron N. Brock May 03 '18 at 18:15
Also, there should be options to read it into a `series` etc, etc. It really should be part of the `pd.read_json()` function. The way I got around this was via the [pandas.io.json.normalize.nested_to_record](https://github.com/pandas-dev/pandas/blob/v0.22.0/pandas/io/json/normalize.py#L24-L91) function, which is really berried in there – Aaron N. Brock May 03 '18 at 18:16

score 1 · Accepted Answer · answered May 03 '18 at 18:07

json_normalize is what you're loooking for!

import pandas as pd
from pandas.io.json import json_normalize

x = [{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4}, 
 {'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3}, 
 {'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2}, 
 {'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]

sep = '::::' # string that doesn't appear in column names

frame = json_normalize(x, sep=sep)
frame.columns = frame.columns.str.split(sep).str[-1]
print(frame)

Output

   a  c  d  e  f
0  1  1  2  3  4
1  2  2  3  4  3
2  3  3  4  5  2
3  4  4  5  6  1

`json_normalize` comes handy. Thank you! – dexhunter May 03 '18 at 18:23 — dexhunter, May 03 '18 at 18:23

score 0 · Answer 3 · answered May 03 '18 at 18:08

import pandas as pd
z=[{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4}, 
 {'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3}, 
 {'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2}, 
 {'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]
step1=pd.DataFrame(z)
column_with_sets = 'b'
step2=pd.DataFrame(list(step1[column_with_sets]))
step3=pd.concat([step1[[i for i in step1.columns if column_with_sets 
not in i]], step2],1)
step4=output.reindex_axis(sorted(output.columns), axis=1)

Convert list of dicts of dict into DataFrame

3 Answers3