2

I have a list of dictionaries of dictionary looks like:

[{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4}, 
 {'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3}, 
 {'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2}, 
 {'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]

and the result should looks like:

     a    c    d    e    f
0    1    1    2    3    4
1    2    2    3    4    3
2    3    3    4    5    2
3    4    4    5    6    1

while the default pd.DataFrame(data) looks like:

     a    b                           f
0    1    {'c': 1, 'd': 2, 'e': 3}    4
1    2    {'c': 2, 'd': 3, 'e': 4}    3
2    3    {'c': 3, 'd': 4, 'e': 5}    2
3    4    {'c': 4, 'd': 5, 'e': 6}    1

How can I do this with pandas? Thanks.

dexhunter
  • 578
  • 8
  • 24
  • Possible duplicate of [Python - How to convert JSON File to Dataframe](https://stackoverflow.com/questions/41168558/python-how-to-convert-json-file-to-dataframe) – Vikash Singh May 03 '18 at 17:51
  • convert `{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4}` to `{'a': 1, 'c': 1, 'd': 2, 'e': 3, 'f': 4}` before loading into Pandas, you can try it first – Sphinx May 03 '18 at 17:52
  • @VikashSingh I would say No, this is not a duplicate, because here we are starting with python dictionaries as apposed to a json file. – Aaron N. Brock May 03 '18 at 18:18
  • okay, I did see some difference so I decided to write a different answer. Thanks for pointing it out :) – Vikash Singh May 03 '18 at 18:20

3 Answers3

2

you need to convert json to flat data as such:

import pandas as pd
from pandas.io.json import json_normalize
data = [{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4}, 
        {'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3}, 
        {'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2}, 
        {'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]

df = pd.DataFrame.from_dict(json_normalize(data), orient='columns')
df

# output:
    a   b.c b.d b.e f
0   1   1   2   3   4
1   2   2   3   4   3
2   3   3   4   5   2
3   4   4   5   6   1

You can rename the columns once it's done..

Vikash Singh
  • 13,213
  • 8
  • 40
  • 70
  • `json_normalize` is great, but it really needs to be less hidden. – Aaron N. Brock May 03 '18 at 18:12
  • glad you liked it @AaronN.Brock it's really useful :) Actually lots of features are very hidden and if not for SO we would be very very lost.. – Vikash Singh May 03 '18 at 18:13
  • Personally, I think this feature needs more spotlight, for instance you should be able to read directly from a `.json` file instead of having to load a json file into a dictionary then pass that into this function. – Aaron N. Brock May 03 '18 at 18:15
  • Also, there should be options to read it into a `series` etc, etc. It really should be part of the `pd.read_json()` function. The way I got around this was via the [pandas.io.json.normalize.nested_to_record](https://github.com/pandas-dev/pandas/blob/v0.22.0/pandas/io/json/normalize.py#L24-L91) function, which is really berried in there – Aaron N. Brock May 03 '18 at 18:16
1

json_normalize is what you're loooking for!

import pandas as pd
from pandas.io.json import json_normalize

x = [{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4}, 
 {'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3}, 
 {'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2}, 
 {'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]

sep = '::::' # string that doesn't appear in column names

frame = json_normalize(x, sep=sep)
frame.columns = frame.columns.str.split(sep).str[-1]
print(frame)

Output

   a  c  d  e  f
0  1  1  2  3  4
1  2  2  3  4  3
2  3  3  4  5  2
3  4  4  5  6  1
Aaron N. Brock
  • 4,276
  • 2
  • 25
  • 43
0
import pandas as pd
z=[{'a': 1, 'b': {'c': 1, 'd': 2, 'e': 3}, 'f': 4}, 
 {'a': 2, 'b': {'c': 2, 'd': 3, 'e': 4}, 'f': 3}, 
 {'a': 3, 'b': {'c': 3, 'd': 4, 'e': 5}, 'f': 2}, 
 {'a': 4, 'b': {'c': 4, 'd': 5, 'e': 6}, 'f': 1 }]
step1=pd.DataFrame(z)
column_with_sets = 'b'
step2=pd.DataFrame(list(step1[column_with_sets]))
step3=pd.concat([step1[[i for i in step1.columns if column_with_sets 
not in i]], step2],1)
step4=output.reindex_axis(sorted(output.columns), axis=1)