Construct pandas DataFrame from nested dictionaries having list as item

Question

I have several dictionaries that I want to convert to a pandas DataFrame. However, because of the key 0, which is unnecessary for me, I get an undesirable DataFrame layout when I convert them. The dicts below are only a small part of the whole data set.

dict1 = {1: {0: [-0.022, -0.017]},
         2: {0: [0.269, 0.271]},
         3: {0: [0.118, 0.119]},
         4: {0: [0.057, 0.061]},
         5: {0: [-0.916, -0.924]}}

dict2 = {1: {0: [0.384, 0.398]},
         2: {0: [0.485, 0.489]},
         3: {0: [0.465, 0.469]},
         4: {0: [0.456, 0.468]},
         5: {0: [-0.479, -0.482]}}

dict3 = {1: {0: [-0.323, -0.321]},
         2: {0: [-0.535, -0.534]},
         3: {0: [-0.336, -0.336]},
         4: {0: [-0.140, -0.142]},
         5: {0: [0.175, 0.177]}}

from pandas import DataFrame

DataFrame(dict1)

                  1               2               3               4  \
0  [-0.022, -0.017]  [0.269, 0.271]  [0.118, 0.119]  [0.057, 0.061]   

                  5  
0  [-0.916, -0.924]

I solved this with a 'for' loop, and the result is exactly what I want to obtain:

    index = [['dict1', 'dict1', 'dict2', 'dict2', 'dict3', 'dict3'], ['A', 'B']*3]
    df = DataFrame(index=index)  # named df rather than dict to avoid shadowing the builtin
    for k in dict1.keys():
        # concatenate the three two-element lists for key k into one column
        df = df.join(DataFrame(dict1[k][0]+dict2[k][0]+dict3[k][0], index=index, columns=[k]))

    print(df)

             1      2      3      4      5
dict1 A -0.022  0.269  0.118  0.057 -0.916
      B -0.017  0.271  0.119  0.061 -0.924
dict2 A  0.384  0.485  0.465  0.456 -0.479
      B  0.398  0.489  0.469  0.468 -0.482
dict3 A -0.323 -0.535 -0.336 -0.140  0.175
      B -0.321 -0.534 -0.336 -0.142  0.177

However, when I applied this method to the full data set, it took so long that I could not wait for it to finish. I also found a method using 'Panel'. It reduced the time, but not enough.

pd.Panel.from_dict(dict1).to_frame()

Please let me know the best way to solve this simple problem.

score 0 · Answer 1 · answered Aug 03 '15 at 12:02

You can simply modify your input data and convert it to DataFrame:

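import itertools
import pandas as pd

lst = [dict1, dict2, dict3]
data = {}  # named data rather than dict to avoid shadowing the builtin

for k in dict1:
    # take the single inner list ({0: [...]}) from each dict for key k
    temp = [next(iter(d[k].values())) for d in lst]
    data[k] = list(itertools.chain(*temp))

# 'dict' is added before 'row' so the column order matches the output below
data['dict'] = ['dict'+str(i+1) for i in range(len(lst)) for n in range(2)]
data['row']  = ['A','B']*len(lst)

In [23]: pd.DataFrame(data)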
Out[23]:
       1      2      3      4      5   dict row
0 -0.022  0.269  0.118  0.057 -0.916  dict1   A
1 -0.017  0.271  0.119  0.061 -0.924  dict1   B
2  0.384  0.485  0.465  0.456 -0.479  dict2   A
3  0.398  0.489  0.469  0.468 -0.482  dict2   B
4 -0.323 -0.535 -0.336 -0.140  0.175  dict3   A
5 -0.321 -0.534 -0.336 -0.142  0.177  dict3   B
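
If the two-level row index from the question is wanted, the flat frame above can be re-indexed on its two label columns. The following is only a minimal sketch: `flat` is a shortened stand-in for the frame above (columns 1 and 2 only), and the names `flat` and `indexed` are made up for illustration.

```python
import pandas as pd

# Shortened stand-in for the flat frame shown above (columns 1 and 2 only).
flat = pd.DataFrame({
    1: [-0.022, -0.017, 0.384, 0.398, -0.323, -0.321],
    2: [0.269, 0.271, 0.485, 0.489, -0.535, -0.534],
    'dict': ['dict1', 'dict1', 'dict2', 'dict2', 'dict3', 'dict3'],
    'row': ['A', 'B'] * 3,
})

# Move the two label columns into a two-level row index.
indexed = flat.set_index(['dict', 'row'])
# Drop the level names so the printed layout resembles the one in the question.
indexed.index.names = [None, None]

print(indexed)
```

A single value can then be looked up by its label pair, for example `indexed.loc[('dict2', 'B'), 1]`.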

Thanks, Colonel. In my case, your approach took more time than using pd.Panel.from_dict(dict1).to_frame(). – Sunkist Aug 04 '15 at 01:14

score 0 · Accepted Answer · answered Aug 03 '15 at 12:34

You should simply drop a level from your nested dict to make life easier. The code below drops the unnecessary part of your dicts and concatenates the dataframes from each of the dicts together.

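import pandas as pd

all_dicts = [dict1, dict2, dict3]
# strip the inner {0: [...]} level, build one frame per dict, then stack them
df = pd.concat([pd.DataFrame({k: v[0] for k, v in d.items()}) for d in all_dicts])
df.index = pd.MultiIndex.from_product([['dict1', 'dict2', 'dict3'], ['A', 'B']])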

>>> df 
             1      2      3      4      5
dict1 A -0.022  0.269  0.118  0.057 -0.916
      B -0.017  0.271  0.119  0.061 -0.924
dict2 A  0.384  0.485  0.465  0.456 -0.479
      B  0.398  0.489  0.469  0.468 -0.482
dict3 A -0.323 -0.535 -0.336 -0.140  0.175
      B -0.321 -0.534 -0.336 -0.142  0.177
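
A possible variation, sketched here under assumptions rather than taken from the answer: passing a mapping to pd.concat makes its keys the outer index level, so the MultiIndex does not have to be assigned in a separate step. The `named` mapping and the shortened two-column dicts are made up for illustration.

```python
import pandas as pd

# Shortened versions of the question's data (keys 1 and 2 only).
dict1 = {1: {0: [-0.022, -0.017]}, 2: {0: [0.269, 0.271]}}
dict2 = {1: {0: [0.384, 0.398]}, 2: {0: [0.485, 0.489]}}

# Hypothetical mapping from a label to each nested dict.
named = {'dict1': dict1, 'dict2': dict2}

# Build one two-row frame per dict, labelling the rows 'A' and 'B'.
frames = {
    name: pd.DataFrame({k: v[0] for k, v in d.items()}, index=['A', 'B'])
    for name, d in named.items()
}

# The mapping's keys become the outer level of the resulting row index.
df = pd.concat(frames)
print(df)
```

This keeps each label next to the data it describes, which may be easier to maintain when the number of dicts changes.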

khammel

Thanks, khammel. This code reduces the time greatly. I also found that your code takes less time than the plain construction df = pd.DataFrame(dict1). Do you have any idea why? – Sunkist Aug 04 '15 at 01:45
Not sure what you mean by your comment, @Sunkist. I did use pd.DataFrame() on each of your dicts; however, I applied a dict comprehension to each of them to strip out the nested zeros. pd.concat() then takes the three DataFrames built from the dicts and concatenates them. – khammel Aug 04 '15 at 02:11
I'm sorry for confusing you, khammel. My question is a basic one. I thought '{k:v[0] for k,v in d.items()}' would need additional work (time) to construct a dictionary again. However, pd.DataFrame({k:v[0] for k,v in d.items()}) takes less time than pd.DataFrame(d), even though both {k:v[0] for k,v in d.items()} and d are dictionaries. I am just curious. – Sunkist Aug 04 '15 at 10:15
You'd have to work your way through all the methods called when you initialize a DataFrame in the pandas code. I suspect that once the nesting is removed, the input is seen simply as a group of columns to convert to a DataFrame (just key-value pairs to parse, where each value is a Series-like list). In the original case, it needs to loop through the dict of dicts at different levels to determine the columns, index and data. – khammel Aug 04 '15 at 11:30
I now understand that the nested dictionary causes an additional loop. I hadn't thought of that. Thank you. – Sunkist Aug 04 '15 at 23:58
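
The comparison discussed in these comments can be checked with a rough timing sketch using the standard-library timeit module. The synthetic `nested` dict, its size, and the function names are assumptions for illustration; actual timings depend on the pandas version and on the data, so no result is claimed here.

```python
import timeit
import pandas as pd

# Synthetic data in the same shape as the question: {column: {0: [a, b]}}
nested = {k: {0: [float(k), float(k) + 0.5]} for k in range(1, 201)}

def build_nested():
    # Dict of dicts: one row (index 0) whose cells hold whole lists.
    return pd.DataFrame(nested)

def build_flat():
    # Drop the inner level so every column is a plain list of numbers.
    return pd.DataFrame({k: v[0] for k, v in nested.items()})

t_nested = timeit.timeit(build_nested, number=20)
t_flat = timeit.timeit(build_flat, number=20)
print("nested: %.4f s, flat: %.4f s" % (t_nested, t_flat))
```

Note that the two calls do not build the same frame: the nested form gives a single row of list-valued cells, while the flattened form gives two numeric rows.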
