Nested dictionary to multiindex dataframe where dictionary keys are column labels

Question

Say I have a dictionary that looks like this:

dictionary = {'A' : {'a': [1,2,3,4,5],
                     'b': [6,7,8,9,1]},

              'B' : {'a': [2,3,4,5,6],
                     'b': [7,8,9,1,2]}}

and I want a dataframe that looks something like this:

Is there a convenient way to do this? If I try:

In [99]:

DataFrame(dictionary)

Out[99]:
     A               B
a   [1, 2, 3, 4, 5] [2, 3, 4, 5, 6]
b   [6, 7, 8, 9, 1] [7, 8, 9, 1, 2]

I get a dataframe where each element is a list. What I need is a multiindex where each level corresponds to the keys in the nested dict and the rows corresponding to each element in the list as shown above. I think I can work a very crude solution but I'm hoping there might be something a bit simpler.

BrenBarn · Accepted Answer · 2022-11-16T22:11:51.323

104

Pandas wants the MultiIndex values as tuples, not nested dicts. The simplest thing is to convert your dictionary to the right format before trying to pass it to DataFrame:

>>> reform = {(outerKey, innerKey): values for outerKey, innerDict in dictionary.items() for innerKey, values in innerDict.items()}
>>> reform
{('A', 'a'): [1, 2, 3, 4, 5],
 ('A', 'b'): [6, 7, 8, 9, 1],
 ('B', 'a'): [2, 3, 4, 5, 6],
 ('B', 'b'): [7, 8, 9, 1, 2]}
>>> pandas.DataFrame(reform)
   A     B   
   a  b  a  b
0  1  6  2  7
1  2  7  3  8
2  3  8  4  9
3  4  9  5  1
4  5  1  6  2

[5 rows x 4 columns]

edited Nov 16 '22 at 22:11

answered Jul 28 '14 at 03:58

BrenBarn

242,874
37
412
384

A little tricky to wrap the head around but this is exactly what I was hoping for. – pbreach Jul 28 '14 at 04:12
2

+1, but this does not work if the length of the dictionary value lists, e.g. 5 in this specific example, do not match. Any ideas how I can tackle this? For instance if `reform= ('A', 'a'): [1, 2, 3, 4, 5], ('A', 'b'): [6, 7, 8, 9,]` – Zhubarb Oct 22 '15 at 14:16
3

@Zhubarb: What do you expect the resulting DataFrame to look like? A DataFrame has to be rectangular; it can't have columns of different lengths. – BrenBarn Oct 22 '15 at 16:45
You are right, I initiall thought of taking the max list length as the row number and padding the lists that fell short, but it is just a crude work-around. Does not make much sense – Zhubarb Oct 22 '15 at 17:09
1

The columns are randomly ordered in the dataframe, since the dict is unordered. How can the desired order be enforced, short of using an OrderedDict? – Dave Kielpinski Nov 12 '16 at 01:23
@DaveKielpinski: You can't enforce the order in the dict. Just get the columns in the order you want after creating the DataFrame, e.g., `DataFrame(...)[["A", "B"]]`. – BrenBarn Nov 12 '16 at 02:03
@BrenBarn This gets very nasty when there are multiple labels at each level. – Dave Kielpinski Nov 12 '16 at 02:13
@DaveKielpinski You can't enforce the order in the dict. If you want to enforce the order, don't use a dict. Put the data into a list-based format instead, for instance. If you just want to *sort* the columns you can ise `.sort(axis=1)`, but if you have a specific order you need you'll have to do something else. In any case, if you have a question about this, you should ask it as a separate question. – BrenBarn Nov 12 '16 at 02:14
@BrenBarn The original question has dataframe column labels in the same order as the presented data, so I think this is relevant. However, the OP accepted your answer so you obviously addressed the OP's intent. – Dave Kielpinski Nov 14 '16 at 19:34
3

This is great. FYI, this can also be done with `pd.DataFrame.from_dict` if `values` is is in the form of "records": `[{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, ...]` – EliadL Feb 03 '19 at 09:18

Vira · Answer 2 · 2023-02-06T15:24:32.283

35

You're looking for the functionality in .stack:

df = pandas.DataFrame.from_dict(dictionary, orient="index").stack().to_frame()
# to break out the lists into columns
df = pandas.DataFrame(df[0].values.tolist(), index=df.index)

edited Feb 06 '23 at 15:24

answered Apr 07 '20 at 20:49

Vira

509
5
8

1

Thanks! This answer does not require reformatting nested dicts as `{(key0, key1): [data0, data_n...]}` nor does it fail on pandas v1.x. – ralex Jul 22 '20 at 05:24
1

This is great. Works on data that is missing keys, too. – Connor Ferster Oct 19 '20 at 22:56

score 26 · Answer 3 · answered Jun 28 '17 at 18:35

26

dict_of_df = {k: pd.DataFrame(v) for k,v in dictionary.items()}
df = pd.concat(dict_of_df, axis=1)

Note that the order of columns is lost for python < 3.6

answered Jun 28 '17 at 18:35

user8227892

261
3
3

I am having another problem using this method. from yahoofinancials import YahooFinancials tickers = ['AAPL', 'WFC', 'F', 'FB', 'DELL', 'SNE'] yahoo_financials = YahooFinancials(tickers) BB=yahoo_financials.get_key_statistics_data() dict_of_df = {k: pd.DataFrame(v) for k,v in BB.items()} df = pd.concat(dict_of_df, axis=1) ValueError: If using all scalar values, you must pass an index – rsc05 Nov 09 '19 at 12:12

score 2 · Answer 4 · answered Aug 21 '20 at 09:09

This recursive function should work:

def reform_dict(dictionary, t=tuple(), reform={}):
    for key, val in dictionary.items():
        t = t + (key,)
        if isinstance(val, dict):
            reform_dict(val, t, reform)
        else:
            reform.update({t: val})
        t = t[:-1]
    return reform

score 2 · Answer 5 · answered Feb 26 '21 at 09:12

If lists in the dictionary are not of the same lenght, you can adapte the method of BrenBarn.

>>> dictionary = {'A' : {'a': [1,2,3,4,5],
                         'b': [6,7,8,9,1]},
                 'B' : {'a': [2,3,4,5,6],
                        'b': [7,8,9,1]}}

>>> reform = {(outerKey, innerKey): values for outerKey, innerDict in dictionary.items() for innerKey, values in innerDict.items()}
>>> reform
 {('A', 'a'): [1, 2, 3, 4, 5],
  ('A', 'b'): [6, 7, 8, 9, 1],
  ('B', 'a'): [2, 3, 4, 5, 6],
  ('B', 'b'): [7, 8, 9, 1]}

>>> pandas.DataFrame.from_dict(reform, orient='index').transpose()
>>> df.columns = pd.MultiIndex.from_tuples(df.columns)
   A     B   
   a  b  a  b
0  1  6  2  7
1  2  7  3  8
2  3  8  4  9
3  4  9  5  1
4  5  1  6  NaN
[5 rows x 4 columns]

score 0 · Answer 6 · answered Jun 14 '22 at 16:51

This solution works for a larger dataframe, it fits what was requested

cols = df.columns
int_cols = len(cols)
col_subset_1 = [cols[x] for x in range(1,int(int_cols/2)+1)]
col_subset_2 = [cols[x] for x in range(int(int_cols/2)+1, int_cols)]

col_subset_1_label = list(zip(['A']*len(col_subset_1), col_subset_1))
col_subset_2_label = list(zip(['B']*len(col_subset_2), col_subset_2))
df.columns = pd.MultiIndex.from_tuples([('','myIndex'),*col_subset_1_label,*col_subset_2_label])

OUTPUT

                        A                      B
     myIndex    a              b          c          d
0   0.159710    1.472925    0.619508    -0.476738   0.866238
1   -0.665062   0.609273    -0.089719   0.730012    0.751615
2   0.215350    -0.403239   1.801829    -2.052797   -1.026114
3   -0.609692   1.163072    -1.007984   -0.324902   -1.624007
4   0.791321    -0.060026   -1.328531   -0.498092   0.559837
5   0.247412    -0.841714   0.354314    0.506985    0.425254
6   0.443535    1.037502    -0.433115   0.601754    -1.405284
7   -0.433744   1.514892    1.963495    -2.353169   1.285580

Nested dictionary to multiindex dataframe where dictionary keys are column labels

6 Answers6

Linked