Pandas Dataframe from Python nested dictionary

Question

I'm trying to create a Pandas dataframe from a python nested dictionary that looks like this:

import numpy as np
import pandas as pd

dictionary = {'user1' : {'a': np.array([1,2,3,4]),
                         'b': np.array([6,7,8,9])},

              'user2' : {'a': np.array([2,3,4,5]),
                         'b': np.array([7,8,9,1])}}

I'd like the data frame to look like this:

      a_w a_x a_y a_z b_w b_x b_y b_z
user1  1   2   3   4   6   7   8   9
user2  2   3   4   5   7   8   9   1

EDIT: (where w, x, y, z are markers that indicate what each value in the array represents)

I've tried to modify the solutions in these questions: Nested dictionary to multiindex dataframe where dictionary keys are column labels

Construct pandas DataFrame from items in nested dictionary

but cannot get the correct form.

Any help would be great, thank you.

not sure why you would like to have dataframe with duplicated headers... — Mark Wang, Jul 07 '19 at 00:16
Is there any specific reason to use numpy arrays? Is it allowed to use plain lists instead to answer your question? — amanb, Jul 07 '19 at 09:37

score 3 · Accepted Answer · answered Jul 07 '19 at 01:57

You can do the entire thing with a dictionary comprehension, and use enumerate to track the index of each element, giving you some semblance of ordering.

d = {
  k: {f'{ik}_{idx}': el for ik, iv in v.items() for idx, el in enumerate(iv)}
  for k, v in dictionary.items()
}

pd.DataFrame.from_dict(d, orient='index')

       a_0  a_1  a_2  a_3  b_0  b_1  b_2  b_3
user1    1    2    3    4    6    7    8    9
user2    2    3    4    5    7    8    9    1
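If you want the w/x/y/z suffixes from the question's edit rather than positional indices, one possible variation is to zip each array with a list of labels instead of using enumerate. This is only a sketch: the `markers` list is an assumption based on the question's edit, and it presumes every array has exactly four elements in that order.

```python
import numpy as np
import pandas as pd

dictionary = {
    'user1': {'a': np.array([1, 2, 3, 4]), 'b': np.array([6, 7, 8, 9])},
    'user2': {'a': np.array([2, 3, 4, 5]), 'b': np.array([7, 8, 9, 1])},
}

# Assumed labels for the four array positions (taken from the question's edit).
markers = ['w', 'x', 'y', 'z']

# Build one flat {column_name: value} dict per user, pairing each array
# element with its marker instead of its positional index.
flat = {
    user: {
        f'{key}_{marker}': value
        for key, arr in inner.items()
        for marker, value in zip(markers, arr)
    }
    for user, inner in dictionary.items()
}

df = pd.DataFrame.from_dict(flat, orient='index')
print(df)
```

Note that zip silently truncates if an array is longer than `markers`, so you may want to check the lengths first if your data isn't guaranteed to be uniform.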

Mark Wang · Answer 2 · 2019-07-07T08:57:55.153

1

Having duplicated column names is rarely a good idea, but here you go:

Update 2

result = (pd.concat({key:pd.DataFrame(val,index=['w','x','y','z']) for key,val in dictionary.items()})
            .unstack(-1))

You know what, I'm gonna leave the multiindex in the column rather than having _ concatenation. It's often more flexible to leave it this way.
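For what it's worth, here is a rough sketch of what working with the MultiIndex columns could look like, and how you might flatten them into a_w-style names afterwards if you do need them. The names `a_only` and `flat` are just for illustration and aren't part of the answer above.

```python
import numpy as np
import pandas as pd

dictionary = {
    'user1': {'a': np.array([1, 2, 3, 4]), 'b': np.array([6, 7, 8, 9])},
    'user2': {'a': np.array([2, 3, 4, 5]), 'b': np.array([7, 8, 9, 1])},
}

# Same construction as "Update 2": one small frame per user, stacked with
# concat, then the w/x/y/z level is moved from the rows into the columns.
result = pd.concat(
    {key: pd.DataFrame(val, index=['w', 'x', 'y', 'z'])
     for key, val in dictionary.items()}
).unstack(-1)

# With a column MultiIndex, selecting the outer key gives all of its markers.
a_only = result['a']  # columns are w, x, y, z

# If flat names are needed later, join the two levels with an underscore.
flat = result.copy()
flat.columns = ['_'.join(col) for col in flat.columns]
print(flat)
```

Keeping the MultiIndex until the very end means you can still slice by 'a' or 'b' in one step and only flatten when exporting.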

Update 1

result = (pd.concat({key:pd.DataFrame(val) for key,val in dictionary.items()})
            .unstack(-1).droplevel(1,axis=1))

Original

result = (pd.concat({key:pd.DataFrame(val) for key,val in dictionary.items()})
            .unstack(-1).T
            .reset_index(level=1,drop=True).T)

result
        a   a   a   a   b   b   b   b
user1   1   2   3   4   6   7   8   9
user2   2   3   4   5   7   8   9   1
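To illustrate why the duplicated labels can be awkward, here is a small sketch that rebuilds a frame with the same shape by hand (the name `dup` is made up for this example). Selecting 'a' no longer gives a single column.

```python
import pandas as pd

# Hand-built frame with the same duplicated column labels as the result above.
dup = pd.DataFrame(
    [[1, 2, 3, 4, 6, 7, 8, 9],
     [2, 3, 4, 5, 7, 8, 9, 1]],
    index=['user1', 'user2'],
    columns=list('aaaabbbb'),
)

# Selecting a duplicated label returns every matching column as a DataFrame,
# not a single Series.
selected = dup['a']
print(type(selected).__name__, selected.shape)

# The column index reports that its labels are not unique.
print(dup.columns.is_unique)
```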

edited Jul 07 '19 at 08:57

answered Jul 07 '19 at 00:21

Mark Wang


Nice! Btw. you can avoid the transpose operations (which could be expensive and can spoil your column types). You can do that by using `result.columns.droplevel(1)` instead of `reset_index`. – jottbe Jul 07 '19 at 01:10
Thanks a lot for the answer. Indeed, you're right about the column names. I made a typo and the column names should be indexed by one of 4 letters: a_w, a_x, a_y, a_z, b_w, b_x, b_y, b_z. I've updated the question. Is it an easy modification of your answer? Thanks again. – sk1995 Jul 07 '19 at 01:19
@jottbe haha correct! I totally forgot that! Actually, since pandas 0.24, you can apply droplevel directly on a dataframe and control the axis. See modified answer. – Mark Wang Jul 07 '19 at 08:52
