
I'm trying to create a pandas DataFrame from a nested Python dictionary that looks like this:

import numpy as np
import pandas as pd

dictionary = {'user1': {'a': np.array([1, 2, 3, 4]),
                        'b': np.array([6, 7, 8, 9])},
              'user2': {'a': np.array([2, 3, 4, 5]),
                        'b': np.array([7, 8, 9, 1])}}

I'd like the data frame to look like this:

       a_w  a_x  a_y  a_z  b_w  b_x  b_y  b_z
user1    1    2    3    4    6    7    8    9
user2    2    3    4    5    7    8    9    1

EDIT: w, x, y and z are markers that tell what each value in the array represents.

I've tried to modify the solutions from these questions:

Nested dictionary to multiindex dataframe where dictionary keys are column labels

Construct pandas DataFrame from items in nested dictionary

but cannot get the correct form.

Any help would be great, thank you.

sk1995

2 Answers


You can do the entire thing with a dictionary comprehension, and use enumerate to track the index of each element, giving you some semblance of ordering.

d = {
  k: {f'{ik}_{idx}': el for ik, iv in v.items() for idx, el in enumerate(iv)}
  for k, v in dictionary.items()
}

pd.DataFrame.from_dict(d, orient='index')

       a_0  a_1  a_2  a_3  b_0  b_1  b_2  b_3
user1    1    2    3    4    6    7    8    9
user2    2    3    4    5    7    8    9    1
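If the columns should carry the w, x, y, z markers from the edited question rather than positional indices, one possible variation is to zip each array with a list of markers instead of using enumerate. This is only a sketch: the `markers` list and the names `flat` and `df` are introduced here for illustration, and it assumes every array holds exactly one value per marker, in that order.

```python
import numpy as np
import pandas as pd

dictionary = {
    'user1': {'a': np.array([1, 2, 3, 4]), 'b': np.array([6, 7, 8, 9])},
    'user2': {'a': np.array([2, 3, 4, 5]), 'b': np.array([7, 8, 9, 1])},
}

# Assumption: one marker per array position, in this order.
markers = ['w', 'x', 'y', 'z']

# Flatten each user's inner dict into {'a_w': 1, 'a_x': 2, ...}.
flat = {
    user: {
        f'{key}_{marker}': value
        for key, values in inner.items()
        for marker, value in zip(markers, values)
    }
    for user, inner in dictionary.items()
}

# One row per user, one column per key/marker pair.
df = pd.DataFrame.from_dict(flat, orient='index')
print(df)
```

Note that zip stops at the shorter input, so an array longer than `markers` would be silently truncated; a length check may be worth adding if the arrays can vary.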
user3483203

Having duplicated column names is rarely a good idea, but here you go:

Update 2

result = (pd.concat({key: pd.DataFrame(val, index=['w', 'x', 'y', 'z']) for key, val in dictionary.items()})
            .unstack(-1))

You know what, I'm going to leave the MultiIndex in the columns rather than concatenating the levels with _. It's often more flexible to leave it this way.
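To illustrate that flexibility, here is a minimal sketch that builds the MultiIndex result, selects one block by its outer key, and then flattens the column levels into a_w style names in case they are wanted after all. The names `a_block` and `flat` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

dictionary = {
    'user1': {'a': np.array([1, 2, 3, 4]), 'b': np.array([6, 7, 8, 9])},
    'user2': {'a': np.array([2, 3, 4, 5]), 'b': np.array([7, 8, 9, 1])},
}

# Rows: users. Columns: MultiIndex of (key, marker) pairs.
result = pd.concat(
    {key: pd.DataFrame(val, index=['w', 'x', 'y', 'z'])
     for key, val in dictionary.items()}
).unstack(-1)

# With the MultiIndex kept, a whole block is selectable by its outer key.
a_block = result['a']

# Flatten on a copy: iterating a MultiIndex yields (outer, inner) tuples.
flat = result.copy()
flat.columns = [f'{outer}_{inner}' for outer, inner in result.columns]
print(flat)
```

Flattening is a one-liner in this direction, while recovering the two levels from joined strings means parsing the names again, which is one reason to keep the MultiIndex for as long as possible.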

Update 1

result = (pd.concat({key: pd.DataFrame(val) for key, val in dictionary.items()})
            .unstack(-1).droplevel(1, axis=1))

Original

result = (pd.concat({key:pd.DataFrame(val) for key,val in dictionary.items()})
            .unstack(-1).T
            .reset_index(level=1,drop=True).T)

result
        a   a   a   a   b   b   b   b
user1   1   2   3   4   6   7   8   9
user2   2   3   4   5   7   8   9   1

Mark Wang
  • Nice! Btw. you can avoid the transpose operations (which could be expensive and can spoil your column types). You can do that by using `result.columns.droplevel(1)` instead of `reset_index`. – jottbe Jul 07 '19 at 01:10
  • Thanks a lot for the answer. Indeed, you're right about the column names. I made a typo: the column names should be suffixed with one of 4 letters: a_w, a_x, a_y, a_z, b_w, b_x, b_y, b_z. I've updated the question. Is it an easy modification of your answer? Thanks again. – sk1995 Jul 07 '19 at 01:19
  • @jottbe haha correct! I totally forgot that! Actually, since pandas 0.24, you can apply droplevel directly on a DataFrame and choose the axis. See modified answer. – Mark Wang Jul 07 '19 at 08:52