1

Hey I am struggling with a transformation of a DataFrame:

The initial frame has a format like this:

df=pd.DataFrame({'A':['A1','A1','A1','A1','A1','A2','A2','A2','A2','A3','A3','A3','A3'],
         'B':['B1','B1','B1','B1','B2','B2','B2','B3','B3','B3','B4','B4','B4'],
         'C':['C1','C1','C1','C2','C2','C3','C3','C4','C4','C5','C5','C6','C6'],
         'X':['a','b','c','a','c','a','b','b','c','a','c','a','c'],
         'Y':[1,4,4,2,4,1,4,3,1,2,3,4,5]})


    A   B   C   X   Y
    A1  B1  C1  a   1
    A1  B1  C1  b   4
    A1  B1  C1  c   4
    A1  B1  C2  a   2
    A1  B2  C2  c   4
    A2  B2  C3  a   1
    A2  B2  C3  b   4
    A2  B3  C4  b   3
    A2  B3  C4  c   1
    A3  B3  C5  a   2
    A3  B4  C5  c   3
    A3  B4  C6  a   4
    A3  B4  C6  c   5

I have some columns in the beginning where I want to apply groupby and then transpose the last two columns:

First df.groupby(['A','B','C','X']).sum()

                Y
A   B   C   X   
A1  B1  C1  a   1
            b   4
            c   4
        C2  a   2
    B2  C2  c   4
A2  B2  C3  a   1
            b   4
    B3  C4  b   3
            c   1
A3  B3  C5  a   2
    B4  C5  c   3
        C6  a   4
            c   5

and then transpose the X/Y columns and add them horizontally.

A   B   C   a   b   c
A1  B1  C1  1.0 4.0 4.0
A1  B1  C2  2.0 NaN NaN
A1  B2  C2  NaN NaN 4.0
A2  B2  C3  1.0 4.0 NaN
A2  B3  C4  NaN 3.0 1.0
A3  B3  C5  2.0 NaN NaN
A3  B4  C5  NaN NaN 3.0
A3  B4  C6  4.0 NaN 5.0

Not all groupby rows have all values so they need to be filled with something like np.nan. The question is linked to this one here but it is more complicated and I couldn't figure it out.

Erdbeer0815
  • 155
  • 8

1 Answers1

2

Use Series.unstack for reshape:

df1 = (df.groupby(['A','B','C','X'])['Y'].sum()
         .unstack()
         .reset_index()
         .rename_axis(None, axis=1))
print (df1)
    A   B   C    a    b    c
0  A1  B1  C1  1.0  4.0  4.0
1  A1  B1  C2  2.0  NaN  NaN
2  A1  B2  C2  NaN  NaN  4.0
3  A2  B2  C3  1.0  4.0  NaN
4  A2  B3  C4  NaN  3.0  1.0
5  A3  B3  C5  2.0  NaN  NaN
6  A3  B4  C5  NaN  NaN  3.0
7  A3  B4  C6  4.0  NaN  5.0

Alternative with DataFrame.pivot_table:

df1 = (df.pivot_table(index=['A','B','C'],
                      columns='X',
                      values='Y',
                      aggfunc='sum').reset_index().rename_axis(None, axis=1))
print (df1)
    A   B   C    a    b    c
0  A1  B1  C1  1.0  4.0  4.0
1  A1  B1  C2  2.0  NaN  NaN
2  A1  B2  C2  NaN  NaN  4.0
3  A2  B2  C3  1.0  4.0  NaN
4  A2  B3  C4  NaN  3.0  1.0
5  A3  B3  C5  2.0  NaN  NaN
6  A3  B4  C5  NaN  NaN  3.0
7  A3  B4  C6  4.0  NaN  5.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252