0

I have a dataframe containing unaggregated data, like so:

df[['UniqueID ', 'SrvDesc']]

    UniqueID SrvDesc
0   HEF104  Cash 
1   HEF104  Credit
2   HEF104  Deposit 
3   HEF104  Ticket
4   HEF104  Electronic
5   HEF197  Check
6   HEF197  Credit
7   HEF198  Credit
8   HEF198  Electronic
9   HEF198  Check

10 rows × 2 columns

As you can see, corresponding to each UniqueID, there are an arbitrary number of unique values for SrvDesc (HEF104 has 5 unique SrvDesc values, HEF198 has 3, etc.).

What I'd like to do is perform some operation that will allow me to aggregate on UniqueID so that there is one row per UniqueID, and then any number of populated columns containing each of the values for SrvDesc for that given UniqueID:

    UniqueID SrvDesc_1  SrvDesc_2   SrvDesc_3   SrvDesc_4   SrvDesc_5
0   HEF104   Cash       Credit      Deposit     Ticket      Electronic
1   HEF197   Check      Credit
2   HEF198   Credit     Electronic  Check

I've been looking into pivot and unstack, which seem very useful, but I'm not sure if they would allow me to accomplish exactly what I'm trying to do here.

Thanks!

ansev
  • 30,322
  • 5
  • 17
  • 31
Le Chase
  • 170
  • 9

1 Answers1

1

Use:

df2['count']=df2.groupby('UniqueID').cumcount()+1
df2['count']='SrvDesc_'+df2['count'].astype('str')
new_df=df2.set_index(['UniqueID','count']).unstack('count')['SrvDesc'].reset_index()
print(new_df)

count UniqueID SrvDesc_1   SrvDesc_2 SrvDesc_3 SrvDesc_4   SrvDesc_5
0       HEF104      Cash      Credit   Deposit    Ticket  Electronic
1       HEF197     Check      Credit       NaN       NaN         NaN
2       HEF198    Credit  Electronic     Check       NaN         NaN

if you want remove the name of columns:

new_df.columns.name=None
print(new_df)

      UniqueID SrvDesc_1   SrvDesc_2 SrvDesc_3 SrvDesc_4   SrvDesc_5
0       HEF104      Cash      Credit   Deposit    Ticket  Electronic
1       HEF197     Check      Credit       NaN       NaN         NaN
2       HEF198    Credit  Electronic     Check       NaN         NaN

If you want fillna can specify some value for the fill_value field in the unstack method.

df2['count']=df2.groupby('UniqueID').cumcount()+1
df2['count']='SrvDesc_'+df2['count'].astype('str')
new_df=df2.set_index(['UniqueID','count']).unstack('count',fill_value='')['SrvDesc'].reset_index()
new_df.columns.name=None
print(new_df)

      UniqueID SrvDesc_1   SrvDesc_2 SrvDesc_3 SrvDesc_4   SrvDesc_5
0       HEF104      Cash      Credit   Deposit    Ticket  Electronic
1       HEF197     Check      Credit                                
2       HEF198    Credit  Electronic     Check      
ansev
  • 30,322
  • 5
  • 17
  • 31