Restructure data frame based on suffix

Question

df:

 Measurement data name      Power       Speed      Strength
0  100UM_P7.0__S2         20.184625  102.642980   13.233577
1  100UM_P7.0__S3         20.317865  104.380374   12.232056
2  100UM_P7.5__S2         18.370913  103.406857   13.324207
3  100UM_P7.5__S3         22.008733  114.078912   13.262521

In 'Measurement data name', rows 0 and 1, and rows 2 and 3 have the the same value before the suffix (_S). If the values are the same before the suffix, I want to combine those rows and create column names witht that unique suffix added.

desired result:

 Measurement data name      S2Power    S2Speed    S2Strength  S3Power   S3Speed    S3Strength
0  100UM_P7.0              20.184625  102.642980   13.233577  20.317865  104.380374   12.232056
1  100UM_P7.5              18.370913  103.406857   13.324207  22.008733  114.078912   13.262521

I figured I could do something like df.groupby('Measurement data name'.rsplit('_', 1)[0]), but I am not entirely sure.

Any help/tips would be much appreciated. Thanks.

Create a new column only containing the suffix groupby that column. I think that would be the most straightforward way. — cmosig, Apr 09 '20 at 21:06

ansev · Accepted Answer · 2020-04-09T21:40:35.793

Use Series.str.split with expand=True. Then you just have to join it and pivot the frame. I show you a solution with DataFrame.set_index and DataFrame.unstack. Here you can see many other cases of pivoting a data frame : How pivot a DataFrame

new_df = (df.join(df['Measurement data name'].str.split('__', expand=True))
            .set_index([0, 1])
            .drop('Measurement data name',axis=1)
            .unstack(1))
new_df = (new_df.set_axis([f'{y}{x}' for x,y in new_df.columns], axis=1)
                .rename_axis('Measurement data name')
                .reset_index()
                .sort_index(axis=1))
print(new_df)

#Using pivot_table

#new_df = (df.join(df['Measurement data name'].str.split('__', expand=True))
#           .pivot_table(index=0, columns = 1))

Output

  Measurement data name    S2Power     S2Speed  S2Strength    S3Power  \
0            100UM_P7.0  20.184625  102.642980   13.233577  20.317865   
1            100UM_P7.5  18.370913  103.406857   13.324207  22.008733   

      S3Speed  S3Strength  
0  104.380374   12.232056  
1  114.078912   13.262521

Great answer. It seems like there is an error 'NoneType' object has no attribute 'rename_axis' though? — dfahsjdahfsudaf, Apr 09 '20 at 22:04
have you used `inplace` ? `new_df.set_axis([f'{y}{x}' for x,y in new_df.columns], axis=1)` should be return a copy of `DataFrame`, https://stackoverflow.com/questions/43893457/understanding-inplace-true — ansev, Apr 10 '20 at 09:03

Restructure data frame based on suffix

1 Answers1