0

I have a pandas data frame with 4 columns and want to get the total durations of each separate activity for each year and run

My current DataFrame:

    activity    duration    year    run
0   Activity A  10          2012    49
1   Activity B  20          2012    49
2   Activity C  10          2012    49
3   Activity D  20          2012    49
4   Activity E  20          2012    49
... ... ... ... ...

41395   Activity B  10  2005    0
41396   Activity C  20  2005    0
41397   Activity D  10  2005    0
41398   Activity E  30  2005    0
41399   Activity A  10  2005    0
41400 rows × 4 columns

I want to create a new DataFrame using the activity values as new columns For each run and year the duration should be summed up as row values

So new DataFrame

Run  Year   Activity A  Activity B  Activity C  Activity D  Activity E
0    2012   100          200          300          200          100
0    2012   200          100          200          100          50
.      
.
1    2012   10          etc etc

I am completely stuck I guess this can be done simply with standard functionality, sorry i dont understand how to format the tables properly Would appreciate your help!

Corralien
  • 109,409
  • 8
  • 28
  • 52
M LUY
  • 23
  • 8

1 Answers1

0

Use pivot_table:

>>> (df.pivot_table(index=['run', 'year'], columns='activity', values='duration')
       .reset_index().rename_axis(columns=None))

   run  year  Activity A  Activity B  Activity C  Activity D  Activity E
0    0  2005          10          10          20          10          30
1   49  2012          10          20          10          20          20
Corralien
  • 109,409
  • 8
  • 28
  • 52