0

I have a data frame with dates, categories, and time durations. I want to sum the time durations if the entries have the same date and the same category.

Input:

Date        Duration   Category
01/01/2021  0.1        Entertainment
01/01/2021  1.4        Working
01/01/2021  2.1        Entertainment
02/01/2021  7.9        Sleeping
02/01/2021  1.2        Working
02/01/2021  2.8        Working
04/01/2021  6.2        Sleeping

Output:

Date        Entertainment   Working   Sleeping
01/01/2021  2.2             1.4       0
02/01/2021  0               4.0       7.9
03/01/2021  0               0         0
04/01/2021  0               0         6.2

I have more categories so if you can allow it to easily add new categories. The code I have doesn't work at all so please help me out thanks.

1 Answers1

1

you can use pivot_table():

out=(df.pivot_table('Duration','Date','Category',fill_value=0,aggfunc='sum')
       .rename_axis(columns=None)
       .reset_index())

OR

you can use pd.crosstab():

out=(pd.crosstab(df['Date'],df['Category'],df['Duration'],aggfunc='sum')
       .fillna(0)
       .rename_axis(columns=None)
       .reset_index())

output of out:

      Date      Entertainment   Sleeping    Working
0   01/01/2021  2.2             0.0         1.4
1   02/01/2021  0.0             7.9         4.0
2   04/01/2021  0.0             6.2         0.0
Anurag Dabas
  • 23,866
  • 9
  • 21
  • 41