0

For my dataset:

df = pd.DataFrame({'ID': [1, 1, 2, 2, 2, 2, 3],
                   'Month': ['Feb', 'March', 'Feb', 'Feb', 'March', 'March', 'May']})

I want to aggregate a data set by ID and its month so I get the counts per month and the total sum over all periods:

ID Month
1   Feb
1   March
2   Feb
2   Feb
2   March
2   March
3   May

I need following output:

ID Month  count total_sum
1  Feb    1     2
1  Marc   1     2
2  Feb    2     4
2  March  2     4
3  May    1     1
Kosta S.
  • 339
  • 1
  • 2
  • 11
  • It looks like you want a groupby (in your case you would groupby ["ID","Month"] with a couple functions. See https://stackoverflow.com/questions/12589481/multiple-aggregations-of-the-same-column-using-pandas-groupby-agg – Leo Jan 13 '22 at 17:41

1 Answers1

1

You can groupby twice, once by ['ID', 'Month'] and then by 'ID' to count per month per ID and total count per ID, respectively.

Note that in the first groupby is the case where you use count method but in the second groupby you use sum method because you're aggregating the counts.

out = df.groupby(['ID', 'Month']).agg(count=('ID','count')).reset_index()
out['total count'] = out.groupby('ID')['count'].transform('sum')

Output:

   ID  Month  count  total count
0   1    Feb      1            2
1   1  March      1            2
2   2    Feb      2            4
3   2  March      2            4
4   3    May      1            1