Aggregate Dataframe by multiple indices

Question

I have a dataframe that looks like this:

index	period	category
1	20181231	1
2	20181231	2
3	20181231	3
4	20190131	1
5	20190131	2
6	20190131	2

I want to get the following dataframe:

index	period	category	category_count	period_count
1	20181231	1	1	3
2	20181231	2	1	3
3	20181231	3	1	3
4	20190131	1	1	3
4	20190131	2	2	3

I tried to use various group by and aggregate logic but I always end up that period_count equals to category_count since the group by and aggregate will only aggregate through both groups (which are period and category in that case).

Is there a way to do a "nested" group by where one aggregation is done through both groups and the other is done through the first one?

score 0 · Answer 1 · answered Oct 31 '22 at 10:07

One can create both category_count and period_count with pandas.core.groupby.DataFrameGroupBy.transform as follows

df['category_count'] = df.groupby(['period', 'category'])['category'].transform('count')
df['period_count'] = df.groupby(['period'])['category'].transform('count')

[Out]:

         period  category  category_count  period_count
index                                                  
1      20181231         1               1             3
2      20181231         2               1             3
3      20181231         3               1             3
4      20190131         1               1             3
5      20190131         2               2             3
6      20190131         2               2             3

Aggregate Dataframe by multiple indices

1 Answers1