0

I have a dataframe that looks like this:

index period category
1 20181231 1
2 20181231 2
3 20181231 3
4 20190131 1
5 20190131 2
6 20190131 2

I want to get the following dataframe:

index period category category_count period_count
1 20181231 1 1 3
2 20181231 2 1 3
3 20181231 3 1 3
4 20190131 1 1 3
4 20190131 2 2 3

I tried to use various group by and aggregate logic but I always end up that period_count equals to category_count since the group by and aggregate will only aggregate through both groups (which are period and category in that case).

Is there a way to do a "nested" group by where one aggregation is done through both groups and the other is done through the first one?

1 Answers1

0

One can create both category_count and period_count with pandas.core.groupby.DataFrameGroupBy.transform as follows

df['category_count'] = df.groupby(['period', 'category'])['category'].transform('count')
df['period_count'] = df.groupby(['period'])['category'].transform('count')

[Out]:

         period  category  category_count  period_count
index                                                  
1      20181231         1               1             3
2      20181231         2               1             3
3      20181231         3               1             3
4      20190131         1               1             3
5      20190131         2               2             3
6      20190131         2               2             3
Gonçalo Peres
  • 11,752
  • 3
  • 54
  • 83