I have a pandas data frame. I want to group it by one set of columns and count the distinct combinations of values in another set of columns.
For example, I have the following data frame:
   a   b    c     d      e
0  1  10  100  1000  10000
1  1  10  100  1000  20000
2  1  20  100  1000  20000
3  1  20  100  2000  20000
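For reproducibility, the frame above can be rebuilt with something like the following. This is only a sketch: the variable name df matches the snippets in this question, and the integer dtypes are an assumption based on the printed values.

```python
import pandas as pd

# Rebuild the example frame shown above (dtypes assumed to be integers).
df = pd.DataFrame({
    "a": [1, 1, 1, 1],
    "b": [10, 10, 20, 20],
    "c": [100, 100, 100, 100],
    "d": [1000, 1000, 1000, 2000],
    "e": [10000, 20000, 20000, 20000],
})

print(df)
```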
I can group it by columns a and b and count distinct values in the column d:
df.groupby(['a','b'])['d'].nunique().reset_index()
As a result I get:
   a   b  d
0  1  10  1
1  1  20  2
However, I would like to count distinct values in a combination of columns. For example, if I use c and d, then the first group has only one unique combination, (100, 1000), while the second group has two distinct combinations: (100, 1000) and (100, 2000).
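To make the desired output concrete, here is a minimal sketch that seems to produce the counts described above. It assumes that dropping duplicate rows over the grouping columns plus c and d, and then taking the size of each group, is an acceptable way to express "distinct combinations". The output column name n_combinations is made up for illustration.

```python
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame({
    "a": [1, 1, 1, 1],
    "b": [10, 10, 20, 20],
    "c": [100, 100, 100, 100],
    "d": [1000, 1000, 1000, 2000],
    "e": [10000, 20000, 20000, 20000],
})

# Keep one row per distinct (a, b, c, d), then count the rows left in each (a, b) group.
expected = (
    df.drop_duplicates(subset=["a", "b", "c", "d"])
      .groupby(["a", "b"])
      .size()
      .reset_index(name="n_combinations")  # hypothetical column name
)

print(expected)
```

Whether this is the idiomatic approach, or whether it scales well to large frames, is exactly what I am unsure about.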
The following naive "generalization" does not work:
df.groupby(['a','b'])[['c','d']].nunique().reset_index()
because nunique() is not applicable to data frames.