23

I have a pandas DataFrame and group it by two columns (for example, col1 and col2). For fixed values of col1 and col2 (i.e. for a given group) there can be several different values in col3. I would like to count the number of distinct values in the third column for each group.

For example, if I have this as my input:

1  1  1
1  1  1
1  1  2
1  2  3
1  2  3
1  2  3
2  1  1
2  1  2
2  1  3
2  2  3
2  2  3
2  2  3

I would like to have this table (data frame) as the output:

1  1  2
1  2  1
2  1  3
2  2  1
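
For reference, a minimal sketch of how this input could be constructed as a DataFrame (the column names col1, col2 and col3 are assumed from the description above):

import pandas as pd

# Each row is one (col1, col2, col3) record from the input table above.
df = pd.DataFrame([[1, 1, 1], [1, 1, 1], [1, 1, 2],
                   [1, 2, 3], [1, 2, 3], [1, 2, 3],
                   [2, 1, 1], [2, 1, 2], [2, 1, 3],
                   [2, 2, 3], [2, 2, 3], [2, 2, 3]],
                  columns=['col1', 'col2', 'col3'])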
Roman
  • A related topic is [.value_counts()](https://stackoverflow.com/questions/20076195/what-is-the-most-efficient-way-of-counting-occurrences-in-pandas#20076611) – cardamom Jun 19 '17 at 15:33

2 Answers

27
df.groupby(['col1','col2'])['col3'].nunique().reset_index()
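
A minimal, self-contained sketch of this one-liner applied to the question's data (the column names col1, col2 and col3 are assumed from the question):

import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
                   'col2': [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
                   'col3': [1, 1, 2, 3, 3, 3, 1, 2, 3, 3, 3, 3]})

# nunique counts the distinct col3 values within each (col1, col2) group;
# reset_index turns the grouped result back into a flat DataFrame.
out = df.groupby(['col1', 'col2'])['col3'].nunique().reset_index()
print(out)
# Expected result (matching the question's desired output):
#    col1  col2  col3
# 0     1     1     2
# 1     1     2     1
# 2     2     1     3
# 3     2     2     1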
Roman
  • Interestingly, `nunique` seems twice as slow as Jeff's answer. – Andy Hayden Jul 29 '13 at 14:31
  • Weird! I am seeing that also. Groupby may be taking the wrong so-called path here -- the logic that applies functions to groups is pretty dense. – Dan Allan Jul 29 '13 at 15:52
  • There is more overhead in calling ``value_counts`` (which has to reconstruct the Series) on each group than in calling ``unique`` (which just returns an ndarray). This can actually be non-trivial. If you don't need the indexes inside the function, you can often avoid this penalty by not instantiating the Series, which ``value_counts`` does and then discards, because all you need is its length. – Jeff Jul 29 '13 at 16:41
  • Using `nunique()` seems to be optimal in pandas 0.18.0. – DACW Apr 21 '16 at 14:26
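
As a rough illustration of the per-group overhead discussed in the comments above, here is a minimal sketch (the data and variable names are assumptions, not taken from the original posts) of the two computations, which can be compared with %timeit in IPython:

import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
                   'col2': [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
                   'col3': [1, 1, 2, 3, 3, 3, 1, 2, 3, 3, 3, 3]})

g = df.groupby(['col1', 'col2'])['col3']

# unique() returns a plain ndarray per group, so taking its length is cheap.
counts_via_unique = g.apply(lambda x: len(x.unique()))

# value_counts() builds an indexed Series per group that is discarded
# as soon as its length is taken, which adds avoidable overhead.
counts_via_value_counts = g.apply(lambda x: len(x.value_counts()))

# Both give the same per-group counts; only their cost differs.
assert counts_via_unique.equals(counts_via_value_counts)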
21
In [17]: df
Out[17]: 
    0  1  2
0   1  1  1
1   1  1  1
2   1  1  2
3   1  2  3
4   1  2  3
5   1  2  3
6   2  1  1
7   2  1  2
8   2  1  3
9   2  2  3
10  2  2  3
11  2  2  3

In [19]: df.groupby([0,1])[2].apply(lambda x: len(x.unique()))
Out[19]: 
0  1
1  1    2
   2    1
2  1    3
   2    1
dtype: int64
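
As a follow-up, the grouped Series above can be flattened back into a three-column table like the one the question asks for; the column name n_unique below is illustrative, not part of the original answer:

import pandas as pd

# Rebuild the same frame with integer column labels 0, 1 and 2.
df = pd.DataFrame([[1, 1, 1], [1, 1, 1], [1, 1, 2],
                   [1, 2, 3], [1, 2, 3], [1, 2, 3],
                   [2, 1, 1], [2, 1, 2], [2, 1, 3],
                   [2, 2, 3], [2, 2, 3], [2, 2, 3]])

out = df.groupby([0, 1])[2].apply(lambda x: len(x.unique())).reset_index(name='n_unique')
print(out)
# One row per (0, 1) group: (1,1)->2, (1,2)->1, (2,1)->3, (2,2)->1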
Jeff