Pandas - count distinct values per column

Question

I have a dataframe that looks like this:

Id ActivityId ActivityCode

1   2           3
1   2           4
1   3           2

I need to get a count of the distinct Activity IDs that the Id is related to.

In the example above, id 1 would return 2 since there're 2 distinct activity ids for that id.

The SQL would look this way:

SELECT COUNT(DISTINCT ActivityId) FROM table GROUP BY Id

How do I do this in pandas?

(And if possible, I'd like to know if there's a way to get the result in a dictionary, without iterating manually)

jezrael · Accepted Answer · 2017-06-01T07:58:18.393

10

I think you need groupby with nunique :

print (df)
   Id  ActivityId  ActivityCode
0   1           2             3
1   1           2             4
2   1           3             2
3   2           8             7

df = df.groupby('Id')['ActivityId'].nunique()
print (df)
Id
1    2
2    1
Name: ActivityId, dtype: int64

And for dict add Series.to_dict:

d = df.groupby('Id')['ActivityId'].nunique().to_dict()
print (d)
{1: 2, 2: 1}

edited Jun 01 '17 at 07:58

answered Jun 01 '17 at 07:53

jezrael

822,522
95
1,334
1,252

1

Glad can help, good luck! – jezrael Jun 01 '17 at 08:32

Pandas - count distinct values per column

1 Answers1

Linked

Related