Get the distinct count of one column based on 3 different labels present in another column using pandas

Question

Input:

Type	count
manager	123
manager	123
manager	111
manager	222
tech lead	888
tech lead	888
tech lead	888
tech lead	444
developer	234
developer	567
developer	890

Output: I want the distinct count for each label, i.e. manager, tech lead, developer:

Type	count
manager	3
tech lead	2
developer	3

What have you tried, and what do you need help with exactly? Like to start, do you know [how to use groupby](https://pandas.pydata.org/docs/user_guide/groupby.html)? For tips, check out [How to ask a good question](/help/how-to-ask). This might also be useful: [How to make good reproducible pandas examples](/q/20109391/4518341). — wjandrea, Mar 12 '23 at 19:14

wjandrea · Answer 1 · 2023-03-12T19:21:09.403

1

You can use groupby with nunique*:

df.groupby('Type', as_index=False, sort=False)['count'].nunique()

        Type  count
0    manager      3
1  tech lead      2
2  developer      3

* The link to the stable nunique docs is currently dead; for now, use the docs for version 1.4 or dev.
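For anyone who wants to run this end to end, here is a minimal sketch. It assumes the question's table is loaded into a DataFrame named df with columns Type and count; the construction below is only one way to reproduce that input, and the name result is purely illustrative.

```python
import pandas as pd

# Rebuild the question's input (assumed layout: one row per Type/count pair).
df = pd.DataFrame({
    "Type": ["manager"] * 4 + ["tech lead"] * 4 + ["developer"] * 3,
    "count": [123, 123, 111, 222, 888, 888, 888, 444, 234, 567, 890],
})

# sort=False keeps the groups in order of first appearance;
# as_index=False returns 'Type' as a regular column instead of the index.
result = df.groupby("Type", as_index=False, sort=False)["count"].nunique()
print(result)
```

Without sort=False the groups come back in alphabetical order (developer, manager, tech lead), which does not match the order shown in the question.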

edited Mar 12 '23 at 19:21

answered Mar 12 '23 at 19:15

wjandrea


score 0 · Answer 2 · answered Mar 12 '23 at 18:57

To get the expected output, you can first drop the duplicated (Type, count) pairs and then count the remaining rows per Type:

>>> (df.drop_duplicates(['Type', 'count'])
...    .value_counts('Type')
...    .rename('count').reset_index())

        Type  count
0  developer      3
1    manager      3
2  tech lead      2

Alternatively, with groupby:

>>> (df.drop_duplicates(['Type', 'count'])
...    .groupby('Type', as_index=False)['count']
...    .count())  # or .nunique(), or .size()

        Type  count
0  developer      3
1    manager      3
2  tech lead      2
