Pandas count distinct multiple columns in a dataframe and group by multiple columns

Question

I found the answer its test2 = test_pd.groupby(by = ['ID'])['country','color'].nunique().reset_index()

idk why this question was marked as duplicate when the link provided by rafael didnt answer the question

I have a data frame with 3 columns:

   country    color    ID 
0  Germany    Red      12     
1  France     Red      13
2  US         Blue     11
3  France     Red      11

If I want to find out the number of distinct countries and colors per ID in SQL it would be

select  ID
  , count(distinct(country)) as num_countries
  , count(distinct(color)) as num_color
from table_name
group by ID;

The result will look like this

   ID    num_countries   num_color
0  12         1              1   
0  11         2              2   
0  13         1              1

How do I achieve the same result in Pandas?

@rafaelc OP actually wants `nunique`, which is also likely a dup. Can you find the dup for that? — Quang Hoang, Oct 31 '19 at 17:34
@rafaelc this link you provided doesnt match my question im trying to count multiple columns seperately and group them by one column — walter, Nov 01 '19 at 15:50
forgot to paste it but i figure it out .test2 = test_pd.groupby(by = ['ID'])['country','color'].nunique().reset_index() — walter, Nov 11 '19 at 04:42

ansev · Accepted Answer · 2019-11-01T22:30:28.897

6

Use DataFrame.groupby.nunique:

df_unique=df.groupby('ID')['country','color'].nunique().add_prefix('num_').reset_index()
print(df_unique)

   ID  num_country  num_color
0  11            2          2
1  12            1          1
2  13            1          1

edited Nov 01 '19 at 22:30

answered Oct 31 '19 at 17:30

ansev

30,322
5
17
31

But what if I just want some of the count distincts like group by age_cd but count distinct for act_type and country – walter Oct 31 '19 at 18:10
please be more specific, edit your question and propose that it be reopened – ansev Oct 31 '19 at 18:12
Hey so I updated the question, Im basically trying to count different columns seperately not together. and group that by one field or more. – walter Nov 01 '19 at 19:51
this in nunique: `df.groupby('ID').nunique().drop('ID',axis=1).reset_index()` – ansev Nov 01 '19 at 19:57
but that does it for all the columns what if I just wanna count two out of 20 columns by grouping it by the ID – walter Nov 01 '19 at 22:26
solution updated: `df_unique=df.groupby('ID')['country','color'].nunique().add_prefix('num_').reset_index()`.please, consider accept my answer – ansev Nov 01 '19 at 22:31
didnt see your answer but yeah i figured it out later used this .test2 = test_pd.groupby(by = ['ID'])['country','color'].nunique().reset_index() – walter Nov 11 '19 at 04:43

score 0 · Answer 2 · edited Jun 23 '22 at 07:52

0

You want nunique():

df.groupby('age_cd', as_index=False).nunique()

edited Jun 23 '22 at 07:52

Stergios

3,126
6
33
55

answered Oct 31 '19 at 17:32

Quang Hoang

146,074
10
56
74

Pandas count distinct multiple columns in a dataframe and group by multiple columns

2 Answers2