2

I found the answer its test2 = test_pd.groupby(by = ['ID'])['country','color'].nunique().reset_index()

idk why this question was marked as duplicate when the link provided by rafael didnt answer the question

I have a data frame with 3 columns:

   country    color    ID 
0  Germany    Red      12     
1  France     Red      13
2  US         Blue     11
3  France     Red      11

If I want to find out the number of distinct countries and colors per ID in SQL it would be

select  ID
  , count(distinct(country)) as num_countries
  , count(distinct(color)) as num_color
from table_name
group by ID;

The result will look like this

   ID    num_countries   num_color
0  12         1              1   
0  11         2              2   
0  13         1              1 

How do I achieve the same result in Pandas?

Mr_and_Mrs_D
  • 32,208
  • 39
  • 178
  • 361
walter
  • 47
  • 1
  • 1
  • 5
  • @rafaelc OP actually wants `nunique`, which is also likely a dup. Can you find the dup for that? – Quang Hoang Oct 31 '19 at 17:34
  • @rafaelc this link you provided doesnt match my question im trying to count multiple columns seperately and group them by one column – walter Nov 01 '19 at 15:50
  • forgot to paste it but i figure it out .test2 = test_pd.groupby(by = ['ID'])['country','color'].nunique().reset_index() – walter Nov 11 '19 at 04:42

2 Answers2

6

Use DataFrame.groupby.nunique:

df_unique=df.groupby('ID')['country','color'].nunique().add_prefix('num_').reset_index()
print(df_unique)

   ID  num_country  num_color
0  11            2          2
1  12            1          1
2  13            1          1
ansev
  • 30,322
  • 5
  • 17
  • 31
  • But what if I just want some of the count distincts like group by age_cd but count distinct for act_type and country – walter Oct 31 '19 at 18:10
  • please be more specific, edit your question and propose that it be reopened – ansev Oct 31 '19 at 18:12
  • Hey so I updated the question, Im basically trying to count different columns seperately not together. and group that by one field or more. – walter Nov 01 '19 at 19:51
  • this in nunique: `df.groupby('ID').nunique().drop('ID',axis=1).reset_index()` – ansev Nov 01 '19 at 19:57
  • but that does it for all the columns what if I just wanna count two out of 20 columns by grouping it by the ID – walter Nov 01 '19 at 22:26
  • solution updated: `df_unique=df.groupby('ID')['country','color'].nunique().add_prefix('num_').reset_index()`.please, consider accept my answer – ansev Nov 01 '19 at 22:31
  • didnt see your answer but yeah i figured it out later used this .test2 = test_pd.groupby(by = ['ID'])['country','color'].nunique().reset_index() – walter Nov 11 '19 at 04:43
0

You want nunique():

df.groupby('age_cd', as_index=False).nunique()
Stergios
  • 3,126
  • 6
  • 33
  • 55
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74