select df.id, count(distinct airport) as num
from df
group by df.id
having count(distinct airport) > 3

I'm trying to do the equivalent of the above in Python pandas. I've tried different combinations of filter, nunique, agg, and nothing is working. Any advice?

Example df:
id     airport
1      lax
1      ohare
2      phl
3      lax
2      mdw
2      lax
2      sfw
2      tpe

So I want the result to be:

id     num
2      5
user319407

2 Answers


You can use SeriesGroupBy.nunique with boolean indexing or query:

s = df.groupby('id')['airport'].nunique()
print (s)
id
1    2
2    5
3    1
Name: airport, dtype: int64

df1 = s[s > 3].reset_index()
print (df1)
   id  airport
0   2        5

Or:

df1 = df.groupby('id')['airport'].nunique().reset_index().query('airport > 3')
print (df1)
   id  airport
1   2        5
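
To get the column named num as in the desired output, the same chain can be given a name at the reset_index step. This is a minimal self-contained sketch, assuming the example frame from the question is rebuilt by hand; the variable name result is only illustrative:

```python
import pandas as pd

# Example data from the question, rebuilt by hand (assumption).
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 2, 2, 2, 2],
    "airport": ["lax", "ohare", "phl", "lax", "mdw", "lax", "sfw", "tpe"],
})

result = (
    df.groupby("id")["airport"]
      .nunique()                  # count(distinct airport) per id
      .reset_index(name="num")    # Series -> DataFrame with columns id, num
      .query("num > 3")           # having count(distinct airport) > 3
      .reset_index(drop=True)     # renumber the remaining rows from 0
)
print(result)
```

For the example data this leaves a single row for id 2.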
jezrael

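Use groupby and count. This counts non-null rows per id rather than distinct airports, so it matches count(distinct ...) only when no airport repeats within an id, as in the example data: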

df_new = df.groupby('id').count()

Then filter:

df_new = df_new[(df_new['airport'] > 3)]
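
The difference between count and nunique only shows up when an id has repeated airports. A small sketch, using made-up data (not from the question) in which id 1 visits the same airport four times:

```python
import pandas as pd

# Hypothetical data with repeated airports, for illustration only.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2],
    "airport": ["lax", "lax", "lax", "lax", "phl"],
})

# count tallies non-null rows per id; nunique tallies distinct values per id.
row_counts = df.groupby("id")["airport"].count()
distinct_counts = df.groupby("id")["airport"].nunique()

print(row_counts[row_counts > 3])            # id 1 passes on row count
print(distinct_counts[distinct_counts > 3])  # nothing passes on distinct count
```

With count, id 1 would be kept by the > 3 filter even though it has only one distinct airport, which is not what the SQL having clause does.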
dapaz