Suppose I have a data frame like the following:

import pandas as pd

df = pd.DataFrame({"name": ["A", "A", "B", "B", "C", "C"],
                   "nickname": ["X", "Y", "X", "Z", "Y", "Y"]})

How can I group df by name and drop the groups (here, C) that do not contain at least one 'X'?

thank you

dleal

1 Answer

You can use the groupby `filter` method from pandas:

df.groupby('name').filter(lambda g: any(g.nickname == 'X')) 

#       name   nickname
# 0        A          X
# 1        A          Y
# 2        B          X
# 3        B          Z
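
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the question's data frame and uses the Series method `.any()` rather than the built-in `any`, so it does not depend on the name `any` being unshadowed. The second approach (a boolean mask built with `transform`) is an alternative added here for illustration, not part of the original answer; the variable names `kept`, `mask`, and `kept_via_mask` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({"name": ["A", "A", "B", "B", "C", "C"],
                   "nickname": ["X", "Y", "X", "Z", "Y", "Y"]})

# Keep only the groups in which at least one nickname equals 'X'.
kept = df.groupby("name").filter(lambda g: (g["nickname"] == "X").any())

# Alternative sketch: compute a per-row boolean mask instead of
# calling a Python lambda once per group.
mask = df["nickname"].eq("X").groupby(df["name"]).transform("any")
kept_via_mask = df[mask]

print(kept)
print(kept_via_mask)
```

Both variants should keep the rows for A and B and drop C; the mask form may be worth trying on larger frames, though that is untested here.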
Psidom
  • thank you Psidom. I didn't know about the "any" function – dleal Jun 27 '16 at 03:06
  • How do I drop a group if it contains only X? – Ankita Patnaik Dec 05 '18 at 09:39
  • As noted in the follow-up comment to the answer at https://stackoverflow.com/a/54584371/3108762, the result of `filter` is a DataFrame, not a groupby object, so if you want to filter and then still have the groups, you need another `groupby` at the end of the above command. E.g. `df.groupby('name').filter(lambda g: any(g.nickname == 'X')).groupby('name')` – T. Shaffner Mar 28 '19 at 10:39
  • Also, in my case I had to restructure it like this for it to work: `df.groupby('name').filter(lambda g: (g.nickname=='X').any())`. It seems like it should be equivalent, so maybe it was an imports issue, but I'm leaving this here for anyone who follows. – T. Shaffner Mar 28 '19 at 12:36
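
Two of the comments above can be sketched together: dropping groups that contain only 'X', and grouping again after `filter`. This is a hedged illustration, not something from the original answer. The frame `df2` and its extra group D (which contains only 'X') are hypothetical data added so that there is something to drop, and `not_only_x` and `sizes` are made-up names.

```python
import pandas as pd

# Hypothetical data: group D was added for illustration and contains only 'X'.
df2 = pd.DataFrame({"name": ["A", "A", "B", "B", "C", "C", "D", "D"],
                    "nickname": ["X", "Y", "X", "Z", "Y", "Y", "X", "X"]})

# Drop groups whose nicknames are all 'X', i.e. keep any group
# that has at least one nickname other than 'X'.
not_only_x = df2.groupby("name").filter(
    lambda g: not (g["nickname"] == "X").all()
)

# filter returns a plain DataFrame, so group again when
# per-group operations are needed afterwards.
sizes = not_only_x.groupby("name").size()

print(not_only_x)
print(sizes)
```

Under these assumptions, D is dropped while A, B, and C are kept, and `sizes` holds the row count of each remaining group.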