2

I have a dataset that has a column that looks like this:

NAME
ZZKIDS
ZZZKIDS
ZZZANTHONY

To filter the rows, I know I can use this:

df[~df.NAME.str.contains("ZZ")]

Is there a way to add the other "ZZZ" along with "ZZ"?

Andy
G. Nguyen

1 Answer

5

Use the following regex:

df[~df.NAME.str.contains('Z{2,}')]

'Z{2,}' means 2 or more consecutive occurrences of Z
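
To see what the pattern does, here is a minimal sketch. The first three names come from the question; "ZACKZ" and "ALICE" are hypothetical rows added only for illustration, and the raw-string prefix is a style choice rather than a requirement.

```python
import pandas as pd

# First three names are from the question; "ZACKZ" and "ALICE" are
# hypothetical rows added to show which values survive the filter.
df = pd.DataFrame({"NAME": ["ZZKIDS", "ZZZKIDS", "ZZZANTHONY", "ZACKZ", "ALICE"]})

# str.contains treats the pattern as a regex by default.
# Z{2,} matches a run of two or more consecutive Z characters.
mask = df.NAME.str.contains(r"Z{2,}")

# ~ inverts the boolean mask, so only rows WITHOUT such a run are kept.
filtered = df[~mask]
print(filtered.NAME.tolist())  # ['ZACKZ', 'ALICE']
```

If the column could contain missing values, passing `na=False` to `str.contains` should keep the mask strictly boolean; that is an assumption about your data, not something the question shows.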

sacuL
  • Thanks man. Quick question, would it filter out something like "ZACKZ" since there are 2 or more occurrences of Z? – G. Nguyen Aug 08 '18 at 16:36
  • 1
    No, the way I did it would only be looking for consecutive `Z`s. If you wanted to filter out any values where there are more than 2 Z's, regardless of whether they are consecutive, you could use: `df[~(df.NAME.str.count('Z') > 2)]` for example – sacuL Aug 08 '18 at 16:38
  • 3
    @sacul It might be that the OP only wants `df[~df.NAME.str.startswith('ZZ')]` here (given the examples). Also, if a string contains at least 2 consecutive Z's, a check for ZZ is enough, since 3 consecutive Z's will also match two consecutive Z's anyway. – Jon Clements Aug 08 '18 at 16:41
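
The three variants discussed in the comments can be compared side by side. This is only a sketch: "ZACKZ", "JAZZ" and "ALICE" are hypothetical names added to make the differences visible, and the variable names are made up for the example.

```python
import pandas as pd

# First three names are from the question; the rest are hypothetical.
df = pd.DataFrame(
    {"NAME": ["ZZKIDS", "ZZZKIDS", "ZZZANTHONY", "ZACKZ", "JAZZ", "ALICE"]}
)

# 1. Drop rows with "ZZ" anywhere. A plain "ZZ" check selects the same
#    rows as the regex Z{2,}, since any longer run of Z's contains "ZZ".
no_zz_anywhere = df[~df.NAME.str.contains("ZZ")]

# 2. Drop only rows that START with "ZZ" ("JAZZ" is kept).
no_zz_prefix = df[~df.NAME.str.startswith("ZZ")]

# 3. Drop rows with two or more Z's in total, consecutive or not
#    ("ZACKZ" and "JAZZ" are both dropped).
fewer_than_two_z = df[df.NAME.str.count("Z") < 2]

print(no_zz_anywhere.NAME.tolist())    # ['ZACKZ', 'ALICE']
print(no_zz_prefix.NAME.tolist())      # ['ZACKZ', 'JAZZ', 'ALICE']
print(fewer_than_two_z.NAME.tolist())  # ['ALICE']
```

Which one is right depends on whether the "ZZ" marker is always a prefix in the real data, which the three sample rows suggest but do not confirm.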