Pandas: Filtering rows with multiple string conditions

Question

I have a dataset that has a column that looks like this:

NAME
ZZKIDS
ZZZKIDS
ZZZANTHONY

To filter the rows, I know I can use this:

df[~df.NAME.str.contains("ZZ")]

Is there a way to add the other "ZZZ" along with "ZZ"?

`df[~df.NAME.str.contains("ZZ|ZZZ")]`? – Zero Aug 08 '18 at 16:27
Yes, the string can be a regex – joaquin Aug 08 '18 at 16:27

score 5 · Accepted Answer · answered Aug 08 '18 at 16:28

Use the following regex:

df[~df.NAME.str.contains('Z{2,}')]

'Z{2,}' means 2 or more consecutive occurrences of Z
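
As a minimal sketch of the filter above, assuming a small DataFrame built from the sample names in the question plus two hypothetical rows (`ZACKZ` and `ALICE`) added only for contrast:

```python
import pandas as pd

# Sample names from the question, plus two hypothetical rows for contrast.
df = pd.DataFrame({"NAME": ["ZZKIDS", "ZZZKIDS", "ZZZANTHONY", "ZACKZ", "ALICE"]})

# str.contains treats the pattern as a regex by default;
# 'Z{2,}' matches a run of two or more consecutive Z's anywhere in the string.
mask = df.NAME.str.contains("Z{2,}")

# ~ inverts the boolean mask, keeping only the rows that do NOT match.
filtered = df[~mask]
print(filtered.NAME.tolist())  # ['ZACKZ', 'ALICE']
```

`ZACKZ` survives the filter because its two Z's are not adjacent.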

sacuL

Thanks man. Quick question, would it filter out something like "ZACKZ" since there are 2 or more occurrences of Z? – G. Nguyen Aug 08 '18 at 16:36

No, the way I did it would only be looking for consecutive `Z`s. If you wanted to filter out any values where there are more than 2 Z's, regardless of whether they are consecutive, you could use: `df[df.NAME.str.count('Z') <= 2]` for example – sacuL Aug 08 '18 at 16:38

@sacul might be the OP only wants `df[~df.NAME.str.startswith('ZZ')]` here... (given the examples...) also - if a string contains at least 2 consecutive Z's... just a check for ZZ is enough as 3 consecutive Z's will also match just two consecutive Z's anyway... – Jon Clements Aug 08 '18 at 16:41
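
The approaches raised in the comments above can be compared side by side. This is only a sketch, assuming the question's sample names plus two hypothetical rows (`ZACKZ` and `JAZZ`) chosen to show where the methods differ:

```python
import pandas as pd

# Question's sample names plus two hypothetical rows that separate the approaches.
df = pd.DataFrame({"NAME": ["ZZKIDS", "ZZZKIDS", "ZZZANTHONY", "ZACKZ", "JAZZ"]})

has_run = df.NAME.str.contains("Z{2,}")    # two or more consecutive Z's
has_zz = df.NAME.str.contains("ZZ")        # plain 'ZZ' flags exactly the same rows
starts_zz = df.NAME.str.startswith("ZZ")   # only a leading 'ZZ'
z_count = df.NAME.str.count("Z")           # every Z, consecutive or not

# Any string with three consecutive Z's also contains 'ZZ', so the masks agree.
assert has_run.equals(has_zz)

kept_startswith = df[~starts_zz].NAME.tolist()
kept_contains = df[~has_zz].NAME.tolist()
print(kept_startswith)   # ['ZACKZ', 'JAZZ']
print(kept_contains)     # ['ZACKZ']
print(z_count.tolist())  # [2, 3, 3, 2, 2]
```

`startswith` keeps `JAZZ` because its `ZZ` is not at the start, while `contains` drops it. Which behaviour is wanted depends on whether the `ZZ` prefix is the actual marker in the data.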
