How to apply multiple masks to a dataframe at the same time?

Question

I have set up three masks for my df, and I want to filter out the rows they match.

For example, some random masks:

mask1 = df['column1'].isnull()
mask2 = df['column2'] > 5
mask3 = df['column3'].str.contains('hello', na=False)  # missing strings count as "no match", so ~mask3 works

How do I combine these masks to filter out the matching rows? Is the following correct, using both ~ and |?

masked_df = df[~mask1 | ~mask2 | ~mask3]

My dataframe has too many rows for me to be 100% sure by manual checking that the result is correct.

Looks fine to me. But why not flip the conditions, e.g. `mask1 = df['column1'].notnull()` and `mask2 = df['column2'] <= 5`? – yatu, Oct 02 '19 at 10:15
Because my brain doesn't work that way :). I want to create masks like `mask1 = I do not want this` and `mask2 = I do not want that`, etc. – SCool, Oct 02 '19 at 10:19
It depends: if you want to filter out rows that meet all of the conditions, you are right; if you want to filter out rows that meet any of the conditions, you should use `&` instead of `|`. – Aryerez, Oct 02 '19 at 10:22
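To see the difference between "drop rows matching all masks" and "drop rows matching any mask", here is a minimal sketch on a small hypothetical frame (the values are made up; only the column names come from the question). It also shows one way to check the result without inspecting rows by hand: assert that no kept row still satisfies the combined condition.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question
df = pd.DataFrame({
    'column1': [np.nan, np.nan, 4.0, 5.0],
    'column2': [9, 1, 9, 2],
    'column3': ['hello', 'hello', 'bye', None],
})

mask1 = df['column1'].isnull()
mask2 = df['column2'] > 5
# na=False so a missing string counts as "no match" instead of NaN
mask3 = df['column3'].str.contains('hello', na=False)

# Drop rows that satisfy ALL three conditions (the version in the question)
drop_all = df[~mask1 | ~mask2 | ~mask3]
# Drop rows that satisfy ANY of the conditions (& instead of |)
drop_any = df[~mask1 & ~mask2 & ~mask3]

print(drop_all.index.tolist())  # [1, 2, 3]
print(drop_any.index.tolist())  # [3]

# Checks that scale to large frames: no kept row may still match
assert not (mask1 & mask2 & mask3)[drop_all.index].any()
assert not (mask1 | mask2 | mask3)[drop_any.index].any()
```

The two assertions work on any number of rows, so they can replace manual spot checks.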

score 12 · Answer 1 · answered Oct 02 '19 at 10:22


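Your solution is fine, but it is also possible to chain the masks with bitwise AND and invert the whole condition (by De Morgan's law, `~mask1 | ~mask2 | ~mask3` is the same as `~(mask1 & mask2 & mask3)`):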

masked_df = df[~(mask1 & mask2 & mask3)]
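The equivalence of the two forms can be checked directly. A minimal sketch on hypothetical sample data, comparing the element-wise results of both expressions:

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; any boolean masks sharing one index behave the same way
df = pd.DataFrame({
    'column1': [np.nan, np.nan, 4.0],
    'column2': [9, 1, 9],
    'column3': ['hello', 'hello', 'bye'],
})
mask1 = df['column1'].isnull()
mask2 = df['column2'] > 5
mask3 = df['column3'].str.contains('hello')

# De Morgan: NOT(a AND b AND c) == (NOT a) OR (NOT b) OR (NOT c)
left = ~mask1 | ~mask2 | ~mask3
right = ~(mask1 & mask2 & mask3)
assert left.tolist() == right.tolist()

# Only row 0 matches all three masks, so only it is removed
print(df[right].index.tolist())  # [1, 2]
```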

If the masks are in a list, the solution above can be rewritten with `np.logical_and.reduce`:

import numpy as np

masks = [mask1, mask2, mask3]

m = df[~np.logical_and.reduce(masks)]
print(m)
   A  column1  column2 column3
2  c      4.0        9   hello
3  d      5.0        4   hello
4  e      5.0        2   hello
5  f      4.0        3   hello
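`np.logical_and.reduce` on a list of Series returns a plain NumPy boolean array. If you prefer to stay in pandas and keep the index, two common alternatives are `functools.reduce` with `operator.and_` and `pd.concat(...).all(axis=1)`; swapping in the OR reduction drops rows that match any mask instead of all of them. A minimal sketch on hypothetical data:

```python
import functools
import operator

import numpy as np
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    'column1': [np.nan, np.nan, 4.0, 5.0],
    'column2': [9, 1, 9, 2],
    'column3': ['hello', 'hello', 'bye', 'bye'],
})
masks = [
    df['column1'].isnull(),
    df['column2'] > 5,
    df['column3'].str.contains('hello'),
]

# NumPy version from the answer: a plain boolean ndarray
all_np = np.logical_and.reduce(masks)

# pandas alternatives that return a boolean Series aligned on df's index
all_reduce = functools.reduce(operator.and_, masks)
all_concat = pd.concat(masks, axis=1).all(axis=1)

# OR reduction: True where a row matches ANY mask
any_np = np.logical_or.reduce(masks)

print(df[~all_np].index.tolist())      # [1, 2, 3]
print(df[~all_reduce].index.tolist())  # [1, 2, 3]
print(df[~any_np].index.tolist())      # [3]
```

The Series-based versions align on the index, which matters if the masks were built from frames whose rows are not in the same order.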

jezrael


Really nice add-on with `np.logical_and.reduce()`. This is how I used it in my case, where the masks were stored in a dict (`masks_dict`): `df[~np.logical_and.reduce(list(masks_dict.values()))]` – rajan Jan 30 '23 at 22:22
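A minimal sketch of the dict approach, with hypothetical mask names and data. The dict's values must be wrapped in `list(...)` before passing them to NumPy. Alternatively, a dict of boolean Series can be turned into a DataFrame (one column per named mask) and reduced row-wise, which also makes it easy to inspect which condition each row matched.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'column1': [np.nan, np.nan, 4.0, 5.0],
    'column2': [9, 1, 9, 2],
    'column3': ['hello', 'hello', 'bye', 'bye'],
})

# Hypothetical dict of named masks
masks_dict = {
    'missing_column1': df['column1'].isnull(),
    'large_column2': df['column2'] > 5,
    'says_hello': df['column3'].str.contains('hello'),
}

# NumPy needs a real list, not a dict_values view
kept = df[~np.logical_and.reduce(list(masks_dict.values()))]

# Alternative: one column per named mask, reduced across each row
mask_frame = pd.DataFrame(masks_dict)
kept_alt = df[~mask_frame.all(axis=1)]

print(kept.index.tolist())  # [1, 2, 3]
```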
