I have set up three masks for my DataFrame, and I want to filter out the rows they match.

For example, some random masks:

mask1 = df['column1'].isnull()
mask2 = df['column2'] > 5
mask3 = df['column3'].str.contains('hello')

Now how do I combine these masks to filter out these values? Is this the correct way? Using both ~ and | ?

masked_df = df[~mask1 | ~mask2 | ~mask3]

I have so many rows in my dataframe that I can't be 100% sure with manual checking to see if it's correct.

SCool
  • Looks fine to me. But why not flip the conditions as `df['column1'].notnull()` and `mask2 = df['column2'] < 5`? – yatu Oct 02 '19 at 10:15
  • Because my brain doesn't work that way :). I want to create masks like `mask1 = I do not want this.` and `mask2 = I do not want that` etc. – SCool Oct 02 '19 at 10:19
  • It depends. If you want to filter out only the rows that meet all of the conditions, then you are right. If you want to filter out rows that meet any of the conditions, you should use `&` instead of `|`. – Aryerez Oct 02 '19 at 10:22

1 Answer

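Your solution works. By De Morgan's law, `~mask1 | ~mask2 | ~mask3` is the same as chaining the masks with bitwise AND and inverting the result, which drops only the rows where all three masks are True: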

masked_df = df[~(mask1 & mask2 & mask3)]
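
One way to check this without inspecting rows by hand is to compare the two expressions on a small frame. This is a minimal sketch, assuming made-up sample data (the `df` below is hypothetical, not the asker's frame) and assuming `na=False` in `str.contains` so that missing strings count as non-matches:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "column1": [np.nan, np.nan, 4.0, 5.0, np.nan],
    "column2": [9, 2, 9, 4, 7],
    "column3": ["hello", "hello", "hello", "bye", None],
})

mask1 = df["column1"].isnull()
mask2 = df["column2"] > 5
# na=False keeps the mask strictly boolean when column3 has missing values.
mask3 = df["column3"].str.contains("hello", na=False)

# De Morgan: (~a | ~b | ~c) selects the same rows as ~(a & b & c).
drop_if_all = df[~(mask1 & mask2 & mask3)]
assert drop_if_all.equals(df[~mask1 | ~mask2 | ~mask3])

# To drop rows that match ANY mask, invert the OR of the masks instead.
drop_if_any = df[~(mask1 | mask2 | mask3)]

print(drop_if_all.index.tolist())  # [1, 2, 3, 4]
print(drop_if_any.index.tolist())  # [3]
```

Only row 0 matches all three masks, so it is the only row the first expression removes. If the intent is "I do not want this, and I do not want that either", the second form is probably the one to use.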

If the masks are in a list, the solution above can be rewritten with `np.logical_and.reduce`:

import numpy as np

masks = [mask1, mask2, mask3]

m = df[~np.logical_and.reduce(masks)]
print(m)
   A  column1  column2 column3
2  c      4.0        9   hello
3  d      5.0        4   hello
4  e      5.0        2   hello
5  f      4.0        3   hello
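
The same reduction can be sketched for masks stored in a dict, alongside the `np.logical_or.reduce` counterpart for dropping rows that match any mask. The sample frame and the dict keys below are hypothetical, chosen only to make the block self-contained:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "column1": [np.nan, np.nan, 4.0, 5.0],
    "column2": [9, 2, 9, 4],
    "column3": ["hello", "hello", "hello", "bye"],
})

# Hypothetical dict of named masks, each meaning "I do not want this".
unwanted = {
    "missing_col1": df["column1"].isnull(),
    "large_col2": df["column2"] > 5,
    "has_hello": df["column3"].str.contains("hello", na=False),
}
masks = list(unwanted.values())

all_match = np.logical_and.reduce(masks)  # True where every mask is True
any_match = np.logical_or.reduce(masks)   # True where at least one mask is True

kept_unless_all = df[~all_match]  # drops rows matching all masks
kept_unless_any = df[~any_match]  # drops rows matching any mask

print(kept_unless_all.index.tolist())  # [1, 2, 3]
print(kept_unless_any.index.tolist())  # [3]
```

Both reductions return a plain boolean NumPy array, which can be inverted with `~` and passed straight to `df[...]`.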
jezrael
  • Really nice add-on with the `np.logical_and.reduce()`, this is how I used that in my case where I had a dict containing the masks: `df[~np.logical_and.reduce([a for a in dict.values()])]` – rajan Jan 30 '23 at 22:22