239

I am filtering rows in a dataframe by values in two columns.

For some reason the OR operator behaves like I would expect AND operator to behave and vice versa.

My test code:

df = pd.DataFrame({'a': range(5), 'b': range(5) })

# let's insert some -1 values
df['a'][1] = -1
df['b'][1] = -1
df['a'][3] = -1
df['b'][4] = -1

df1 = df[(df.a != -1) & (df.b != -1)]
df2 = df[(df.a != -1) | (df.b != -1)]

print(pd.concat([df, df1, df2], axis=1,
                keys = [ 'original df', 'using AND (&)', 'using OR (|)',]))

And the result:

      original df      using AND (&)      using OR (|)    
             a  b              a   b             a   b
0            0  0              0   0             0   0
1           -1 -1            NaN NaN           NaN NaN
2            2  2              2   2             2   2
3           -1  3            NaN NaN            -1   3
4            4 -1            NaN NaN             4  -1

[5 rows x 6 columns]

As you can see, the AND operator drops every row in which at least one value equals -1. On the other hand, the OR operator requires both values to be equal to -1 to drop them. I would expect exactly the opposite result. Could anyone explain this behavior?

I am using pandas 0.13.1.

cottontail
  • 10,268
  • 18
  • 50
  • 51
Wojciech Walczak
  • 3,419
  • 2
  • 23
  • 24

5 Answers5

372

As you can see, the AND operator drops every row in which at least one value equals -1. On the other hand, the OR operator requires both values to be equal to -1 to drop them.

That's right. Remember that you're writing the condition in terms of what you want to keep, not in terms of what you want to drop. For df1:

df1 = df[(df.a != -1) & (df.b != -1)]

You're saying "keep the rows in which df.a isn't -1 and df.b isn't -1", which is the same as dropping every row in which at least one value is -1.

For df2:

df2 = df[(df.a != -1) | (df.b != -1)]

You're saying "keep the rows in which either df.a or df.b is not -1", which is the same as dropping rows where both values are -1.

PS: chained access like df['a'][1] = -1 can get you into trouble. It's better to get into the habit of using .loc and .iloc.

DSM
  • 342,061
  • 65
  • 592
  • 494
74

Late answer, but you can also use query(), i.e. :

df_filtered = df.query('a == 4 & b != 2')
Pedro Lobito
  • 94,083
  • 31
  • 258
  • 268
21

A little mathematical logic theory here:

"NOT a AND NOT b" is the same as "NOT (a OR b)", so:

"a NOT -1 AND b NOT -1" is equivalent of "NOT (a is -1 OR b is -1)", which is opposite (Complement) of "(a is -1 OR b is -1)".

So if you want exact opposite result, df1 and df2 should be as below:

df1 = df[(df.a != -1) & (df.b != -1)]
df2 = df[(df.a == -1) | (df.b == -1)]
automa7
  • 494
  • 4
  • 15
Jake
  • 1,550
  • 1
  • 11
  • 12
1

You can try the following:

df1 = df[(df['a'] != -1) & (df['b'] != -1)]       
rachwa
  • 1,805
  • 1
  • 14
  • 17
  • This is basically the same as the [accepted answer](https://stackoverflow.com/a/22591267/18145256). – rachwa Aug 14 '22 at 15:57
  • You've re-posted one line of the accepted solution 2 years later. Please don't do that - it doesn't add anything to the discussion. – NineTails Aug 17 '22 at 03:23
1

By de Morgan's laws, (i) the negation of a union is the intersection of the negations, and (ii) the negation of an intersection is the union of the negations, i.e.,

A AND B <=> not A OR not B
A OR B  <=> not A AND not B

If the aim is to

drop every row in which at least one value equals -1

you can either use AND operator to identify the rows to keep or use OR operator to identify the rows to drop.

# select rows where both a and b values are not equal to -1
df2_0 = df[df['a'].ne(-1) & df['b'].ne(-1)]

# index of rows where at least one of a or b equals -1
idx = df.index[df.eval('a == -1 or b == -1')]
# drop `idx` rows
df2_1 = df.drop(idx)

df2_0.equals(df2_1) # True

On the other hand, if the aim is to

drop every row in which both values equal -1

you do the exact opposite; either use OR operator to identify the rows to keep or use AND operator to identify the rows to drop.

cottontail
  • 10,268
  • 18
  • 50
  • 51