
Based on the pandas documentation for query, I do not understand whether it is correct to use and/or or &/| in a query statement with multiple conditions.

Is there a situation when using both bitwise and boolean operators might be necessary? Is there a best practice for when to use which?

The documentation states:

For example, the & and | (bitwise) operators have the precedence of their boolean cousins, and and or, but the practical implications are not clear to me.
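A small sketch of what that precedence rule seems to mean in practice, assuming the default `parser='pandas'`. The DataFrame values below are made up for illustration. In plain Python, `&` binds more tightly than comparison operators, so a boolean mask needs parentheses around each comparison. Inside `query`, `&` is given the precedence of `and`, so the parentheses can be dropped:

```python
import pandas as pd

# Hypothetical data, chosen so the expected rows are easy to check by eye.
df = pd.DataFrame({"A": [-2.0, 0.5, 1.5, -0.5], "B": [0.0, 2.0, 0.5, -1.0]})

# Inside query (default parser), & has the precedence of `and`,
# so the comparisons are evaluated first without parentheses.
with_amp = df.query("A > -1 & B < 1")
with_and = df.query("A > -1 and B < 1")

# Outside query, normal Python precedence applies:
# each comparison must be wrapped in parentheses before combining with &.
mask = df[(df["A"] > -1) & (df["B"] < 1)]

print(with_amp.equals(with_and))
print(with_amp.equals(mask))
```

All three selections should pick the same rows, which is why the two spellings in the example further down give equal results.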

I have found accepted answers that use either bitwise or boolean operators. In one case, the answer using query has and in its first example and & in its second, and both return the same results.

Here is an example where the choice of operator does not change the result:

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.random.randn(5,2), columns=['A','B'])
df['names'] = list('ABCDE')

query1 = df.query("A > -1 and B < 1 or 'B' in names")

query2 = df.query("A > -1 & B < 1 | 'B' in names")

query1.equals(query2)

Thanks for help.

Dudelstein

1 Answer


Each comparison in the query returns a boolean (True/False) mask, so it doesn't matter whether you combine them with a bitwise or a logical operator. However, the two kinds of operator give different results when the operands are not 0/1 (False/True):

>>> 0 & 1  # same as False & True
0  # False

>>> 0 and 1  # same as False and True
0  # False

>>> 2 & 3
2

>>> 2 and 3
3
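The scalar examples above carry over to pandas objects only for `&`. As a minimal sketch outside of `query` (the two Series are made-up illustration data), `&` combines boolean Series element by element, while `and` asks Python for the truth value of a whole Series and raises `ValueError`:

```python
import pandas as pd

# Hypothetical boolean masks, standing in for the results of two comparisons.
a = pd.Series([True, False, True])
b = pd.Series([True, True, False])

# Bitwise AND is applied element by element.
elementwise = a & b

# Logical `and` needs a single truth value for `a`, which a Series cannot give.
try:
    a and b
    raised = False
except ValueError:
    raised = True

print(elementwise.tolist())
print(raised)
```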
Corralien
  • Thanks for the answer, if you could please expand on 2 points: Can you think of a situation in `query`, where the results would differ? What would you consider better practice for using within query, bitwise or boolean? – Dudelstein May 11 '23 at 10:24
  • The same code does not work outside of `query` (see the [notes](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.query.html)). Inside `query`, the expression is ultimately executed row by row, so the choice doesn't matter. Outside `query`, I always use bitwise operators with pandas to avoid `ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().` – Corralien May 11 '23 at 10:30
  • `'B' in names` proves the code is executed for each row. The vectorized method is to use `names.isin(['B'])`. Is it clear? – Corralien May 11 '23 at 10:32
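For comparison, here is a short sketch of the membership test from the question next to an explicit `isin` mask. The pandas documentation describes `in` inside `query` as a shorthand for calling `isin`, so the expectation (an assumption worth checking on your pandas version) is that all three selections below return the same rows. The DataFrame is made-up illustration data:

```python
import pandas as pd

# Hypothetical data with a string column, mirroring the `names` column in the question.
df = pd.DataFrame({"A": [0.5, -2.0, 1.0], "names": list("ABC")})

# Membership test written as in the question.
in_query = df.query("'B' in names")

# Equality comparison against the same string inside query.
eq_query = df.query("names == 'B'")

# Explicit isin mask built outside of query.
isin_mask = df[df["names"].isin(["B"])]

print(in_query.equals(isin_mask))
print(eq_query.equals(isin_mask))
```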