
I've seen that if you pass a boolean Series with the same length as the number of rows in a DataFrame, it filters the DataFrame. However, if we pass a condition instead of a boolean Series (like `df['col']==value`) and want to apply a boolean operation to it (like `~`), it does not work, even though the condition's result is a boolean Series. It only works if the condition is surrounded by parentheses. In other words, `df[~(df['col']>value)]` works and `df[~df['col']>value]` does not. Notice that the only difference is the parentheses.

I thought the parentheses were doing something to the boolean Series resulting from `df['col']>value`, like casting it into another kind of object that supports operations such as `~`. But they don't: `type(df['col']>value)` and `type((df['col']>value))` are the same, `pandas.core.series.Series`. So what are those parentheses doing that makes `~` work on the boolean Series produced by the condition?

Moreover, if you have two boolean Series derived from applying conditions to a DataFrame, like `series_a=df['col']>value` and `series_b=df['col']==value`, and you combine them with the `&` operator as `df[series_a & series_b]`, it works fine. But computing them inline does not work: `df[df['col']>value & df['col']==value]` gives the error `TypeError: unsupported operand type(s) for &: 'int' and 'IntegerArray'`. From that error I would assume operator precedence is involved, since it seems to be applying `&` to an IntegerArray, probably evaluating `df['col'] > (value & df['col']) == value`. But I would like to confirm.
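That guess can be checked without pandas by asking Python's own parser how it groups the expression. A minimal sketch using the standard-library `ast` module (the names are only parsed, never evaluated, so no DataFrame is needed):

```python
import ast

# Parse the expression from the question without evaluating it.
tree = ast.parse("df['col'] > value & df['col'] == value", mode="eval").body

# `&` binds tighter than `>` and `==`, so the parser builds ONE chained
# comparison: df['col'] > (value & df['col']) == value
print(type(tree).__name__)                     # Compare
print([type(op).__name__ for op in tree.ops])  # ['Gt', 'Eq']

# The middle operand of the chain is the bitwise AND.
middle = tree.comparators[0]
print(type(middle).__name__, type(middle.op).__name__)  # BinOp BitAnd
```

Because the result is a single chained comparison, Python evaluates it like `(df['col'] > (value & df['col'])) and ((value & df['col']) == value)`, with the middle operand computed only once. So `value & df['col']` is evaluated before any comparison, which is where the `TypeError` comes from. Even where that `&` succeeds, the implicit `and` would call `bool()` on a Series and raise a "truth value of a Series is ambiguous" error.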

Example: suppose we have a DataFrame with a column `tag` whose values are either `A` or `B`:

import pandas as pd
import numpy as np
import random

df = pd.DataFrame({'tag': [random.choice(['A', 'B']) for i in range(100)]})

If I filter like this:

df[~(df['tag']=='A')]

It works, but if I do it without the parentheses, it fails with `TypeError: bad operand type for unary ~: 'str'`:

df[~df['tag']=='A']
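A runnable version of this reproduction, as a sketch: it assumes a seeded `random` generator and an explicit `dtype=object` column (neither is in the original) so that the run is repeatable and the column holds plain Python strings, like the one that produced the error above.

```python
import random

import pandas as pd

random.seed(0)  # seeded only so the run is reproducible
# dtype=object keeps plain Python strings in the column, which matches the
# "bad operand type for unary ~: 'str'" error reported in the question.
df = pd.DataFrame({'tag': [random.choice(['A', 'B']) for _ in range(100)]},
                  dtype=object)

# Parenthesised: compare first, then invert the boolean mask.
not_a = df[~(df['tag'] == 'A')]

# Equivalent and arguably clearer: use != instead of ~(...).
not_a_too = df[df['tag'] != 'A']
print(not_a.equals(not_a_too))  # True

# Without parentheses this parses as (~df['tag']) == 'A', so ~ is applied
# to the string column itself and raises before any comparison happens.
raised = False
try:
    df[~df['tag'] == 'A']
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```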
Barmar
  • `~df['tag']=='A'` is equivalent to `(~df['tag'])=='A'` – tkausl Aug 02 '23 at 16:18
  • `df=pd.DataFrame({'tag'=[random.choice['A','B' for i in range(100)]}` has lots of syntax and logic problems. Please show what you actually used. – Barmar Aug 02 '23 at 16:20
  • It's because Python's bitwise operators have higher precedence than Python's comparison operators. See https://stackoverflow.com/questions/42338005/pandas-logical-and-operator-with-and-without-brackets-produces-different-results – Nick ODell Aug 02 '23 at 16:22
  • You need to check the operator precedence table when you remove parentheses, to understand how the result will be parsed. – Barmar Aug 02 '23 at 16:22

1 Answer


It's a question of operator precedence. When you combine two operators (`~` and `>`), Python has to decide which one to apply first. In

~df['col']>value

`~` has higher precedence, so it goes first: you negated the column (a Series) and then compared. It's the same as `(~(df['col'])) > value`.
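On an integer column the same mistake does not raise at all; it silently builds a different mask, because `~` on integers is bitwise NOT (`~x == -x - 1`). A small worked sketch with a hypothetical Series `s` and `value = 1` (neither is from the question):

```python
import pandas as pd

s = pd.Series([1, 2, 3])
value = 1

# Unparenthesised: ~ is bitwise NOT on the integers, then the comparison.
print((~s).tolist())          # [-2, -3, -4]
print((~s > value).tolist())  # [False, False, False]

# Parenthesised: compare first, then invert the boolean result.
print((s > value).tolist())     # [False, True, True]
print((~(s > value)).tolist())  # [True, False, False]
```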

If you want to compare and then negate, you have to use parentheses to avoid the unwanted order of operations. Expressions inside parens have the highest precedence. In

~(df['col']>value)

the comparison is done first.
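The same reasoning covers the `&` example from the question: `&` binds more tightly than `>` and `==`, so each condition needs its own parentheses. A minimal sketch with hypothetical data in a column named `col`, also showing the `Series.ge()`/`Series.le()` comparison methods as an alternative spelling:

```python
import pandas as pd

# Hypothetical data; 'col' and value stand in for the question's names.
df = pd.DataFrame({'col': [1, 2, 3, 4]})
value = 2

# & binds tighter than >= and <=, so each comparison gets its own parentheses.
mask = (df['col'] >= value) & (df['col'] <= 3)
print(df[mask]['col'].tolist())  # [2, 3]

# .ge()/.le() return the same boolean Series; as method calls they
# need no extra parentheses around each condition.
mask_methods = df['col'].ge(value) & df['col'].le(3)
print(mask.equals(mask_methods))  # True
```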

tdelaney