This is a continuation of the method used in this question.
Say we have a dataframe:
  Make       Model  Year     HP  Cylinders Transmission  MPG-H  MPG-C  Price
0  BMW  1 Series M  2011  335.0        6.0       MANUAL     26     19  46135
1  BMW    1 Series  2011  300.0        6.0       MANUAL     28     19  40650
2  BMW    1 Series  2011  300.0        6.0       MANUAL     28     20  36350
3  BMW    1 Series  2011  230.0        6.0       MANUAL     28     18  29450
4  BMW    1 Series  2011  230.0        6.0       MANUAL     28     18  34500
...
Using the interquartile range (IQR, i.e. the middle 50%), I created two variables, upper and lower. The specific calculation isn't important for this discussion, but to give an example, here is upper:
Year 2029.50
HP 498.00
Cylinders 9.00
MPG-H 42.00
MPG-C 31.00
Price 75291.25
As expected, it only contains values for the numeric columns (int64 and float64).
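For context, here is a minimal sketch of how bounds like these might be computed. The 1.5 × IQR multiplier and the small toy frame (a few columns from the rows shown above) are assumptions for illustration, not necessarily the exact calculation behind the numbers in this post.

```python
import pandas as pd

# Toy frame: a subset of the columns and rows shown above.
df = pd.DataFrame({
    "Make": ["BMW", "BMW", "BMW", "BMW", "BMW"],
    "Year": [2011, 2011, 2011, 2011, 2011],
    "HP": [335.0, 300.0, 300.0, 230.0, 230.0],
    "Price": [46135, 40650, 36350, 29450, 34500],
})

# Quantiles are only meaningful for numeric columns, so select them first.
numeric = df.select_dtypes(include="number")

q1 = numeric.quantile(0.25)
q3 = numeric.quantile(0.75)
iqr = q3 - q1

# Assumption: the common 1.5 * IQR fences.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Both Series are indexed by the numeric column names only;
# "Make" does not appear in either of them.
print(upper)
```

The point that matters for the rest of the question is that lower and upper have no entry for the non-numeric columns.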
When I filter out the rows whose values lie outside these bounds,
correct_df = df[~((df < lower) | (df > upper)).any(axis=1)]
it gives me the right answer. However, when I invert the logic to use & instead of |, I get an empty dataframe. Here is the code:
another_df = df[((df >= lower) & (df <= upper)).all(axis=1)]
This gives an empty result:
Make Model Year HP Cylinders Transmission Drive Mode MPG-H MPG-C Price
----------------------------------------------------------------------------------------------
It can be fixed by converting the index of upper/lower into a list (lst) and comparing only those columns:
another_df = df[((df[lst] >= lower) & (df[lst] <= upper)).all(axis=1)]
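A minimal, self-contained sketch of the column-restricted version follows. The bounds are hypothetical round numbers and the last row is an invented outlier, so this only illustrates the mechanics. Because both comparisons are limited to the columns in lst, the & form and the negated | form select the same rows here.

```python
import pandas as pd

# Toy data: the last row is a made-up outlier (HP far above the upper bound).
df = pd.DataFrame({
    "Make": ["BMW", "BMW", "BMW", "BMW", "Audi"],
    "HP": [335.0, 300.0, 300.0, 230.0, 900.0],
    "Price": [46135, 40650, 36350, 29450, 34500],
})

# Hypothetical bounds, indexed by the numeric column names only.
lower = pd.Series({"HP": 200.0, "Price": 25000.0})
upper = pd.Series({"HP": 498.0, "Price": 75291.25})

# Column names that actually have bounds.
lst = list(upper.index)

# Keep rows where every bounded column lies inside [lower, upper].
inside = df[((df[lst] >= lower) & (df[lst] <= upper)).all(axis=1)]

# Equivalent form: drop rows where any bounded column lies outside.
not_outside = df[~((df[lst] < lower) | (df[lst] > upper)).any(axis=1)]

print(inside)
```

Both results still contain the Make column, since the boolean mask is only used to pick rows from the full df.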
It seems that & and | behave differently for non-numeric columns. Why does that happen?