In a pandas DataFrame, I have a series of boolean values. In order to filter to rows where the boolean is True, I can use: df[df.column_x]

I thought in order to filter to only rows where the column is False, I could use: df[~df.column_x]. I feel like I have done this before, and have seen it as the accepted answer.

However, this fails because ~df.column_x converts the values to integers. See below.

import pandas as pd  # version 0.24.2

a = pd.Series(['a', 'a', 'a', 'a', 'b', 'a', 'b', 'b', 'b', 'b'])
b = pd.Series([True, True, True, True, True, False, False, False, False, False], dtype=bool)

c = pd.DataFrame(data=[a, b]).T
c.columns = ['Classification', 'Boolean']

print(~c.Boolean)

0    -2
1    -2
2    -2
3    -2
4    -2
5    -1
6    -1
7    -1
8    -1
9    -1
Name: Boolean, dtype: object

print(~b)

0    False
1    False
2    False
3    False
4    False
5     True
6     True
7     True
8     True
9     True
dtype: bool

Basically, I can use c[~b], but not c[~c.Boolean]

Am I just dreaming that this used to work?

K Jones
  • https://stackoverflow.com/questions/21415661/logical-operators-for-boolean-indexing-in-pandas The very last part of the lowest rated comment highlights this problem as well – Chris May 10 '19 at 14:40
  • I think since a boolean is pretty small, the Pythons tend to get entangled with one another if they try to work on the boolean together. – einpoklum May 10 '19 at 21:56

1 Answer

Ah, this happens because you created c with the DataFrame constructor and then transposed it with .T.

First, let us look at what we have before the transpose:

pd.DataFrame([a, b])
Out[610]: 
      0     1     2     3     4      5      6      7      8      9
0     a     a     a     a     b      a      b      b      b      b
1  True  True  True  True  True  False  False  False  False  False

pandas gives each column a single dtype. Here every column holds one string and one boolean, so each column falls back to object.

After the transpose, the values are still stored as object, as the dtypes of your c show:

c.dtypes
Out[608]: 
Classification    object
Boolean           object

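The Boolean column became object dtype, which is why you get unexpected output for ~c.Boolean: on an object column, ~ is applied to each Python bool as an integer, so True becomes -2 and False becomes -1.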
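A minimal sketch of the difference between the two dtypes, using two small illustrative Series (obj_col and bool_col are made-up names, not from the question). It assumes a pandas version that behaves like the 0.24.2 output shown in the question:

```python
import pandas as pd

# bool is a subclass of int in Python, so True acts like 1 and False like 0.
assert isinstance(True, int)
print(~1, ~0)  # bitwise NOT on plain ints: -2 -1

# Same values, two dtypes (both Series are illustrative only).
obj_col = pd.Series([True, False], dtype=object)
bool_col = pd.Series([True, False], dtype=bool)

# On bool dtype, ~ is an element-wise logical negation.
negated_bool = ~bool_col
print(negated_bool.tolist())  # [False, True]

# On object dtype, ~ is handed to each Python object in turn, which is
# how the question ended up with -2 and -1 instead of False and True.
# Recent Python versions may emit a DeprecationWarning for ~ on bool.
negated_obj = ~obj_col
print(negated_obj.dtype)

# Checking the dtype first is a cheap way to catch the problem.
print(obj_col.dtype, bool_col.dtype)
```

Checking c.dtypes before using ~ as a row filter catches this case early.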


How to fix it? Use concat:

c = pd.concat([a, b], axis=1)
c.columns = ['Classification', 'Boolean']
~c.Boolean
Out[616]: 
0    False
1    False
2    False
3    False
4    False
5     True
6     True
7     True
8     True
9     True
Name: Boolean, dtype: bool
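As an alternative sketch (not part of the original answer), building the frame from a dict of Series also keeps each column's own dtype, because the data is never stacked row-wise and transposed. The shortened a and b below are illustrative stand-ins for the Series in the question:

```python
import pandas as pd

# Shortened stand-ins for the question's Series.
a = pd.Series(['a', 'a', 'b'])
b = pd.Series([True, False, False], dtype=bool)

# Each dict entry becomes one column and keeps its dtype.
c = pd.DataFrame({'Classification': a, 'Boolean': b})
print(c.dtypes)

# With a real bool column, ~ works as a row filter again.
false_rows = c[~c['Boolean']]
print(false_rows)
```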
BENY
  • In my real dataset, the column comes as an object. Based off of WenYoBen's response, I should make my column a boolean dtype. `df.column_x = df.column_x.astype(bool); df[~df.column_x]` – K Jones May 15 '19 at 19:33
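A hedged sketch of the cast described in that comment, on a made-up frame whose column_x holds real Python booleans stored as object. It also shows one caveat: if the object column actually holds strings such as 'True' and 'False', astype(bool) marks every non-empty string as True, so an explicit mapping is safer. The sample data and the text_col and parsed names are assumptions for illustration:

```python
import pandas as pd

# Illustrative frame: real booleans, but stored with object dtype.
df = pd.DataFrame({'column_x': pd.Series([True, False, True], dtype=object)})

# Cast to a real bool column, then ~ filters to the False rows.
df['column_x'] = df['column_x'].astype(bool)
false_rows = df[~df['column_x']]
print(false_rows)

# Caveat: with strings, astype(bool) tests truthiness, not content.
text_col = pd.Series(['True', 'False'], dtype=object)
print(text_col.astype(bool).tolist())  # [True, True]

# Mapping the strings explicitly gives the intended values.
parsed = text_col.map({'True': True, 'False': False}).astype(bool)
print(parsed.tolist())  # [True, False]
```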