Boolean mask for Pandas DataFrame columns

Question

My aim is to use a boolean mask to select the useful columns of a DataFrame.

I tried this snippet:

import pandas as pd

df = pd.DataFrame({'a': [1,2,3,4,5], 'b': [101, 101, 102, 101, 102], 'c': [23, 12, 54, 65, 21]})
mask = [True, False, True]
df.columns[mask]

And the result is what I actually need:

Index([u'a', u'c'], dtype='object')

Then I tried the same code with a different mask:

df = pd.DataFrame({'a': [1,2,3,4,5], 'b': [101, 101, 102, 101, 102], 'c': [23, 12, 54, 65, 21]})
mask_i = [1, 0, 1]
df.columns[mask_i]

I expected the same result, but got this instead:

Index([u'b', u'a', u'b'], dtype='object')

Then I checked:

mask_i = [1, 0, 1]
mask = [True, False, True]
print(mask == mask_i)

# Result: `True`

Can somebody please explain why the masks compare equal but give different results?

Check this question and its accepted answer: http://stackoverflow.com/questions/2764017/is-false-0-and-true-1-in-python-an-implementation-detail-or-is-it-guarante — milos.ai, Dec 26 '16 at 16:48
__boolean__ indexing != integer indexing. `df.columns[[1, 0, 1]]` will return you a list of columns with the following __indexes__: `[1,0,1]` — MaxU - stand with Ukraine, Dec 26 '16 at 16:55

score 8 · Accepted Answer · answered Dec 26 '16 at 18:34

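This is because pandas treats boolean lists as masks, but integer lists as positional lookups. The two lists compare equal because in Python True == 1 and False == 0, but pandas looks at the type of the elements, not their values. In your example, you can see that columns[[1, 0, 1]] looks up the second column, then the first, then the second again: ["b", "a", "b"].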
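
To make the difference concrete, here is a minimal sketch that runs both kinds of indexer side by side. It uses a shortened version of the DataFrame from the question (an assumption for brevity); the names masked and looked_up are only illustrative.

```python
import pandas as pd

# Shortened version of the question's DataFrame (three rows instead of five).
df = pd.DataFrame({'a': [1, 2, 3], 'b': [101, 101, 102], 'c': [23, 12, 54]})

# Boolean list: treated as a mask, keeps the positions that are True.
masked = df.columns[[True, False, True]]

# Integer list: treated as positions to look up, so repeats are allowed.
looked_up = df.columns[[1, 0, 1]]

print(list(masked))     # ['a', 'c']
print(list(looked_up))  # ['b', 'a', 'b']
```

Note that the integer version can return the same column more than once, which a mask can never do.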

To convert your integer indexes into booleans, you can use either:

>>> import numpy as np
>>> np.array([1, 0, 1]).astype(bool)
array([ True, False,  True], dtype=bool)
>>> list(map(bool, [1, 0, 1]))
[True, False, True]
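
Putting the conversion together with the original DataFrame, a sketch along these lines should give the result the question was after. The df.loc[:, mask] step is an addition on my part, assuming you eventually want the column data and not only the labels.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                   'b': [101, 101, 102, 101, 102],
                   'c': [23, 12, 54, 65, 21]})

mask_i = [1, 0, 1]

# Convert the 0/1 integers into a real boolean array before indexing.
mask = np.array(mask_i).astype(bool)

# Now the array is used as a mask over the column labels.
kept_labels = list(df.columns[mask])

# The same boolean array can select the columns themselves.
selected = df.loc[:, mask]

print(kept_labels)             # ['a', 'c']
print(list(selected.columns))  # ['a', 'c']
```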
