My aim is to use a boolean mask to select the useful columns from a DataFrame.

I tried this snippet of code:

import pandas as pd

df = pd.DataFrame({'a': [1,2,3,4,5], 'b': [101, 101, 102, 101, 102], 'c': [23, 12, 54, 65, 21]})
mask = [True, False, True]
df.columns[mask]

And the result is what I actually need:

Index([u'a', u'c'], dtype='object')

Then I tried the same code with an integer mask:

df = pd.DataFrame({'a': [1,2,3,4,5], 'b': [101, 101, 102, 101, 102], 'c': [23, 12, 54, 65, 21]})
mask_i = [1, 0, 1]
df.columns[mask_i]

I expected the same result, but got different columns instead:

Index([u'b', u'a', u'b'], dtype='object')

Then I checked whether the two masks are equal:

mask_i = [1, 0, 1]
mask = [True, False, True]
print(mask == mask_i)

# Result: `True`

Can somebody please explain why the masks compare equal but give different results?

Mahdi
Gusev Slava
  • 2
    Check this question and its accepted answer: http://stackoverflow.com/questions/2764017/is-false-0-and-true-1-in-python-an-implementation-detail-or-is-it-guarante – milos.ai Dec 26 '16 at 16:48
  • 2
    Boolean indexing != integer indexing. `df.columns[[1, 0, 1]]` will return the columns at the following positions: `[1, 0, 1]` – MaxU - stand with Ukraine Dec 26 '16 at 16:55

1 Answer

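This is because pandas treats a list of booleans as a mask, but a list of integers as positional lookups. In your example, you can see that columns[[1, 0, 1]] looks up the second column, then the first, then the second again: ["b", "a", "b"]. The two lists still compare equal because in Python bool is a subclass of int, so True == 1 and False == 0; pandas decides how to index from the type of the elements, not from their values.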
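The difference can be seen side by side in a small sketch that reuses the DataFrame from the question. The names `by_mask` and `by_position` are only illustrative, and the column order assumes a Python version where dicts keep insertion order (3.7+).

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                   'b': [101, 101, 102, 101, 102],
                   'c': [23, 12, 54, 65, 21]})

mask = [True, False, True]   # booleans: keep the positions marked True
mask_i = [1, 0, 1]           # integers: take positions 1, 0, 1 in that order

by_mask = list(df.columns[mask])          # boolean mask
by_position = list(df.columns[mask_i])    # positional lookup

print(by_mask)
print(by_position)

# The lists compare equal because bool is a subclass of int.
print(mask == mask_i)
print(isinstance(True, int))
```

So the equality check says nothing about how pandas will interpret the list; only the element type matters.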

To convert your integer indexes into booleans, you can use either:

>>> import numpy as np
>>> np.array([1, 0, 1]).astype(bool)
array([ True, False,  True], dtype=bool)
>>> list(map(bool, [1, 0, 1]))
[True, False, True]
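As a rough illustration of how the converted mask might then be used, the sketch below applies it to the question's DataFrame. The names `bool_mask`, `selected`, and `subset` are hypothetical, and `df.loc[:, bool_mask]` is just one way to go from the mask to the columns themselves.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5],
                   'b': [101, 101, 102, 101, 102],
                   'c': [23, 12, 54, 65, 21]})

mask_i = [1, 0, 1]

# Convert the 0/1 integers to booleans so pandas treats them as a mask.
bool_mask = np.array(mask_i).astype(bool)

selected = df.columns[bool_mask]   # column labels kept by the mask
subset = df.loc[:, bool_mask]      # the corresponding columns of the frame

print(list(selected))
print(list(subset.columns))
```

The mask has to be the same length as the number of columns; a shorter or longer boolean array raises an error rather than being padded.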
David Wolever