This is an elaboration on MaxNoe's answer, since it was too lengthy to include
in the comments.
As he indicated, df[0] is True evaluates to False, which is then coerced to 0,
which corresponds to a column name. What is interesting is what happens when
you run the following:
>>> df = pd.DataFrame([True, False, True])
>>> df[False]
KeyError Traceback (most recent call last)
<ipython-input-21-62b48754461f> in <module>()
----> 1 df[False]
>>> df[0]
0     True
1    False
2     True
Name: 0, dtype: bool
>>> df[False]
0     True
1    False
2     True
Name: 0, dtype: bool
This seems a bit perplexing at first (to me at least), but it has to do with how
pandas makes use of caching. If you look at how df[False] is resolved, it
looks like
/home/matthew/anaconda/lib/python2.7/site-packages/pandas/core/frame.py(1975)__getitem__()
-> return self._getitem_column(key)
/home/matthew/anaconda/lib/python2.7/site-packages/pandas/core/frame.py(1999)_getitem_column()
-> return self._get_item_cache(key)
> /home/matthew/anaconda/lib/python2.7/site-packages/pandas/core/generic.py(1343)_get_item_cache()
-> res = cache.get(item)
Since cache is just a regular Python dict, after running df[0] the cache
looks like
>>> cache
{0: 0     True
1    False
2     True
Name: 0, dtype: bool}
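so that when we look up False, Python finds the entry stored under 0: bool is a
subclass of int, False == 0 and hash(False) == hash(0), so a dict treats the two
as the same key. If we have not already primed the cache using df[0], then res
is None, and the lookup falls through to line 1345 of generic.py, which is where
the KeyError is raised: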
def _get_item_cache(self, item):
1341 """Return the cached item, item represents a label indexer."""
1342 cache = self._item_cache
1343 -> res = cache.get(item)
1344 if res is None:
1345 values = self._data.get(item)
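
To make the mechanism easier to poke at without a pandas install, here is a minimal sketch in plain Python. The first part only shows that a dict treats False and 0 as the same key. The second part is a toy stand-in for the cache-then-fallback lookup described above. ToyFrame and _lookup_backing_store are hypothetical names I made up, and the rule that the backing store rejects bool labels is an assumption chosen to mimic the KeyError in the session above. None of it is pandas code.

```python
# Part 1: False and 0 compare equal and hash equal, so they are one dict key.
cache = {0: "column 0"}
same_key = (False == 0) and (hash(False) == hash(0))
hit_with_false = cache.get(False)   # finds the entry stored under 0
miss_with_one = cache.get(1)        # no entry under 1 (or True)


# Part 2: a toy cache-then-fallback lookup (hypothetical, not pandas internals).
class ToyFrame:
    def __init__(self, columns):
        self._data = dict(columns)   # stand-in for the real column storage
        self._item_cache = {}        # plain dict, as in the answer above

    def _lookup_backing_store(self, item):
        # Assumption: the backing store refuses bool labels, which mimics
        # the KeyError seen for df[False] on an unprimed cache.
        if isinstance(item, bool) or item not in self._data:
            raise KeyError(item)
        return self._data[item]

    def __getitem__(self, item):
        res = self._item_cache.get(item)
        if res is None:
            # Cache miss: fall through to the stricter lookup, then cache it.
            res = self._lookup_backing_store(item)
            self._item_cache[item] = res
        return res


frame = ToyFrame({0: [True, False, True]})

try:
    frame[False]
    first_lookup = "hit"
except KeyError:
    first_lookup = "KeyError"

frame[0]                       # primes the cache under key 0
second_lookup = frame[False]   # now served from the cache

print(first_lookup, second_lookup)
```

The point of the sketch is that the dict in front and the lookup behind it disagree about whether False is a valid label, so the answer depends on whether the cache has already been filled.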