Here's the set-up for a toy example:
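import collections as co
import pandas as pd

data = [['a', 1],
        ['b', 2],
        ['a', 3],
        ['b', 1],
        ['c', 2],
        ['c', 3],
        ['b', 1]]
colnames = tuple('XY')
# Build the frame column by column; the OrderedDict keeps X before Y
df = pd.DataFrame(co.OrderedDict([(colnames[i],
                                   [row[i] for row in data])
                                  for i in range(len(colnames))]))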
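For what it's worth, the same toy frame can probably be built more directly, since `pd.DataFrame` accepts a list of rows together with a `columns` argument. The sketch below assumes nothing beyond the data above. It builds the frame both ways and checks that they match; the names `df_orig` and `df` are just for this comparison.

```python
import collections as co
import pandas as pd

data = [['a', 1], ['b', 2], ['a', 3], ['b', 1],
        ['c', 2], ['c', 3], ['b', 1]]
colnames = tuple('XY')

# Original construction: assemble each column from the rows
df_orig = pd.DataFrame(co.OrderedDict([(colnames[i], [row[i] for row in data])
                                       for i in range(len(colnames))]))

# Row-oriented construction: a list of rows plus the column labels
df = pd.DataFrame(data, columns=list(colnames))

# Both frames should have identical values, dtypes, column order and index
same = df.equals(df_orig)
```

The row-oriented form avoids the transpose-by-hand step, which matters little here but reads more clearly.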
OK, to get a boolean indicator Series object (suitable for indexing) that says whether the value in the X column is equal to 'a', I can do this:
In [230]: df['X'] == 'a'
Out[230]:
0 True
1 False
2 True
3 False
4 False
5 False
6 False
Name: X, dtype: bool
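To make "suitable for indexing" concrete, here is a small sketch of using that mask to select rows. It rebuilds the same toy data with the row-oriented constructor; `rows_with_a` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame([['a', 1], ['b', 2], ['a', 3], ['b', 1],
                   ['c', 2], ['c', 3], ['b', 1]], columns=['X', 'Y'])

# Element-wise comparison gives a boolean Series aligned with df's index
mask = df['X'] == 'a'

# Indexing the frame with the mask keeps only the rows where it is True
rows_with_a = df[mask]
print(rows_with_a)
```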
Fine, but what I really want to do is test whether the value is one of several possible values. I tried to use set inclusion for this, but it bombs:
In [231]: df['X'] in set(['a', 'b'])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-266-0819ab764ce2> in <module>()
----> 1 df['X'] in set(['a', 'b'])
/Users/yt/.virtualenvs/pd/lib/python2.7/site-packages/pandas/core/generic.pyc in __hash__(self)
639 def __hash__(self):
640 raise TypeError('{0!r} objects are mutable, thus they cannot be'
--> 641 ' hashed'.format(self.__class__.__name__))
642
643 def __iter__(self):
TypeError: 'Series' objects are mutable, thus they cannot be hashed
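The error comes from how Python evaluates `x in some_set`: the set has to hash `x` first, and pandas deliberately makes `Series.__hash__` raise because a Series is mutable. So `in` is testing whether the *whole Series* is a member of the set, not each element. The sketch below (with a hypothetical three-element Series `s`) reproduces the failure and contrasts it with a per-element check, which works because each element is a hashable string.

```python
import pandas as pd

s = pd.Series(['a', 'c', 'b'], name='X')
allowed = {'a', 'b'}

# Whole-Series membership: the set tries to hash s, which raises TypeError
try:
    s in allowed
    raised = False
except TypeError:
    raised = True

# Per-element membership: each element is a str, so hashing succeeds
per_element = [value in allowed for value in s]
```

The per-element loop shows the intended semantics, though a Python-level loop is unlikely to be the fastest option on a large frame.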
How can I achieve this?
Note: in the situation I'm working with, the set of allowable values is large and known only at run time, so an or expression is out of the question.
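One candidate worth trying, offered as a sketch rather than a definitive answer: pandas Series have an `isin` method that takes a set or list-like of values and returns a boolean Series, so the allowed values can be supplied at run time without writing out any or expression. The variable `allowed` below is a hypothetical stand-in for whatever run-time collection is actually in play.

```python
import pandas as pd

df = pd.DataFrame([['a', 1], ['b', 2], ['a', 3], ['b', 1],
                   ['c', 2], ['c', 3], ['b', 1]], columns=['X', 'Y'])

# Hypothetical run-time input; in practice this could be large
allowed = set(['a', 'b'])

# isin checks each element of the column for membership in `allowed`
mask = df['X'].isin(allowed)

# The resulting boolean Series indexes the frame just like df['X'] == 'a'
selected = df[mask]
```

Because the mask has the same shape as the one from `df['X'] == 'a'`, it should drop into existing indexing code unchanged.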