Test for containment with a series of tuples

Question

I have a pandas.Series of tuples:

import pandas as pd
s = pd.Series([('a','b','c'), ('a','c','b'), ('c','a','b')])

and I want to check whether ('a','b','c') (or any other tuple) is in s. This check:

('a', 'b', 'c') in s

returns False. However,

('a', 'b', 'c') in s.tolist()

returns True, as expected. Is there an optimized, pandas-native way of doing this? Converting to a list feels very unpythonic.

A Series is like a dictionary. When you use `something in s`, you are checking its index (just as `something in d` checks a dictionary's keys). To check the values, you need `something in s.values`. However, it seems numpy doesn't support checking against tuples, so that returns False. — ayhan, Jul 10 '17 at 10:52

score 3 · Answer 1 · edited Jul 11 '17 at 17:17

You can use the equality operator to create a boolean Series and then call any() to get the desired result. As a more general approach, you can use the Series apply() method to apply a function to every item and then call any() on the result:

In [28]: (s == ('a', 'b', 'c')).any()
Out[28]: True

In [30]: s.apply(('a', 'b', 'c').__eq__).any()
Out[30]: True
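
Because apply() accepts any callable, the same pattern extends beyond exact equality. A minimal sketch, assuming you also want an order-insensitive match; the helper name same_elements is hypothetical and not part of the original answer:

```python
import pandas as pd

s = pd.Series([('a', 'b', 'c'), ('a', 'c', 'b'), ('c', 'a', 'b')])

def same_elements(item, target=('a', 'b', 'c')):
    # Hypothetical predicate: compare sorted copies so element order is ignored.
    return sorted(item) == sorted(target)

# Exact equality, as in the answer: one boolean per item.
exact_match = s.apply(('a', 'b', 'c').__eq__)
# Custom predicate: every tuple here holds the same three letters.
loose_match = s.apply(same_elements)

has_exact = bool(exact_match.any())
all_loose = bool(loose_match.all())
```

Only the first item equals the target exactly, while all three items match once order is ignored.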

Also note that since a Series is a one-dimensional ndarray with axis labels, membership checks with the in operator are performed on the index labels rather than on the items:

In [32]: 3 in s
Out[32]: False

In [33]: 2 in s
Out[33]: True
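
The difference is easier to see with a non-default index. A small sketch, assuming a Series with string labels (the name prices and its data are made up for illustration):

```python
import pandas as pd

prices = pd.Series([10, 20], index=['x', 'y'])  # hypothetical example data

label_hit = 'x' in prices          # 'x' is an index label
value_as_label = 10 in prices      # 10 is a value, not a label
value_hit = 10 in prices.values    # membership test on the underlying array
same_as_index = ('x' in prices) == ('x' in prices.index)
```

Here label_hit and value_hit are True, while value_as_label is False because 10 never appears among the labels.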

If you want to change this behavior, you can override the __contains__ method in your own Series subclass:

In [39]: class MySeries(pd.Series):
    ...:     def __init__(self, *args, **kwargs):
    ...:         super(MySeries, self).__init__(*args, **kwargs)
    ...:     def __contains__(self, arg):
    ...:         # Test membership against the values, not the index labels
    ...:         return (self == arg).any()
    ...:

In [40]: ms = MySeries([('a','b','c'), ('a','c','b'), ('c','a','b')])

In [41]: ('a', 'b', 'c') in ms
Out[41]: True

In [42]: ('a', 'b', 't') in ms
Out[42]: False

You can also make it work for both indices and items:

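In [51]: class MySeries(pd.Series):
    ...:     def __init__(self, *args, **kwargs):
    ...:         super(MySeries, self).__init__(*args, **kwargs)
    ...:     def __contains__(self, arg):
    ...:         # Integers are looked up in the index, as in a plain Series
    ...:         if isinstance(arg, int):
    ...:             return super(MySeries, self).__contains__(arg)
    ...:         # Anything else is compared against the values
    ...:         return (self == arg).any()
    ...: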

In [52]: ms = MySeries([('a','b','c'), ('a','c','b'), ('c','a','b')])

In [53]: 3 in ms
Out[53]: False

In [54]: 2 in ms
Out[54]: True

In [55]: ('a', 'b', 'c') in ms
Out[55]: True

In [56]: ('a', 'b', 'd') in ms
Out[56]: False
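
The isinstance(arg, int) test assumes an integer index. One possible alternative, sketched here under the assumption that a hit on either a label or a value should count as containment, is to try the index first and fall back to the values. The class name LabelOrValueSeries is hypothetical, and this is a variation on the answer's code rather than the answer's own method:

```python
import pandas as pd

class LabelOrValueSeries(pd.Series):  # hypothetical name
    def __contains__(self, key):
        # First try the normal pandas behavior: a lookup among the index labels.
        if super().__contains__(key):
            return True
        # Otherwise compare against each stored value; any() stops at the first hit.
        return any(value == key for value in self.tolist())

ms = LabelOrValueSeries([('a', 'b', 'c'), ('a', 'c', 'b'), ('c', 'a', 'b')])

label_found = 2 in ms                  # 2 is a label of the default index
label_missing = 3 in ms                # neither a label nor a value
value_found = ('a', 'b', 'c') in ms    # matches a stored tuple
value_missing = ('a', 'b', 'd') in ms  # matches nothing
```

The trade-off is ambiguity: a key that happens to be both a label and a value cannot be told apart, and an unhashable key such as a list may raise TypeError during the index lookup.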

I find this answer very comprehensive, as it explains the reason why the `in` operator behaves the way it does in this case. — Omid, Jul 11 '17 at 15:26

score 2 · Answer 2 · answered Jul 10 '17 at 10:48

You could use an equality check (==) and the any method:

(s == ('a', 'b', 'c')).any()

However, that way you lose the short-circuit behavior that `in` provides.
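
If short-circuiting matters, one option is to iterate in plain Python with the built-in any(). A minimal sketch; the compared list and the matches helper are hypothetical and exist only to show how many comparisons actually run:

```python
import pandas as pd

s = pd.Series([('a', 'b', 'c'), ('a', 'c', 'b'), ('c', 'a', 'b')])
target = ('a', 'b', 'c')

compared = []  # records which items were actually compared

def matches(item):
    compared.append(item)
    return item == target

# any() over a generator stops at the first True, whereas (s == target).any()
# compares every element before reducing.
found = any(matches(item) for item in s)
```

The first item already matches, so only one comparison runs. For a large Series the vectorized form may still be faster in practice, depending on where (and whether) a match occurs.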

MSeifert

