
I have a pandas.Series of tuples:

import pandas as pd
s = pd.Series([('a','b','c'), ('a','c','b'), ('c','a','b')])

and I want to check whether ('a','b','c') (or any other tuple) is in s. The expression

('a', 'b', 'c') in s

returns False. However,

('a', 'b', 'c') in s.tolist()

returns True, as expected. Is there an optimized, pandas-native way of doing this? Converting to a list feels very non-Pythonic.

Dror
  • A Series is like a dictionary. When you use `something in s`, you are checking its index (just as `something in d` checks a dictionary's keys). If you want to check its values, you need to use `something in s.values`. However, it seems numpy doesn't support checking against tuples, so it returns False. – ayhan Jul 10 '17 at 10:52

2 Answers


You can compare the Series with the tuple using == to get a boolean Series, then call any() on it. More generally, you can use the Series' apply() method to run an arbitrary function on every item, then call any() on the result:

In [28]: (s == ('a', 'b', 'c')).any()
Out[28]: True

In [30]: s.apply(('a', 'b', 'c').__eq__).any()
Out[30]: True

Also note that a Series is a one-dimensional ndarray with axis labels, so the in operator checks membership against the index labels rather than the values:

In [32]: 3 in s
Out[32]: False

In [33]: 2 in s
Out[33]: True

If you want to change this behavior, you can create your own Series subclass and override its __contains__ method:

In [39]: class MySeries(pd.Series):
   ....:     def __init__(self, *args, **kwargs):
   ....:         super(MySeries, self).__init__(*args, **kwargs)
   ....:     def __contains__(self, arg):
   ....:         # Test membership against the values instead of the index
   ....:         return (self == arg).any()
   ....:

In [40]: ms = MySeries([('a','b','c'), ('a','c','b'), ('c','a','b')])

In [41]: ('a', 'b', 'c') in ms
Out[41]: True

In [42]: ('a', 'b', 't') in ms
Out[42]: False

You can also make it work for both indices and items:

In [51]: class MySeries(pd.Series):
   ....:     def __init__(self, *args, **kwargs):
   ....:         super(MySeries, self).__init__(*args, **kwargs)
   ....:     def __contains__(self, arg):
   ....:         # Integers are treated as index labels, anything else as a value
   ....:         if isinstance(arg, int):
   ....:             return super(MySeries, self).__contains__(arg)
   ....:         return (self == arg).any()
   ....:

In [52]: ms = MySeries([('a','b','c'), ('a','c','b'), ('c','a','b')])

In [53]: 3 in ms
Out[53]: False

In [54]: 2 in ms
Out[54]: True

In [55]: ('a', 'b', 'c') in ms
Out[55]: True

In [56]: ('a', 'b', 'd') in ms
Out[56]: False
Omid
Mazdak
  • I find this answer very comprehensive, as it explains the reason why the `in` operator behaves the way it does in this case. – Omid Jul 11 '17 at 15:26

You could use an equality check (==) and the any method:

(s == ('a', 'b', 'c')).any()

However, that way you lose the short-circuiting behavior that in provides, because the whole Series is compared before any() is evaluated.
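If short-circuiting matters, one possible alternative (a sketch, not part of the original answer) is to iterate over the Series in plain Python and let the built-in any() stop at the first match. This assumes that iterating a Series yields its values rather than its index labels; the names target, found, and missing are only illustrative.

```python
import pandas as pd

s = pd.Series([('a', 'b', 'c'), ('a', 'c', 'b'), ('c', 'a', 'b')])
target = ('a', 'b', 'c')

# Iterating over a Series yields its values (here, the tuples).
# any() consumes the generator lazily and stops at the first True.
found = any(item == target for item in s)

# A tuple that is not present forces a full scan and gives False.
missing = any(item == ('a', 'b', 't') for item in s)

print(found, missing)
```

This trades the vectorized comparison for a Python-level loop, so it is likely to pay off only when a match tends to appear early in a long Series.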

MSeifert