
Suppose I have the following pandas.Series:

import pandas as pd
s = pd.Series([1,3,5,True,6,8,'findme', False])

I can use the in operator to find any of the integers or Booleans. For example, the following all yield True:

1 in s
True in s

However, this fails when I do:

'findme' in s

My workaround is to use pandas.Series.str or to first convert the Series to a list and then use the in operator:

True in s.str.contains('findme')
s2 = s.tolist()
'findme' in s2

Any idea why I can't directly use the in operator to find a string in a Series?

sedeh
  • `6 in pd.Series([1,2,6])` is False, so even numerical values are not working as you expect :) – flow2k Nov 30 '20 at 09:51

2 Answers


Any idea why I can't directly use the in operator to find a string in a Series?

Think of a Series as more like an ordered dictionary than a list: membership testing on a Series checks the index (like the keys of a dictionary), not the values. You can test the values through the .values attribute:

>>> s = pd.Series([1,3,5,True,6,8,'findme', False])
>>> 7 in s
True
>>> 7 in s.values
False
>>> 'findme' in s
False
>>> 'findme' in s.values
True
DSM
  • Interestingly, if I `import numpy as np` and then do `s = pd.Series([1,3,5,True,6,8,'findme', False, np.nan])`, I can't find the `NaN` by doing `np.nan in s.values` but I can find it by doing `np.nan in s.tolist()`. Thoughts? – sedeh Oct 29 '15 at 13:26
  • @sedeh: `nan` is a weird one because `nan != nan`, so in general you can only get `nan in (something_which_contains_nan)` if it's IDENTICAL, and `tolist()` reuses `np.nan`. See [here](http://stackoverflow.com/questions/20320022/why-in-numpy-nan-nan-is-false-while-nan-in-nan-is-true) for a previous answer of mine on nan-ish stuff. – DSM Oct 29 '15 at 14:36
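
A minimal sketch of the NaN behaviour described in these comments, using a plain list and a float array (both made up for illustration) instead of the Series from the question:

```python
import numpy as np

# NaN never compares equal to anything, including itself.
nan_equals_itself = np.nan == np.nan

# A list's `in` checks identity before equality, so the very same
# np.nan object is found even though it is not equal to itself.
values = [1.0, np.nan]
same_object_found = np.nan in values

# A different NaN object fails both the identity and the equality check.
other_nan_found = float('nan') in values

# A NumPy array's `in` compares elementwise with ==, so NaN is never found.
arr = np.array([1.0, np.nan])
found_in_array = np.nan in arr

# Checking for NaN explicitly avoids the problem.
has_nan = bool(np.isnan(arr).any())

print(nan_equals_itself, same_object_found, other_nan_found,
      found_in_array, has_nan)
```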

The function you're looking for is Series.str.match().

s.str.match('findme').any()

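Note that this is a regex match anchored at the start of each string, so it is very extensible, but it also matches longer strings that merely begin with 'findme'. (If the pattern may appear anywhere in the string, you can use Series.str.contains().)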

.any() collapses the Boolean Series to a single value, which is what the question asks for.

Alternatively, you can use the more general method Series.isin() for exact matches.

s.isin(['findme']).any()

(Note that you have to wrap 'findme' in a list; isin() requires a list-like argument.)
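
To make the difference between these methods concrete, here is a small sketch. The Series is made up for illustration, `na=False` is used so that non-string entries count as non-matches, and Series.str.fullmatch() assumes pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical data: one exact hit, one string that only starts with
# the pattern, and one non-string value.
s = pd.Series(['findme', 'findme_too', 1])

# str.match() anchors the regex at the start of each string only.
prefix_hits = s.str.match('findme', na=False).tolist()

# str.fullmatch() requires the whole string to match the regex.
full_hits = s.str.fullmatch('findme', na=False).tolist()

# isin() compares values for equality, with no regex involved.
exact_hits = s.isin(['findme']).tolist()

print(prefix_hits)  # [True, True, False]
print(full_hits)    # [True, False, False]
print(exact_hits)   # [True, False, False]
```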

In the comments, there was a question about finding np.nan. The isin() approach above works for that example as well.

s = pd.Series([1,3,5,True,6,8,'findme', False, np.nan])

s.isin([np.nan]).any()

Alternatively, you can use the dedicated Series.isna() method, which gives the same result here.

s.isna().any()

The advantage of s.isin() is that it's agnostic to datatypes, which helps if you're looking to match any of several possible values:

s.isin(['findme', np.nan]).any()
climatebrad