34

The following code will print True because the Series contains at least one element that is greater than 1. However, it seems a bit un-Pythonic. Is there a more Pythonic way to return True if a Series contains a number that is greater than a particular value?

import pandas as pd

s = pd.Series([0.5, 2])
print(True in (s > 1))         # True

Not only is the above approach un-Pythonic, it also sometimes returns an incorrect result. For example:

s = pd.Series([0.5])
print(True in (s < 1))         # False
cottontail
ChaimG

2 Answers

50

You could use the any() method to check whether the condition is True for at least one value:

In [36]: (s > 1).any()
Out[36]: True
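As a minimal sketch, the same check can be run on both Series from the question. The variable names here are illustrative only, and the results are wrapped in bool() simply to get plain Python booleans:

```python
import pandas as pd

s1 = pd.Series([0.5, 2])
s2 = pd.Series([0.5])

# Build a boolean mask with the comparison, then reduce it with .any()
has_gt_1 = bool((s1 > 1).any())    # 2 > 1, so at least one element matches
has_lt_1 = bool((s2 < 1).any())    # 0.5 < 1, so at least one element matches
none_gt_1 = bool((s2 > 1).any())   # no element of s2 is greater than 1

print(has_gt_1, has_lt_1, none_gt_1)
```

Unlike the membership test in the question, this looks at the values of the mask rather than its index, so the second case is reported correctly.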
Anton Protopopov
  • How do you extend that operation to a set of columns so that it returns if there is at least one value greater than zero among all values? – Federico Gentile Oct 31 '17 at 15:59
  • @FedericoGentile do you mean something like `any(axis=1).any()`? First, it'll be checked across all rows in your subset and will produce the Pandas Series. Second, you'll check series for any `True` values. If not you could provide an example in the comment or maybe better to ask a new question with all details. – Anton Protopopov Nov 01 '17 at 05:03
  • I meant if I have a dataframe with 3 columns (A, B, C) and I want to check if there is at least one value greater than 0 in columns A and B... one possible solution is to do this: (df.A > 1).any() and (df.B > 1).any(). Is there a nicer and more elegant way to do it? – Federico Gentile Nov 01 '17 at 07:56
  • 3
    @FedericoGentile you could use something like `(df[['A', 'B', 'C']] > 1).any(axis=1)` – Anton Protopopov Nov 01 '17 at 11:30
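For the multi-column case discussed in these comments, here is a hedged sketch of the per-column, per-row, and overall reductions. The DataFrame contents and the threshold of 1 are made-up illustration values, not data from the thread:

```python
import pandas as pd

# Hypothetical example data; only columns A and B are checked
df = pd.DataFrame({"A": [0, 2], "B": [0, 0], "C": [5, 5]})

mask = df[["A", "B"]] > 1             # boolean DataFrame, same shape as the subset

per_column = mask.any()               # one boolean per column (A, B)
per_row = mask.any(axis=1)            # one boolean per row
overall = bool(mask.any().any())      # single boolean over the whole subset

print(per_column.tolist(), per_row.tolist(), overall)
```

Note that any(axis=1) on its own still returns a Series; a second reduction, such as the chained .any() above or mask.to_numpy().any(), is needed to get a single True/False.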
0

The in operator (a.k.a. the __contains__() method) checks whether a specific value exists in the index of a Series, not among its values.

s = pd.Series([0.5], index=['a'])

'a' in (s > 1)          # True
'b' in s                # False

As a side note, the in operator used on a DataFrame checks whether a value exists as a column label.

df = pd.DataFrame([[1]], columns=['a'])
'a' in df               # True
'b' in df               # False

In other words, whether the in operator returns True or False has nothing to do with whether (s > 1) contains any True values. To make the membership test look at the values, they must be accessed explicitly.

True in (s < 1).values  # True
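The label-versus-value distinction can be seen side by side in the following sketch. It assumes a Series with string labels (chosen here only so that labels and values cannot be confused) and uses to_numpy() as one way to access the values:

```python
import pandas as pd

s = pd.Series([0.5, 2.0], index=["a", "b"])

label_hit = "a" in s                    # "a" is an index label
value_as_label = 2.0 in s               # 2.0 is a value, not an index label
value_hit = bool(2.0 in s.to_numpy())   # membership test on the underlying values

print(label_hit, value_as_label, value_hit)
```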

Reducing the values to a single boolean (as suggested by @Anton Protopopov) is the canonical way to do this task. Python's built-in any() function may be used as well.

any(s > 1)              # False
s.gt(1).any()           # False

(s < 1).any()           # True
s.lt(1).any()           # True
cottontail