Evaluating pandas series values with logical expressions and if-statements

Question

I'm having trouble evaluating values from a dictionary using if statements.

Given the following dictionary, which I imported from a dataframe (in case it matters):

>>> pnl[company]
29:   Active Credit       Date   Debit Strike Type
0      1      0 2013-01-08  2.3265  21.15  Put
1      0      0 2012-11-26      40     80  Put
2      0      0 2012-11-26     400     80  Put

I tried to evaluate the following statement to check the last value of Active:

if pnl[company].tail(1)['Active']==1:
    print 'yay'

However, I was confronted by the following error message:

Traceback (most recent call last):
  File "<pyshell#69>", line 1, in <module>
    if pnl[company].tail(1)['Active']==1:
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 676, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

This surprised me, given that I could display the value I wanted using the above command without the if statement:

>>> pnl[company].tail(1)['Active']
30: 2    0
Name: Active, dtype: object

Given that the value is clearly zero and the index is 2, I tried the following for a brief sanity check and found that things weren't happening as I might have expected:

>>> if pnl[company]['Active'][2]==0:
...     print 'woo-hoo'
... else:
...     print 'doh'


doh

My questions are:

1) What might be going on here? I suspect I'm misunderstanding dictionaries on some fundamental level.

2) I noticed that as I bring up any given value of this dictionary, the number on the left increases by 1. What does this represent? For example:

>>> pnl[company].tail(1)['Active']
31: 2    0
Name: Active, dtype: object
>>> pnl[company].tail(1)['Active']
32: 2    0
Name: Active, dtype: object
>>> pnl[company].tail(1)['Active']
33: 2    0
Name: Active, dtype: object
>>> pnl[company].tail(1)['Active']
34: 2    0
Name: Active, dtype: object

Thanks in advance for any help.

This is not a question about dictionaries, but about Pandas `Series` objects. — Daniel Roseman, May 04 '14 at 21:03
This seems like something specific to the [pandas](http://pandas.pydata.org/) library that you are using. It appears as though pandas provides objects that sort of act like dictionaries, but differ in important ways. To be clear, you're not dealing with usual Python dictionaries here, but are using a data structure provided by pandas that has dictionary-like syntax. — Greg Hewgill, May 04 '14 at 21:04
You yield a series not a dictionary, as such it cannot evaluate your boolean query, like the error suggest you need to do `pnl[company].tail(1)['Active'].any()==1` even though this is still a single value — EdChum, May 04 '14 at 21:06
With respect to your second question, are you confusing the ordinal output number? so if you just repeatedly did print "yay" or print("yay") (for python 3) does the number still increment — EdChum, May 04 '14 at 21:08
@GregHewgill thanks for the insight. Just so I know, what is the key sign that this is not actually a dictionary? — neanderslob, May 04 '14 at 21:11
@neanderslob: The fact that you're getting errors such as `The truth value of a Series is ambiguous` from within the pandas library source files. — Greg Hewgill, May 04 '14 at 21:12
You could just do `print(type(pnl[company].tail(1)['Active']))` — EdChum, May 04 '14 at 21:12
(Retagged and retitled to add *'pandas Series logical expression'* and remove *'dictionary'*) — smci, May 04 '14 at 21:33
@EdChum With regard to your question about the ordinal output number possibility. I was wondering the same thing but when I just do a `print yay` it doesn't increment. [Here's the output](https://dl.dropboxusercontent.com/u/11993667/incrementoutput.txt) if you're interested. — neanderslob, May 04 '14 at 21:46
Haven't a clue, maybe a pandas print option I've never encountered then — EdChum, May 04 '14 at 21:48
@EdChum Huh, maybe so, no matter though; doesn't seem to be hurting anything. Thanks again for the help. — neanderslob, May 04 '14 at 21:52

score 7 · Accepted Answer · edited Nov 07 '16 at 10:57

What you get is a pandas Series object, and it cannot be evaluated in the way you are attempting, even though it holds just a single value. You need to change your line to:

if pnl[company].tail(1)['Active'].any()==1:
  print 'yay'

With respect to your second question see my comment.

EDIT

From the comments and the output you linked: calling any() fixed the error message, but your data is actually strings, so the comparison still failed. You could either do:

if pnl[company].tail(1)['Active'].any()=='1':
  print 'yay'

to do a string comparison, or fix the data where it is read or generated.

Or do:

pnl[company]['Active'] = pnl[company]['Active'].astype(int)

to convert the dtype of the column to int, so that comparing against the integer 1 works as intended.
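
The effect of the dtype conversion can be seen in a small sketch. The DataFrame below is a hypothetical stand-in for pnl[company] (the real data is not available), with 'Active' stored as strings as in the question; it is written for Python 3 and uses .iloc[-1] only to pull out the last value as a scalar.

```python
import pandas as pd

# Hypothetical stand-in for pnl[company]; 'Active' holds strings, as in the question.
df = pd.DataFrame({
    "Active": ["1", "0", "0"],
    "Strike": [21.15, 80.0, 80.0],
})

# Before conversion the last value is the string "0", which never equals the integer 0.
before = df["Active"].iloc[-1] == 0
print("before astype:", before)

# Convert the column once, then compare against integers.
df["Active"] = df["Active"].astype(int)
after = df["Active"].iloc[-1] == 0
print("after astype:", after)
```

If the strings can contain blanks or other non-numeric text, astype(int) will raise, so fixing the data where it is read is usually the more robust route.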

answered May 04 '14 at 21:10 by EdChum · edited Nov 07 '16 at 10:57 by Rudolf Real

As always, thanks for your help! I gave the solution a shot and it does indeed get rid of the error. However, I still can't get it to acknowledge the value. Regardless of the value, I can't get it to print 'yay'. [See the output here](https://dl.dropboxusercontent.com/u/11993667/output.txt) Same situation applied to my "sanity check" in the question above. Any idea why this is happening? – neanderslob May 04 '14 at 21:24
That looks like a string does `if pnl[company].tail(1)['Active'].any()=='0'` work? do you want it as a string or as a int/float? – EdChum May 04 '14 at 21:26
@EdChum: The `.any()` will return True or False. I'm not sure if comparing the return value of `any` with 1 is what the OP really intends... – unutbu May 04 '14 at 21:27
@unutbu I don't observe that, when I call `df.tail(1)['col'].any()` it returns the value, the comparison will yield a True or False – EdChum May 04 '14 at 21:29
To borrow from the if statement output: "Yay" (the `...=='0'` worked) I actually would prefer to work with it as an int, to answer your previous comment. I'm guessing there's a trick for that? – neanderslob May 04 '14 at 21:32
Either fix however you read the data in the first place or do this `pnl['Company']['Active'] = pnl['Company']['Active'].astype(int)` – EdChum May 04 '14 at 21:34
Works like a charm, much appreciated – neanderslob May 04 '14 at 21:35
@EdChum: Huh, you are right. Under certain conditions `.any()` can return something other than True or False. – unutbu May 04 '14 at 21:36
@unutbu I've not seen a situation where I would not expect that to be the case. Besides, if it didn't do that, the error message would not be very helpful, as it suggests this as a way of avoiding the ambiguous call – EdChum May 04 '14 at 21:38
I think the usual way would have been: `if (pnl[company].tail(1)['Active']==1).any():`. Maybe it is just me, but I never rely on `any` to return anything but True or False. – unutbu May 04 '14 at 21:40

unutbu · Answer 2 · 2014-05-05T01:36:27.423

A Series is a subclass of NDFrame. The NDFrame.__bool__ method (spelled __nonzero__ in Python 2, which is the name that appears in your traceback) always raises a ValueError. Thus, trying to evaluate a Series in a boolean context raises a ValueError, even if the Series has but a single value.

The reason NDFrames have no boolean value (that is, why they always raise a ValueError) is that there is more than one criterion one might reasonably expect to make an NDFrame True. It could mean

every item in the NDFrame is True (if so, use .all()), or
any item in the NDFrame is True (if so, use .any()), or
the NDFrame is not empty (if so, use not .empty; it is a property, so no parentheses)

Since any of these is possible, and since different users have different expectations, the developers refuse to guess and instead require the user of the NDFrame to state explicitly which criterion they want.
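
As a minimal sketch of those three criteria (Python 3, with a made-up boolean Series rather than your data):

```python
import pandas as pd

s = pd.Series([True, False, True])

# Using the Series itself as a condition is ambiguous, so pandas raises.
raised = False
try:
    if s:
        pass
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

all_true = bool(s.all())    # every item is True
any_true = bool(s.any())    # at least one item is True
non_empty = not s.empty     # .empty is a property, not a method

print(all_true, any_true, non_empty)
```

Each spelling answers a different question, which is why the condition has to name one of them explicitly.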

The error message lists the most likely choices:

Use a.empty, a.bool(), a.item(), a.any() or a.all()

Since in your case you know the Series will contain just one value, you could use item:

if pnl[company].tail(1)['Active'].item() == 1:
    print 'yay'

Regarding your second question: The numbers on the left seem to be line numbering produced by your Python interpreter (PyShell?) -- but that's just my guess.

WARNING: Presumably,

if pnl[company].tail(1)['Active']==1:

means you would like the condition to be True when the single value in the Series equals 1. The code

if pnl[company].tail(1)['Active'].any()==1:
    print 'yay'

will be True if the dtype of the Series is numeric and the value in the Series is any number other than 0. For example, if we take pnl[company].tail(1)['Active'] to be equal to

In [128]: s = pd.Series([2], index=[2])

then

In [129]: s.any()
Out[129]: True

and therefore,

In [130]: s.any()==1
Out[130]: True

I think s.item() == 1 more faithfully preserves your intended meaning:

In [132]: s.item()==1
Out[132]: False

(s == 1).any() would also work, but using any does not express your intention very plainly, since you know the Series will contain only one value.
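
To compare the spellings side by side, here is a hedged sketch in Python 3. The DataFrame is a hypothetical stand-in for pnl[company], and the .iloc[-1] variant is an extra option of mine that does not appear in the discussion above.

```python
import pandas as pd

# Hypothetical stand-in for pnl[company]; the last 'Active' value is 2.
df = pd.DataFrame({"Active": [1, 0, 2]})

tail = df.tail(1)["Active"]              # one-element Series with index label 2

via_item = tail.item() == 1              # scalar comparison; item() raises unless len == 1
via_mask = (tail == 1).any()             # element-wise comparison, then reduce
via_iloc = df["Active"].iloc[-1] == 1    # positional scalar access (my addition)
via_any = tail.any() == 1                # misleading: True for any non-zero value

print(via_item, via_mask, via_iloc, via_any)
```

Only the last spelling reports a match here, because any() collapses the value 2 to True before the comparison with 1 happens.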

score 0 · Answer 3 · edited May 23 '17 at 11:53

Your question has nothing to do with Python dictionaries, or native Python at all. It's about pandas Series, and the other answers gave you the correct syntax.

Interpreting your question in the wider sense, it's about how pandas Series was shoehorned onto NumPy, and NumPy until recently had notoriously poor support for logical values and operators. pandas does the best job it can with what NumPy provides. Having to manually invoke NumPy logical functions instead of just writing code with arbitrary (Python) operators is annoying and clunky, and sometimes bloats pandas code. You also often have to do this for performance (NumPy is faster than thunking to and from native Python). But that's the price we pay.

There are many limitations, quirks and gotchas (examples below). The best advice is to be distrustful of boolean as a first-class citizen in pandas, due to NumPy's limitations:

pandas Caveats and Gotchas - Using If/Truth Statements with Pandas
a performance example: Python ~ can be used instead of np.invert(); it is more legible, but 3x slower or worse
some gotchas and limitations: in the code below, note that recent NumPy now allows boolean values (internally represented as int) and allows NAs, but that e.g. value_counts() ignores NAs (compare to R's table, which has the option 'useNA').

import numpy as np
import pandas as pd
s = pd.Series([True, True, False, True, np.nan])  # np.nan: the np.NaN alias was removed in NumPy 2.0
s2 = pd.Series([True, True, False, True, np.nan])
dir(s) # look at .all, .any, .bool, .eq, .equals, .invert, .isnull, .value_counts() ...

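s.astype(bool) # GOTCHA: NaN is truthy, so the missing value silently becomes True
# 0     True
# 1     True
# 2    False
# 3     True
# 4     True  # <--- should be NA!!
#dtype: bool

s.bool # NOTE: bool is a method; without parentheses this only displays the bound method, whose repr embeds the Series
# <bound method Series.bool of
# 0     True
# 1     True
# 2    False
# 3     True
# 4      NaN
# dtype: object>
# s.bool() is meant for a single-element boolean Series and raises ValueError on this one (it is deprecated in recent pandas)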

# Limitation: value_counts() currently excludes NAs
s.value_counts()
# True     3
# False    1
# dtype: int64
help(s.value_counts) # "... Excludes NA values(!)"

# Equality comparison (vector) fails on NAs; again there's no NA-handling option:
s == s2 # or equivalently, s.eq(s2)
# 0     True
# 1     True
# 2     True
# 3     True
# 4    False  # BUG/LIMITATION: we should be able to choose NA==NA
# dtype: bool

# ...but the scalar equality comparison says they are equal!!
s.equals(s2)
# True
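
Two of the limitations above can be worked around explicitly. The sketch below (Python 3) assumes a pandas version in which value_counts() accepts the dropna parameter, and the NaN-aware comparison is my own construction rather than a built-in option.

```python
import numpy as np
import pandas as pd

s = pd.Series([True, True, False, True, np.nan])
s2 = s.copy()

# Keep a row for the missing value instead of silently dropping it.
counts = s.value_counts(dropna=False)
print(counts)

# Element-wise equality that treats NaN in the same position as equal.
elementwise = (s == s2) | (s.isna() & s2.isna())
print(elementwise.all())
```

This keeps the NA handling visible in the code, at the cost of one extra mask per comparison.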

The pandas team are doing a great job on fixes/enhancements, so [report any fixes/enhances/gotchas/docbugs](https://github.com/pydata/pandas/issues) to them. Also, the cookbook and blogs are greatly needed. — smci, May 04 '14 at 22:20
cookbook has been around a while: http://pandas-docs.github.io/pandas-docs-travis/cookbook.html — Jeff, May 05 '14 at 01:56
