The truth value of a Series is ambiguous python dataframe

Question

I am new to Python pandas. I built a small function and now I always get the following error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I know that this error has already been discussed in other questions; however, I do not really understand what I should do differently or how the error occurred.

So this is my simple function:

def relativeWinner():
    if df['GoldSummer'] >0 & df['GoldWinter'] >0:
        df['diff'] = abs(df['GoldSummer'] - df['GoldWinter'])/(df['GoldSummer'] + df['GoldWinter'])
    return df['diff'].idxmax()

Can anyone tell me what's wrong here and how I would fix it?

What happens if df['GoldSummer'] or df['GoldWinter'] are not greater than 0? You might need an 'else' statement. — scooter me fecit, Jan 09 '17 at 21:15
You can't compare arrays to generate a scalar value, it should be rewritten like so: `def relativeWinner(): df.loc[(df['GoldSummer'] >0) & (df['GoldWinter'] >0), 'diff'] = ((df['GoldSummer'] - df['GoldWinter'])/(df['GoldSummer'] + df['GoldWinter'])).abs(); return df['diff'].idxmax()`. Besides, if your condition is not met it'll just return the index label of the max diff value, is this intended? — EdChum, Jan 09 '17 at 21:16
Think about what `df['GoldSummer'] >0 & df['GoldWinter'] >0` returns... it returns a series of booleans: `[True, True, False, True, False]`. You pass this to an if-condition, but `pandas` does not know how you want such an array to be interpreted. Should it be `True` because it has at least one `True`? (then use `.any()`). Should it be `False` because not all are `True`? (then use `.all()`). Perhaps you want to check whether the Series has any values at all (use `.empty`). — juanpa.arrivillaga, Jan 09 '17 at 21:16
You could do something like `if (df['GoldSummer'] >0 & df['GoldWinter'] >0).all():` — juanpa.arrivillaga, Jan 09 '17 at 21:20
@juanpa.arrivillaga so now I added '.all()' as you said, however I still get the same error. Do I have to add an else part, or why is the error still here? — threxx, Jan 09 '17 at 21:23
@threxx Ah! This one bites me a lot: the precedence of the `&` operator messes things up, be explicit: `if ((df['GoldSummer'] >0) & (df['GoldWinter'] >0)).all():` In other words, the bitwise operators (`&`, `|`, `^`) have higher precedence than the comparison operators (`<`, `>`) — juanpa.arrivillaga, Jan 09 '17 at 21:25
@juanpa.arrivillaga that worked! Thanks. But why do I need to add brackets here? — threxx, Jan 09 '17 at 21:27
Because of operator precedence. In `(df['GoldSummer'] >0 & df['GoldWinter'] >0)`, `0 & df['GoldWinter']` is evaluated first (in this case, using a vectorized bitwise and!), leaving you with a chained comparison against another Series, which brings you back to the original problem! — juanpa.arrivillaga, Jan 09 '17 at 21:31
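The precedence issue discussed in the comments above can be reproduced on a tiny frame. This is only a sketch: the data values are made up for illustration, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical data; the column names follow the question.
df = pd.DataFrame({"GoldSummer": [3, 0, 5], "GoldWinter": [1, 2, 0]})

# With parentheses, each comparison is evaluated first and `&` then
# combines the two boolean Series element-wise.
mask = (df["GoldSummer"] > 0) & (df["GoldWinter"] > 0)

# Reducing the mask to a single bool makes it usable in an `if`.
any_row = bool(mask.any())   # is the condition true for at least one row?
all_rows = bool(mask.all())  # is the condition true for every row?

# Without parentheses, `0 & df["GoldWinter"]` binds first. What remains is a
# chained comparison, which needs the truth value of a Series and fails.
try:
    bool(df["GoldSummer"] > 0 & df["GoldWinter"] > 0)
    raised = False
except ValueError:
    raised = True
```

Whether `.any()` or `.all()` is the right reduction depends on what the `if` is meant to express; for a per-row calculation, a mask is usually applied directly instead of being reduced.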

score 1 · Accepted Answer · edited May 23 '17 at 12:24

As for why this specific error occurs, see this post:

Difference between 'and' (boolean) vs. '&' (bitwise) in python. Why difference in behavior with lists vs numpy arrays?

Regarding your code, try this instead:

df['diff'] = [abs(s - w) / (s + w) if (s > 0) and (w > 0) else float('nan') for s, w in zip(df['GoldSummer'], df['GoldWinter'])]
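A vectorized variant of the same per-row calculation is also possible. The sketch below assumes the formula from the question; the sample data, the index labels, and the function name `relative_winner` are hypothetical, and the frame is passed in as an argument rather than read from a global.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame(
    {"GoldSummer": [3, 0, 5, 10], "GoldWinter": [1, 2, 0, 10]},
    index=["A", "B", "C", "D"],
)

def relative_winner(frame):
    # Element-wise condition: both medal counts must be positive.
    both = (frame["GoldSummer"] > 0) & (frame["GoldWinter"] > 0)
    total = frame["GoldSummer"] + frame["GoldWinter"]
    diff = (frame["GoldSummer"] - frame["GoldWinter"]).abs() / total
    # Rows that fail the condition become NaN; idxmax skips NaN by default.
    return diff.where(both).idxmax()

winner = relative_winner(df)
```

Here the mask is never passed to `if`, so no single truth value is required and the ambiguity error does not come up.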


answered Jan 09 '17 at 22:21

chad39
