
I am new to Python pandas. I built a small function and now I always get the following error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I know that this error has already been discussed in other questions, but I still do not understand what I should do differently or how the error occurs.

So this is my simple function:

def relativeWinner():
    if df['GoldSummer'] >0 & df['GoldWinter'] >0:
        df['diff'] = abs(df['GoldSummer'] - df['GoldWinter'])/(df['GoldSummer'] + df['GoldWinter'])
    return df['diff'].idxmax()

Can anyone tell me what's wrong here and how I would fix it?

threxx
  • What happens if df['GoldSummer'] or df['GoldWinter'] are not greater than 0? You might need an 'else' statement. – scooter me fecit Jan 09 '17 at 21:15
  • You can't use an element-wise comparison of arrays as a scalar condition; it should be rewritten like so: `def relativeWinner(): df.loc[(df['GoldSummer'] >0) & (df['GoldWinter'] >0), 'diff'] = ((df['GoldSummer'] - df['GoldWinter'])/(df['GoldSummer'] + df['GoldWinter'])).abs() return df['diff'].idxmax()`. Besides, if your condition is not met, it'll just return the index label of the max diff value. Is this intended? – EdChum Jan 09 '17 at 21:16
  • 4
    Think about what `df['GoldSummer'] >0 & df['GoldWinter'] >0` returns... it returns a series of booleans: `[True, True, False, True, False]`. You pass this to an if-condition, but `pandas` does not know how you want such an array to be interpreted. Should it be `True` because it has at least one `True` (then use `.any()`)? Should it be `False` because not all are `True` (then use `.all()`)? Perhaps you want to check whether the Series has any values (then use `.empty`). – juanpa.arrivillaga Jan 09 '17 at 21:16
  • @juanpa.arrivillaga where do I put that .any() or .all()? – threxx Jan 09 '17 at 21:19
  • 1
    You could do something like `if (df['GoldSummer'] >0 & df['GoldWinter'] >0).all():` – juanpa.arrivillaga Jan 09 '17 at 21:20
  • @juanpa.arrivillaga So now I added `.all()` as you said, but I still get the same error. Do I have to add an else part, or why is the error still there? – threxx Jan 09 '17 at 21:23
  • 1
    @threxx Ah! This one bites me a lot: the precedence of the `&` operator messes things up, be explicit: `if ((df['GoldSummer'] >0) & (df['GoldWinter'] >0)).all():` In other words, the bitwise operators (`&`, `|`, `^`) have higher precedence than the comparison operators (`<`, `>`) – juanpa.arrivillaga Jan 09 '17 at 21:25
  • @juanpa.arrivillaga That worked! Thanks. But why do I need to add brackets here? – threxx Jan 09 '17 at 21:27
  • 1
    Because of operator precedence. In `(df['GoldSummer'] >0 & df['GoldWinter'] >0)`, `0 & df['GoldWinter']` is evaluated first (in this case, as a vectorized bitwise and!), leaving you with a chained comparison against another series, which brings you back to the original problem! – juanpa.arrivillaga Jan 09 '17 at 21:31
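
To make the comments above concrete, here is a minimal sketch of the precedence issue and of reducing the mask with `.any()` or `.all()`. The sample values are hypothetical; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({"GoldSummer": [3, 0, 5], "GoldWinter": [1, 2, 0]})

# Parenthesized comparisons give an element-wise boolean Series.
mask = (df["GoldSummer"] > 0) & (df["GoldWinter"] > 0)

# An `if` needs a single bool, so reduce the Series explicitly.
any_row = bool(mask.any())   # True if at least one row satisfies both conditions
all_rows = bool(mask.all())  # True only if every row satisfies both conditions

# Without parentheses, `0 & df["GoldWinter"]` binds first, and the resulting
# chained comparison calls bool() on a Series, which raises ValueError.
try:
    if df["GoldSummer"] > 0 & df["GoldWinter"] > 0:
        pass
    raised = False
except ValueError:
    raised = True
```

With this sample data only the first row has gold medals in both columns, so `.any()` and `.all()` give different answers. Which one is appropriate depends on what the `if` is supposed to decide.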

1 Answer


As for why this specific issue is occurring, see this post:

Difference between 'and' (boolean) vs. '&' (bitwise) in python. Why difference in behavior with lists vs numpy arrays?

Regarding your code, try this instead:

df['diff'] = [abs(tup[0] - tup[1]) / (tup[0] + tup[1]) if (tup[0] > 0) and (tup[1] > 0) else float('nan') for tup in zip(df['GoldSummer'], df['GoldWinter'])]
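
As an alternative to the row-by-row list comprehension, the same idea can be sketched with a boolean mask, which avoids the `if` on a Series altogether. This is only a sketch under assumptions: the function name `relative_winner`, the `medals` frame, and its values are hypothetical, and the function takes the DataFrame as an argument rather than using a global `df`.

```python
import pandas as pd


def relative_winner(df):
    """Return the index label with the largest relative gold-medal difference."""
    # Rows where both counts are positive (note the parentheses around each comparison).
    both = (df["GoldSummer"] > 0) & (df["GoldWinter"] > 0)
    # Element-wise |summer - winter| / (summer + winter).
    ratio = (df["GoldSummer"] - df["GoldWinter"]).abs() / (
        df["GoldSummer"] + df["GoldWinter"]
    )
    # Keep the ratio only where the mask is True; other rows become NaN.
    diff = ratio.where(both)
    # idxmax skips NaN by default.
    return diff.idxmax()


# Hypothetical data for illustration only.
medals = pd.DataFrame(
    {"GoldSummer": [10, 0, 3], "GoldWinter": [2, 4, 3]},
    index=["A", "B", "C"],
)
winner = relative_winner(medals)
```

Here row `B` is excluded by the mask even though its raw ratio would be the largest, so the result depends on the filter being applied before `idxmax()`.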
chad39