I am trying to use a Boolean mask to get a match from 2 different dataframes.

Using the logical OR operator:

x = df[(df['A'].isin(df2['B']))
      or df['A'].isin(df2['C'])]

Output:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

However, using the bitwise OR operator, the results are returned successfully.

x = df[(df['A'].isin(df2['B']))
      | df['A'].isin(df2['C'])]

Output: x

Is there a difference between the two, and would bitwise OR be the best option here? Why doesn't the logical OR work?

BernardL
  • Yes, it is basically because logical or cannot be overloaded. – ayhan Sep 08 '16 at 10:50
  • Hi, I edited my question. I am just really curious about why logical ORs do not work. – BernardL Sep 08 '16 at 10:50
  • You're comparing arrays, not scalar values, which `or` doesn't understand, so you need to use bitwise `|`. – jezrael Sep 08 '16 at 10:59
  • Thanks. I probably should read up more on the basic functions. – BernardL Sep 08 '16 at 11:02
  • It is explained better [here](http://stackoverflow.com/a/10063039/2901002) with `and`, but the same applies to `or`. – jezrael Sep 08 '16 at 11:03
  • Possible duplicate of [ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()](http://stackoverflow.com/questions/10062954/valueerror-the-truth-value-of-an-array-with-more-than-one-element-is-ambiguous) – Zeugma Sep 08 '16 at 11:33

1 Answer

As far as I understand this issue (I come from a C++ background and am currently learning Python for data science), several posts I came across suggest that bitwise operators (&, |) can be overloaded in classes, just as in C++.

Used on numbers, these bitwise operators compare the bits and give you the result. For instance, if you have the following:

1 | 2 # will result in 3

What Python will actually do is compare the bits of these numbers:

00000001 | 00000010

The result will be:

00000011 (because 0 | 0 is False, ergo 0; and 0 | 1 is True, ergo 1)

As an integer: 3

It compares each bit of the numbers and spits out the result of these eight pairwise operations (eight bits are shown here for illustration; Python integers are not limited to eight bits). This is the normal behaviour of these operators.
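The arithmetic above can be checked directly in the interpreter. This is a minimal sketch; the variable names are only for illustration, and the eight-digit formatting is just a display choice:

```python
a, b = 1, 2
result = a | b  # bitwise OR of the two integers

# Show each value as an eight-digit binary string
print(format(a, "08b"))       # 00000001
print(format(b, "08b"))       # 00000010
print(format(result, "08b"))  # 00000011
print(result)                 # 3
```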

Enter pandas. Because these operators can be overloaded, pandas makes use of this. Applied to pandas objects, the bitwise operators are used like this:

(dataframe1['column'] == "expression") & (dataframe1['column'] != "another expression")

In this case, pandas first creates a Series of True and False values from the results of the == and != operations. (Be careful: you have to put parentheses around each comparison, because & and | have higher precedence than the comparison operators, so Python would otherwise evaluate the bitwise operator first.) Each value in the column is compared to the expression, and the output is either True or False.
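The precedence issue can be seen even without pandas. The following sketch uses plain integers (an assumption made to keep the example small; the same parsing rule applies to Series expressions):

```python
# Without parentheses, & binds tighter than ==, so Python reads this as
# 1 == (1 & 2) == 2, i.e. the chained comparison 1 == 0 == 2.
unparenthesized = 1 == 1 & 2 == 2

# With parentheses, each comparison is evaluated first and the two
# boolean results are then combined with &.
parenthesized = (1 == 1) & (2 == 2)

print(unparenthesized)  # False
print(parenthesized)    # True
```

With Series on both sides, the unparenthesized form ends up as a chained comparison as well, which is why the parentheses are not optional there.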

Then you have two Series of True and False values with the same length. pandas then takes these two Series and combines them element by element with either "and" (&) or "or" (|), and finally returns a single boolean Series that says, for each row, whether the combined condition is fulfilled.
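As a small illustration of these two steps, here is a sketch with a made-up three-row DataFrame (the data and the variable names are assumptions, not taken from the question):

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({"column": ["expression", "another expression", "other"]})

# Step 1: each comparison produces a boolean Series of the same length
is_expr = df["column"] == "expression"
not_another = df["column"] != "another expression"

# Step 2: & and | combine the two Series element by element
both = is_expr & not_another
either = is_expr | not_another

print(both.tolist())    # [True, False, False]
print(either.tolist())  # [True, False, True]

# The resulting boolean Series can be used as a mask
print(df[both])
```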

To go even further, what I think is happening under the hood is that the & operator calls a method that pandas defines (Python translates `a & b` into a call to `a.__and__(b)`), passing it the two previously evaluated Series to the left and right of the operator. pandas then compares the values pair by pair, returning True or False for each pair.
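To make the mechanism concrete, here is a toy class that overloads & and | in the same spirit. It is a hypothetical stand-in written for this explanation, not how pandas is actually implemented:

```python
class Mask:
    """Toy stand-in for a boolean Series (hypothetical, not pandas code)."""

    def __init__(self, values):
        self.values = list(values)

    def __and__(self, other):
        # Python calls this method for `self & other`
        return Mask(x and y for x, y in zip(self.values, other.values))

    def __or__(self, other):
        # Python calls this method for `self | other`
        return Mask(x or y for x, y in zip(self.values, other.values))


left = Mask([True, False, True])
right = Mask([True, True, False])

print((left & right).values)  # [True, False, False]
print((left | right).values)  # [True, True, True]
```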

This is basically the same principle that is used for the other overloaded operators as well (>, <, >=, <=, ==, !=).

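Why go to the trouble of using a separate & expression when you have the nice and neat "and"? Because "and" and "or" are built into the language and cannot be overloaded. They first ask the left operand for a single truth value (via bool()), and a Series with several values has no single truth value, so pandas raises the ValueError shown in the question.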
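The difference can be reproduced with two small Series. This is a sketch with made-up values; the exact wording of the error message may vary between pandas versions:

```python
import pandas as pd

a = pd.Series([True, False])
b = pd.Series([False, False])

or_error = None
try:
    a or b  # Python evaluates bool(a) first, which pandas refuses to answer
except ValueError as err:
    or_error = str(err)
    print(or_error)

# The overloaded bitwise operator works element by element instead
elementwise = a | b
print(elementwise.tolist())  # [True, False]
```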

Hope that helps!

nathan_lesage
  • Thanks! Sorry on the late reply, the explanation clearly breaks it down. Gives an understanding on whats happening under the hood. – BernardL Oct 04 '17 at 06:05
  • Great, clear explanation. Even though I've known how this works for a long time, I never bothered to think through the 'why'. I usually include extraneous parens for my own logic/notes, so it might have occurred to me sooner if i didn't do that. – Jeff Ellen Jul 29 '18 at 03:02
  • Turns out I needed `()`.. Thanks. – WillZ Dec 19 '18 at 21:44
  • I've just recently started on a pandas project and that has been niggling at me. Thanks for the explanation, bookmarked – David Clarke Nov 24 '22 at 19:29