Why does comparing two tuples each containing a NumPy object complain about truth?

Question

Suppose I have two NumPy arrays:

>>> import numpy as np
>>> a = np.arange(2)
>>> b = np.arange(2)

They can be compared without raising an exception, though the result is, as expected, not a single value:

>>> a > b
array([False, False], dtype=bool)

However, putting them in a tuple comparison that requires comparing them does raise an exception:

>>> (1, a) > (1, b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

A similar thing happens with Pandas Series objects; in that case, __nonzero__ is called. The Python documentation says that method is for converting to bool, which does not seem relevant here.

There is another question about how to accomplish the comparison correctly.

But my question is: why does this happen? How do booleans get involved? Why is there not a more logical exception about not being able to compare the objects?

This is Python 3.4.

score 4 · Accepted Answer · answered Sep 02 '15 at 01:35

When you compare numpy arrays, you get a boolean numpy array. However, when you compare tuples in Python, it compares the corresponding elements of the tuple to each other and expects to get a boolean value back.

import numpy as np

a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
c = a < b  # boolean array: [True, True, True]
bool(c)    # raises ValueError (truth value is ambiguous)

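Tuple comparison s < t, for two tuples s and t, behaves roughly like this function:

def tuple_less_than(s, t):
    for x, y in zip(s, t):
        if x < y:  # implicit bool() on the comparison result
            return True
        if x > y:  # implicit bool() again
            return False
    return len(s) < len(t)  # all shared positions equal: the shorter tuple is smaller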

The if statements perform an implicit bool() conversion on the result of each element-wise comparison. In Python 2, calling bool() on an instance of a custom class is implemented by the class's __nonzero__ method; in Python 3, it is __bool__. That implicit bool(array) call is what gives you the error message.
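To watch the implicit truth test happen without NumPy, here is a minimal sketch. The classes ArrayLike and NoTruth and the calls list are hypothetical names made up for this illustration; they only mimic an array whose comparisons return something that refuses bool(). On CPython, the log suggests that the tuple comparison first tests elements with == to find the first position where they differ, and it is the truth test of that == result that fails, so __gt__ is never reached.

```python
calls = []  # records which special methods Python invokes, in order


class NoTruth:
    """Hypothetical stand-in for a boolean array: it refuses to become a bool."""

    def __bool__(self):
        calls.append("__bool__")
        raise ValueError("truth value is ambiguous")


class ArrayLike:
    """Hypothetical stand-in for an array: comparisons return NoTruth, not bool."""

    def __eq__(self, other):
        calls.append("__eq__")
        return NoTruth()

    def __gt__(self, other):
        calls.append("__gt__")
        return NoTruth()


a, b = ArrayLike(), ArrayLike()

try:
    (1, a) > (1, b)  # same shape as the comparison in the question
    outcome = "no error"
except ValueError as exc:
    outcome = str(exc)

print(calls)    # order of special-method calls made by the tuple comparison
print(outcome)  # message of the exception raised by the truth test
```

Either way, the point of the answer stands: the tuple comparison needs a plain True/False from each element comparison, and the exception comes from that conversion rather than from the comparison itself.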

hpaulj · Answer 2 · 2015-09-02T02:12:12.583

This

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

usually arises when a general Python expression performs a logical operation (and, or, etc.) and a numpy object returns a boolean array.

An obvious example is:

if np.array([True,False]):1

if demands a simple True/False, but the value is an array.

Your example is a little more complicated because we have to know how Python performs the > test on tuples (and presumably lists). I think it does an element-by-element comparison and then combines the results with and/or.

Regardless of the details, the array elements of the tuples are returning boolean arrays, while the Python logical operation expects scalar booleans. Hence the error message.

Another context that produces this error:

bool([False,False])  # == False
bool(np.array([False,False]))  # this ValueError
bool(np.array([1]))  # True
bool(np.array([]))   # False

bool() applied to 1- or 0-element arrays is OK; otherwise it produces this error. I suspect that Python's and and or apply bool() to each of their arguments before combining them.
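That suspicion can be checked with a small probe. This is only a sketch: Probe is a hypothetical helper class written for this illustration, and it logs every truth test made on it. The log indicates that and and or test only their first operand and hand back the second one untested, while not and if test the value they are given.

```python
class Probe:
    """Hypothetical helper that logs every truth-value test made on it."""

    def __init__(self, name, value, log):
        self.name = name
        self.value = value
        self.log = log

    def __bool__(self):
        self.log.append(self.name)
        return self.value


log = []
x = Probe("x", True, log)
y = Probe("y", False, log)

result_and = x and y  # tests x; y is returned without being tested
result_or = y or x    # tests y; x is returned without being tested
negated = not y       # tests y
if x:                 # tests x
    pass

print(log)
```

So a boolean array as the first operand of and/or, or as the condition of if or not, triggers the same bool() call that fails in bool(np.array([False,False])).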

I have found where this ambiguous ValueError is produced in the numpy C code (in a _array_nonzero function), but haven't been able to trace how a Python bool might end up calling it.
