2

Can someone please explain why the following code results in a ValueError?

import heapq
import numpy as np

a = np.ones((2, 2), dtype=int)

states = []
heapq.heappush(states, (0, a))
heapq.heappush(states, (0, a.copy()))

The error message is:

Traceback (most recent call last):
  File "x.py", line 8, in <module>
    heapq.heappush(states, (0, a.copy()))
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Running it without adding the a.copy() to the heap works fine, the second/subsequent one is for some reason a problem. I do understand that there is a unknown truth value aspect with an array of [True, False, True] and that it's not possible to determine a single True or False from that, but why does heapq need to do that? Especially only in the second case?

MSeifert
  • 145,886
  • 38
  • 333
  • 352
Tomas
  • 514
  • 1
  • 5
  • 15
  • 1
    heapq needs to compare the heap elements. The heap elements are tuples, and the first entries of the tuples are equal, so it compares the second elements. Comparing the second elements doesn't produce something that can be interpreted as a boolean. – user2357112 Feb 14 '17 at 21:48
  • Note that `heapq.heappush(heap, (x, y))` doesn't mean "push thing `y` with priority `x`"; it means "push thing `(x, y)`". We don't have separate priorities and elements; we just have elements. – user2357112 Feb 14 '17 at 21:51

2 Answers2

5

TL;DR: Because numpy arrays can't be cast to a boolean if they contain more than one element.


Some informations about heaps:

Heaps "order" their contents (so the items must implement < but that's an implementation detail).

You however insert items into the heap by creating tuples for the items, where the first elements is some value and the second one is the array.

Comparing tuples first checks if the first items are equal and if they are it checks if the second items are equal and so on until they are not equal, then it will check if it's smaller (when the operation was <) or greater (for >). However tuples are implemented in C and the == check there is a bit different than the one in Python. It uses PyObject_RichCompareBool. Especially the "note" is important here

If o1 and o2 are the same object, PyObject_RichCompareBool() will always return 1 for Py_EQand 0 for Py_NE.

Now let's go to numpy arrays:

You can't convert a numpy.array to a bool if it contains more than one item:

>>> arr = np.array([1,2,3])
>>> bool(arr)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

An if check implicitly converts the condition to a boolean:

>>> if arr: pass
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Even after comparing numpy-arrays they are still numpy arrays:

>>> arr > arr
array([False, False], dtype=bool)
>>> arr == arr
array([ True,  True], dtype=bool)

So these can't be evaluated with ==:

>>> if arr == arr: pass
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

So you can't convert numpy-arrays with more than one element to a boolean! However now comes the interesting part: The heapq-module uses PyObject_RichCompareBool() so it can check if two arrays are equal but if and only if these are identical!

That's why it works with the same array passed in several times but it fails when you copy it:

>>> arr is arr
True
>>> arr is arr.copy()
False
MSeifert
  • 145,886
  • 38
  • 333
  • 352
  • "Heaps are stable and "order" their contents ... meaning that items that compare equal will be in the same order as they were originally" - what are you talking about? heapq doesn't guarantee that at all, and it doesn't do the value-position tuple thing you describe. If you want that behavior, you have to do it yourself. – user2357112 Feb 14 '17 at 22:23
  • @user2357112 Thanks for the comment, I corrected the answer. – MSeifert Feb 14 '17 at 22:55
0

The current answer does not discuss a solution. One workaround is to convert the np.array to a tuple before entering it into the states list processed by heapq. The tuple function only applies to the first dimension of the array, so one quick approach is to first flatten array:

states = []
heapq.heappush(states, (0, tuple(a.flat)))
heapq.heappush(states, (0, tuple(a.flat)))

Or instead, one can preserve all the array dimensions by using (from this answer):

def totuple(a):
    try:
        return tuple(totuple(i) for i in a)
    except TypeError:
        return a
Hugues
  • 2,865
  • 1
  • 27
  • 39