So there's no issue if I do:

A = [[1,2,3],[4,5,6]]
B = [1,2,3]
B in A #=> True

But if I do:

A = [[1,2,3],[4,5,6]]
A = [np.array(x) for x in A]

A[0] in A #=> True

z = np.array([1,2,3])
z in A #=> ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I find this very confusing. z and A[0] are both (3,) numpy arrays of the same dtype, and are equal everywhere.

Is there a reason for this kind of behavior?

Thanks.

1 Answer

So this is subtle. in will ultimately use == to compare elements, and for numpy arrays that comparison produces a boolean array (here, all Trues). However, numpy explicitly prevents multi-element arrays from being used in a boolean context, as explained in the error message. So this is one way that numpy.ndarray objects don't play well with vanilla Python data structures.
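
You can see both halves of that in isolation (a quick REPL sketch, with numpy imported as np):

>>> import numpy as np
>>> z = np.array([1, 2, 3])
>>> z == np.array([1, 2, 3])
array([ True,  True,  True])
>>> bool(z == np.array([1, 2, 3]))
Traceback (most recent call last):
  ...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()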

However (and this is an implementation detail), list.__contains__, which is what in calls here, checks identity with is before falling back to == to test membership. Thus it returns True without ever trying to use the array in a boolean context, since

A[0] is A[0] #=> True

Although, note, I'm actually not sure it should be considered an implementation detail, because it's right there in the documentation:

For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression x in y is equivalent to any(x is e or x == e for e in y).
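
That documented equivalence reproduces both behaviors directly (a quick sketch, reusing the A and z from the question):

>>> any(A[0] is e or A[0] == e for e in A)  # the is check short-circuits on the first element
True
>>> any(z is e or z == e for e in A)  # is fails every time, so z == e (an array) reaches any()
Traceback (most recent call last):
  ...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()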

It would be interesting to see whether other implementations, e.g. PyPy or Jython, do this as well.

Fundamentally, Python assumes that x is y implies x == y. Even some built-in types violate this, though, perhaps most notably float objects, in particular float('nan'). Consider:

>>> data = [float('nan'), 1., 2., 3.]
>>> data
[nan, 1.0, 2.0, 3.0]
>>> float('nan') in data
False
>>> data[0] in data
True
>>> data[0] == data[0]
False
>>> data[0] is data[0]
True
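
Each float('nan') call creates a new object, and NaN compares unequal to everything, itself included, so both the identity and equality checks fail for the fresh nan. data[0] in data still succeeds because the identity check data[0] is data[0] passes, even though data[0] == data[0] is False.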