Comparing numpy array of dtype object

Question

My question is "why?:"

aa[0]
array([[405, 162, 414, 0,
        array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],
      dtype=object),
        0, 0, 0]], dtype=object)

aaa
array([[405, 162, 414, 0,
        array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],
      dtype=object),
        0, 0, 0]], dtype=object)

np.array_equal(aaa,aa[0])
False

Those arrays are completely identical.

My minimal example doesn't reproduce this:

be=np.array([1],dtype=object)

be
array([1], dtype=object)

ce=np.array([1],dtype=object)

ce
array([1], dtype=object)

np.array_equal(be,ce)
True

Nor does this one:

ce=np.array([np.array([1]),'5'],dtype=object)

be=np.array([np.array([1]),'5'],dtype=object)

np.array_equal(be,ce)
True

However, to reproduce my problem try this:

be=np.array([[405, 162, 414, 0, np.array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],dtype=object),0, 0, 0]], dtype=object)

ce=np.array([[405, 162, 414, 0, np.array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],dtype=object),0, 0, 0]], dtype=object)

np.array_equal(be,ce)
False

np.array_equal(be[0],ce[0])
False

And I have no idea why those are not equal. And to add the bonus question, how do I compare them?

I need an efficient way to check if aaa is in the stack aa.

I'm not using "aaa in aa" because it raises "DeprecationWarning: elementwise == comparison failed; this will raise an error in the future." and, in case anyone is wondering, it still returns False.

What else have I tried?:

np.equal(be,ce)
*** ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

np.all(be,ce)
*** TypeError: only integer scalar arrays can be converted to a scalar index

all(be,ce)
*** TypeError: all() takes exactly one argument (2 given)

all(be==ce)
*** TypeError: 'bool' object is not iterable

np.where(be==ce)
(array([], dtype=int64),)

And these, which I can't get to run in the console, all evaluate to False, some giving the deprecation warning:

import numpy as np

ce=np.array([[405, 162, 414, 0, np.array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],dtype=object),0, 0, 0]], dtype=object)

be=np.array([[405, 162, 414, 0, np.array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],dtype=object),0, 0, 0]], dtype=object)

print(np.any([bee in ce for bee in be]))

print(np.any([bee==cee for bee in be for cee in ce]))

print(np.all([bee in ce for bee in be]))

print(np.all([bee==cee for bee in be for cee in ce]))

And of course other questions telling me this should work...

Btw: I'm doing this in a recursive function to limit the recursions, so if someone knows how to do this efficiently, maybe even stopping the comparisons when one evaluates True, maybe sth. like *[break if x == aaa for x in aa]* if that's possible. I'd be forever grateful! ;-) — DonQuiKong, Oct 09 '18 at 07:19
As explained [here](https://stackoverflow.com/questions/45426587/what-is-going-on-behind-this-numpy-selection-behavior) by @user2357112, *"NumPy is designed for rigid multidimensional grids of numbers. Trying to get anything but a rigid multidimensional grid is going to be painful."* or @juanpa.arrivillaga: "Moral of the story: Don't use `dtype=object` arrays. They are stunted Python lists, with worse performance characteristics, and numpy is not designed to handle the case of sequence-like containers within these object arrays." — keepAlive, Oct 09 '18 at 07:26
@Kanak if I need to do aa[:,0:3]/np.array([1,9,2]) a list makes those operations hellish. I'm totally fine with another container if you have any suggestion. I don't want to split the information though; my code is unreadable enough without pulling data that belongs together from x places. — DonQuiKong, Oct 09 '18 at 07:31
Doing calculations with object dtype array is hit-or-miss. Some things work fine (though not as fast as with numeric dtypes), other things don't. A lot has to do with whether object elements implement the necessary methods. — hpaulj, Oct 09 '18 at 07:45
@hpaulj what would be a better way? I'm open to any suggestion — DonQuiKong, Oct 09 '18 at 08:02

score 6 · Accepted Answer · edited Oct 09 '18 at 07:57

To make an element-wise comparison between the arrays, you can use numpy.equal() with the keyword argument dtype=object (the numpy.object alias used originally is just the builtin object and was removed in later NumPy releases), as in:

In [60]: np.equal(be, ce, dtype=object)
Out[60]: 
array([[True, True, True, True,
        array([ True,  True,  True,  True,  True]), True, True, True]],
      dtype=object)

P.S. checked using NumPy version 1.15.2 and Python 3.6.6

edit

From the release notes for 1.15,

https://docs.scipy.org/doc/numpy-1.15.1/release.html#comparison-ufuncs-accept-dtype-object-overriding-the-default-bool

Comparison ufuncs accept dtype=object, overriding the default bool

This allows object arrays of symbolic types, which override == and 
other operators to return expressions, to be compared elementwise with 
np.equal(a, b, dtype=object).

edited Oct 09 '18 at 07:57

hpaulj


answered Oct 09 '18 at 07:28

kmario23


I tried your code and it gives me "No loop matching the specified signature and casting was found for ufunc equal" error. Perhaps something wrong with dtype? – Saket Kumar Singh Oct 09 '18 at 07:35
I'm getting the same error. (now that I moved this out of a misplaced try: statement) – DonQuiKong Oct 09 '18 at 07:41
Maybe a python 2 / 3 thing? – DonQuiKong Oct 09 '18 at 07:44
It could be a version thing. It works for me with 3.6 and 1.15.1. – hpaulj Oct 09 '18 at 07:47
@hpaulj it doesn't work for me with python 3.6 and numpy 1.14.5+ – DonQuiKong Oct 09 '18 at 07:49
I can confirm, **upgrading numpy solves the issue**. only change: upgrading numpy from 1.14.5 to 1.15.2 and now it evaluates to True. Thank you! – DonQuiKong Oct 09 '18 at 07:59

keepAlive · Answer 2 · 2019-07-12T15:25:14.387

2

To complement @kmario23's answer, what about doing

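def wrpr(bools):
    try:
        # Flatten one nesting level: scalars and arrays are concatenated into one 1-D array.
        fltn_bools = np.hstack(bools)
    except Exception:  # hstack failed (e.g. mismatched dimensions): reduce each item first
        # A list is required here; np.array() on a bare generator would wrap the generator itself.
        fltn_bools = np.array([wrpr(a) for a in bools])
    ints = fltn_bools.prod()
    if isinstance(ints, np.ndarray):
        return wrpr(ints)
    return bool(ints)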

And finally,

>>> wrpr(np.equal(ce, be, dtype=object))
True

Checked using (numpy1.15.1 & Python 3.6.5) & (numpy1.15.1 & Python 2.7.13).

But still, as commented here

NumPy is designed for rigid multidimensional grids of numbers. Trying to get anything but a rigid multidimensional grid is going to be painful. (@user2357112, Jul 31 '17 at 23:10)

and/or

Moral of the story: Don't use dtype=object arrays. They are stunted Python lists, with worse performance characteristics, and numpy is not designed to handle the case of sequence-like containers within these object arrays. (@juanpa.arrivillaga, Jul 31 '17 at 23:38)

edited Jul 12 '19 at 15:25

answered Oct 09 '18 at 07:46

keepAlive


"TypeError: No loop matching the specified signature and casting was found for ufunc equal" – DonQuiKong Oct 09 '18 at 07:52
@DonQuiKong What does `np.__version__` return? – keepAlive Oct 09 '18 at 07:57
I just tried it and upgrading numpy resolves the issue. (from 1.14.5 to 1.15.2) – DonQuiKong Oct 09 '18 at 08:00
What about `wrpr(np.array([np.array([True, True]), np.array([True, True, True])]))` ? I am getting _ValueError: operands could not be broadcast together with shapes (2,) (3,)_ – Hlib Babii Jul 12 '19 at 12:40
Indeed @HlibBabii. See the new `wrpr`'s definition, which may suit you, while still doing the same job as before. – keepAlive Jul 12 '19 at 14:10
@keepAlive this still won't work for `np.array([np.array([[True, True]]), np.array([True, True, True])])`. Indeed painful :) – Hlib Babii Jul 12 '19 at 14:57
@HlibBabii The last version of `wrpr` works. ***Tested*** with your two examples. – keepAlive Jul 14 '19 at 09:00

Paul Panzer · Answer 3 · 2019-07-13T00:27:45.720

The behavior you are seeing is kind of documented here

Deprecations

...

Object array equality comparisons

In the future object array comparisons both == and np.equal will not make use of identity checks anymore. For example:

a = np.array([np.array([1, 2, 3]), 1])

b = np.array([np.array([1, 2, 3]), 1])

a == b

will consistently return False (and in the future an error) even if the array in a and b was the same object.

The equality operator == will in the future raise errors like np.equal if broadcasting or element comparisons, etc. fails.

Comparison with arr == None will in the future do an elementwise comparison instead of just returning False. Code should be using arr is None.

All of these changes will give Deprecation- or FutureWarnings at this time.

So far, so clear. Or is it?

We can see from @kmario23's answer that as of version 1.15.2 these changes are not fully implemented yet.

To make matters worse, consider this:

>>> A = np.array([None, a])
>>> A1 = np.array([None, a])
>>> At = np.array([None, a[:2]])
>>> 
>>> A==A1
False
>>> A==At
array([ True, False])
>>>

Looks like the current behavior is more a coincidence than the result of careful planning.

I suspect it all comes down to whether an exception is raised during element-wise comparison, cf. here and here.

If two corresponding elements of the containing arrays are arrays themselves and of compatible shapes as in A==A1, their comparison yields an array of bools. Trying to cast this to a scalar bool raises an exception. Currently, exceptions are caught and a scalar False is returned.

In the A==At example an exception is raised when the last two elements are compared because their shapes don't broadcast. This is caught and the comparison for this element returns a scalar False which is why comparison of the containing arrays returns a "normal" array of bools.
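The two failure modes described above can be reproduced in isolation. This is only a small sketch: comparison_error is a made-up helper name, and it calls np.equal directly rather than ==, since the behavior of == on mismatched shapes has changed between NumPy versions.

```python
import numpy as np


def comparison_error(x, y):
    """Return the ValueError message raised while reducing x == y to one bool, or None."""
    try:
        bool(np.equal(x, y))
    except ValueError as exc:
        return str(exc)
    return None


a = np.array([1, 2, 3])

# Compatible shapes: the comparison succeeds, but bool() of a 3-element array is ambiguous.
same_shape = comparison_error(a, a.copy())

# Incompatible shapes: np.equal itself fails because (3,) and (2,) do not broadcast.
bad_shape = comparison_error(a, a[:2])

print(same_shape)
print(bad_shape)
```

In both cases a ValueError is raised somewhere inside the element comparison, which is the situation the object-array == code has to deal with.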

What about the workarounds suggested by @kmario23 and @Kanak? Do they work?

Well, yes ...

>>> np.equal(A, A1, dtype=object)
array([True, array([ True,  True,  True])], dtype=object)
>>> wrpr(np.equal(A, A1, dtype=object))
True

... and no.

>>> AA = np.array([None, A])
>>> AA1 = np.array([None, A1])
>>> np.equal(AA, AA1, dtype=object)
array([True, False], dtype=object)
>>> wrpr(np.equal(AA, AA1, dtype=object))
False
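If the nesting depth is not known in advance, one possible way around this is to skip == on the outer arrays and recurse in plain Python. The following is a rough sketch, not a tested replacement for the answers above: deep_equal, object_array, make_row and contains are hypothetical names, and the rows are built element by element (and as 1-D rows) only to keep the construction unambiguous. Using any() with a generator also stops at the first match, which is what the question asked for.

```python
import numpy as np


def deep_equal(x, y):
    """Recursively compare nested arrays, lists, tuples and scalars."""
    x_arr, y_arr = isinstance(x, np.ndarray), isinstance(y, np.ndarray)
    if x_arr or y_arr:
        if not (x_arr and y_arr) or x.shape != y.shape:
            return False
        if x.dtype != object and y.dtype != object:
            # Plain numeric arrays: NumPy's own comparison is reliable here.
            return bool(np.array_equal(x, y))
        # Object arrays: compare element by element, stopping at the first mismatch.
        return all(deep_equal(p, q) for p, q in zip(x.ravel(), y.ravel()))
    if isinstance(x, (list, tuple)) or isinstance(y, (list, tuple)):
        if type(x) is not type(y) or len(x) != len(y):
            return False
        return all(deep_equal(p, q) for p, q in zip(x, y))
    return bool(x == y)


def object_array(items):
    """Build a 1-D object array that stores each item as-is."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item
    return out


def make_row(label="Universal"):
    """Build a row shaped like the one in the question (as a 1-D object array)."""
    inner = object_array([[1, 9, 2], 18, (405, 18, 207), 64, label])
    return object_array([405, 162, 414, 0, inner, 0, 0, 0])


def contains(stack, item):
    """Return True as soon as one row of stack equals item."""
    return any(deep_equal(row, item) for row in stack)


stack = [make_row("Other"), make_row()]
print(contains(stack, make_row()))
print(deep_equal(object_array([None, make_row()]), object_array([None, make_row()])))
```

This trades NumPy's vectorized comparison for a Python-level loop, so it is slow on large arrays, but it does not depend on how a particular NumPy version handles == inside object arrays.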
