
My question is: why?

aa[0]
array([[405, 162, 414, 0,
        array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],
      dtype=object),
        0, 0, 0]], dtype=object)

aaa
array([[405, 162, 414, 0,
        array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],
      dtype=object),
        0, 0, 0]], dtype=object)

np.array_equal(aaa,aa[0])
False

Those arrays are completely identical.

My minimal example doesn't reproduce this:

be=np.array([1],dtype=object)

be
array([1], dtype=object)

ce=np.array([1],dtype=object)

ce
array([1], dtype=object)

np.array_equal(be,ce)
True

Nor does this one:

ce=np.array([np.array([1]),'5'],dtype=object)

be=np.array([np.array([1]),'5'],dtype=object)

np.array_equal(be,ce)
True

However, to reproduce my problem try this:

be=np.array([[405, 162, 414, 0, np.array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],dtype=object),0, 0, 0]], dtype=object)

ce=np.array([[405, 162, 414, 0, np.array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],dtype=object),0, 0, 0]], dtype=object)

np.array_equal(be,ce)
False

np.array_equal(be[0],ce[0])
False

And I have no idea why those are not equal. And to add the bonus question, how do I compare them?

I need an efficient way to check if aaa is in the stack aa.

I'm not using `aaa in aa` because it gives "DeprecationWarning: elementwise == comparison failed; this will raise an error in the future." and, in case anyone is wondering, because it still returns False.


What else have I tried?

np.equal(be,ce)
*** ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

np.all(be,ce)
*** TypeError: only integer scalar arrays can be converted to a scalar index

all(be,ce)
*** TypeError: all() takes exactly one argument (2 given)

all(be==ce)
*** TypeError: 'bool' object is not iterable

np.where(be==ce)
(array([], dtype=int64),)

And these, which I can't get to run in the console, all evaluate to False, some giving the deprecation warning:

import numpy as np

ce=np.array([[405, 162, 414, 0, np.array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],dtype=object),0, 0, 0]], dtype=object)

be=np.array([[405, 162, 414, 0, np.array([list([1, 9, 2]), 18, (405, 18, 207), 64, 'Universal'],dtype=object),0, 0, 0]], dtype=object)

print(np.any([bee in ce for bee in be]))

print(np.any([bee==cee for bee in be for cee in ce]))

print(np.all([bee in ce for bee in be]))

print(np.all([bee==cee for bee in be for cee in ce]))

And of course other questions telling me this should work...

DonQuiKong
  • Btw: I'm doing this in a recursive function to limit the recursions, so if someone knows how to do this efficiently, maybe even stopping the comparisons as soon as one evaluates to True (something like *[break if x == aaa for x in aa]*, if that's possible), I'd be forever grateful! ;-) – DonQuiKong Oct 09 '18 at 07:19
  • As explained [here](https://stackoverflow.com/questions/45426587/what-is-going-on-behind-this-numpy-selection-behavior) by @user2357112, *"NumPy is designed for rigid multidimensional grids of numbers. Trying to get anything but a rigid multidimensional grid is going to be painful."* or @juanpa.arrivillaga: "Moral of the story: Don't use `dtype=object` arrays. They are stunted Python lists, with worse performance characteristics, and numpy is not designed to handle the case of sequence-like containers within these object arrays." – keepAlive Oct 09 '18 at 07:26
  • @Kanak if I need to do aa[:,0:3]/np.array([1,9,2]), a list makes those operations hellish. I'm totally fine with another container if you have any suggestion. I don't want to split the information though; my code is unreadable enough without drawing data that belongs together from x places. – DonQuiKong Oct 09 '18 at 07:31
  • Doing calculations with object dtype array is hit-or-miss. Some things work fine (though not as fast as with numeric dtypes), other things don't. A lot has to do with whether object elements implement the necessary methods. – hpaulj Oct 09 '18 at 07:45
  • @hpaulj what would be a better way? I'm open to any suggestion – DonQuiKong Oct 09 '18 at 08:02

3 Answers


To make an element-wise comparison between the arrays, you can use numpy.equal() with the keyword argument dtype=object (the alias numpy.object, removed in NumPy 1.24, did the same thing), as in:

In [60]: np.equal(be, ce, dtype=object)
Out[60]: 
array([[True, True, True, True,
        array([ True,  True,  True,  True,  True]), True, True, True]],
      dtype=object)

P.S. checked using NumPy version 1.15.2 and Python 3.6.6

edit

From the release notes for 1.15,

https://docs.scipy.org/doc/numpy-1.15.1/release.html#comparison-ufuncs-accept-dtype-object-overriding-the-default-bool

Comparison ufuncs accept dtype=object, overriding the default bool

This allows object arrays of symbolic types, which override == and 
other operators to return expressions, to be compared elementwise with 
np.equal(a, b, dtype=object).
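
The use case named in that release note can be illustrated with a toy class. This is only a sketch: `Sym` is a hypothetical stand-in for a symbolic type, not part of NumPy or any real library, and its `__eq__` returns a string instead of a bool purely for demonstration.

```python
import numpy as np


class Sym:
    """Hypothetical symbolic value whose == builds an expression string."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Return an "expression" rather than True/False.
        return "Eq({}, {})".format(self.name, other.name)


a = np.array([Sym("x"), Sym("y")], dtype=object)
b = np.array([Sym("u"), Sym("v")], dtype=object)

# With dtype=object the result of each __eq__ call is stored as-is
# instead of being cast to bool.
expr = np.equal(a, b, dtype=object)
print(expr)
```

The same mechanism is why the nested arrays in the question come back as an inner array of bools: the result of comparing the two inner arrays is stored without being collapsed to a single truth value.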
hpaulj
kmario23

To complement @kmario23's answer, what about doing

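def wrpr(bools):
    """Collapse a (possibly nested) result of np.equal(..., dtype=object) to one bool."""
    try:
        # Flatten one nesting level; elements may themselves be arrays.
        fltn_bools = np.hstack(bools)
    except Exception:  # should not pass silently.
        # Reduce each element separately (a list, not a generator, so np.array sees the values).
        fltn_bools = np.array([wrpr(a) for a in bools])
    # The product is truthy only if every entry is truthy.
    ints = fltn_bools.prod()
    if isinstance(ints, np.ndarray):
        # A nested array survived the product; reduce it as well.
        return wrpr(ints)
    return bool(ints)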

And finally,

>>> wrpr(np.equal(ce, be, dtype=object))
True

Checked using NumPy 1.15.1 with Python 3.6.5 and with Python 2.7.13.


But still, as commented here

NumPy is designed for rigid multidimensional grids of numbers. Trying to get anything but a rigid multidimensional grid is going to be painful. (@user2357112, Jul 31 '17 at 23:10)

and/or

Moral of the story: Don't use dtype=object arrays. They are stunted Python lists, with worse performance characteristics, and numpy is not designed to handle the case of sequence-like containers within these object arrays. (@juanpa.arrivillaga, Jul 31 '17 at 23:38)

keepAlive

The behavior you are seeing is kind of documented here

Deprecations

...

Object array equality comparisons

In the future object array comparisons both == and np.equal will not make use of identity checks anymore. For example:

>>> a = np.array([np.array([1, 2, 3]), 1])
>>> b = np.array([np.array([1, 2, 3]), 1])
>>> a == b

will consistently return False (and in the future an error) even if the array in a and b was the same object.

The equality operator == will in the future raise errors like np.equal if broadcasting or element comparisons, etc. fails.

Comparison with arr == None will in the future do an elementwise comparison instead of just returning False. Code should be using arr is None.

All of these changes will give Deprecation- or FutureWarnings at this time.

So far, so clear. Or is it?

We can see from @kmario23's answer that as of version 1.15.2 these changes are not fully implemented yet.

To make matters worse, consider this:

>>> A = np.array([None, a])
>>> A1 = np.array([None, a])
>>> At = np.array([None, a[:2]])
>>> 
>>> A==A1
False
>>> A==At
array([ True, False])
>>> 

Looks like the current behavior is more a coincidence than the result of careful planning.

I suspect it all comes down to whether an exception is raised during element-wise comparison, cf. here and here.

If two corresponding elements of the containing arrays are arrays themselves and of compatible shapes as in A==A1, their comparison yields an array of bools. Trying to cast this to a scalar bool raises an exception. Currently, exceptions are caught and a scalar False is returned.

In the A==At example an exception is raised when the last two elements are compared because their shapes don't broadcast. This is caught and the comparison for this element returns a scalar False which is why comparison of the containing arrays returns a "normal" array of bools.
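
The first case can be reproduced outside of any object array. A minimal sketch (the variable names are illustrative only): comparing two equal-shaped arrays gives an array of bools, and it is the attempt to squeeze that array into a single bool that raises.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([1, 2, 3])

# Comparing two arrays of compatible shape yields an array of bools.
elementwise = x == y

try:
    # This is the cast that fails when it happens per element inside
    # an object-array comparison.
    bool(elementwise)
    ambiguous = False
except ValueError:
    # "The truth value of an array with more than one element is ambiguous..."
    ambiguous = True

print(elementwise.all(), ambiguous)
```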

What about the workarounds suggested by @kmario23 and @Kanak? Do they work?

Well, yes ...

>>> np.equal(A, A1, dtype=object)
array([True, array([ True,  True,  True])], dtype=object)
>>> wrpr(np.equal(A, A1, dtype=object))
True

... and no.

>>> AA = np.array([None, A])
>>> AA1 = np.array([None, A1])
>>> np.equal(AA, AA1, dtype=object)
array([True, False], dtype=object)
>>> wrpr(np.equal(AA, AA1, dtype=object))
False
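
One way to sidestep the nesting problem is to leave the ufunc machinery out and recurse in plain Python. The following is only a sketch under stated assumptions: `deep_equal` and `contains` are hypothetical helper names (not NumPy functions), and the only containers handled are ndarrays, lists, and tuples. Because `any` stops at the first match, `contains` also gives the early exit asked for in the question's comments.

```python
import numpy as np


def deep_equal(x, y):
    """Recursively compare two possibly nested objects (hypothetical helper)."""
    x_is_arr = isinstance(x, np.ndarray)
    y_is_arr = isinstance(y, np.ndarray)
    if x_is_arr or y_is_arr:
        # An array only equals another array of the same shape.
        if not (x_is_arr and y_is_arr) or x.shape != y.shape:
            return False
        if x.dtype != object and y.dtype != object:
            # Plain numeric/string arrays: NumPy handles these fine.
            return bool(np.array_equal(x, y))
        # Object arrays: compare element by element, recursing as needed.
        return all(deep_equal(p, q) for p, q in zip(x.ravel(), y.ravel()))
    if isinstance(x, (list, tuple)) and isinstance(y, (list, tuple)):
        return (type(x) is type(y) and len(x) == len(y)
                and all(deep_equal(p, q) for p, q in zip(x, y)))
    return bool(x == y)


def contains(stack, item):
    """True if any entry of stack deep-equals item; stops at the first hit."""
    return any(deep_equal(entry, item) for entry in stack)


inner = [[1, 9, 2], 18, (405, 18, 207), 64, 'Universal']
be = np.array([[405, 162, 414, 0, np.array(inner, dtype=object), 0, 0, 0]], dtype=object)
ce = np.array([[405, 162, 414, 0, np.array(inner, dtype=object), 0, 0, 0]], dtype=object)

other = be.copy()
other[0, 0] = 999  # differs from be in one entry

print(deep_equal(be, ce), deep_equal(be, other), contains([other, ce], be))
```

This is pure Python and therefore slow for large stacks, which is consistent with the advice quoted above to avoid dtype=object arrays where possible.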
Paul Panzer