Comparing 2 Structured Arrays that contain values of different types and NaNs

Question

So I have 2 structured NumPy arrays:

import numpy as np

a = np.array([('2020-01-04', 'Test', 1, 1.0),
              ('2020-01-05', 'Test2', 2, np.nan)],
             dtype=[('Date', 'M8[D]'), ('Name', 'S8'), ('idx', 'i8'), ('value', 'f8')])
b = np.array([('2020-01-04', 'Test', 2, 1.0),
              ('2020-01-05', 'Test2', 2, np.nan)],
             dtype=[('Date', 'M8[D]'), ('Name', 'S8'), ('idx', 'i8'), ('value', 'f8')])

I need to compare the 2 arrays and get an array of True/False values that will indicate which indices in the array are different.

Doing something like:

not_same = np.full(shape=a.shape, dtype=bool, fill_value=False)
for field in a.dtype.names:
     not_same = np.logical_or(not_same,
                              a[field] != b[field])

This works up to a point, but NaN != NaN evaluates to True, so I would need something like np.allclose, which can treat NaNs as equal. That only works if the values being compared are floating point (strings blow up).
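A minimal illustration of the NaN behavior described above, using two small float arrays made up for this example (they are not part of the original data):

```python
import numpy as np

# Two illustrative float arrays with a NaN in the same position.
x = np.array([1.0, np.nan])
y = np.array([1.0, np.nan])

# Plain inequality reports the NaN pair as different.
print(x != y)                            # [False  True]

# isclose with equal_nan=True treats the NaN pair as equal.
print(np.isclose(x, y, equal_nan=True))  # [ True  True]
```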

So I need one of two things:

A way to determine whether the values in a[field] are floating point

or

A method of comparing 2 arrays in which comparing two NaN values gives True

Per the request below regarding the error:

dt = np.dtype([('string', 'S10'), ('val', 'f8')])
arr = np.array([('test', 1.0)], dtype=dt)
np.isreal(arr['string'])

Ran on Ubuntu 20.04 with Python 3.8.5

Here's a solution you can use.... https://stackoverflow.com/questions/10710328/comparing-numpy-arrays-containing-nan — Joe Ferndz, Mar 18 '21 at 22:46
So you want `NaN == NaN`? Well, by definition of NaN that's not true. — a_guest, Mar 18 '21 at 22:51
The link I gave you (above) is the closest I could get to a Nan == Nan = True solution. Go through that. I think we can close this question as this was already asked and answered — Joe Ferndz, Mar 18 '21 at 22:59
@JoeFerndz This works to generate a single value, not a set of values. Besides, I would have hoped that we have come a little further than compare by exception. I've seen this question before I asked mine — Karlson, Mar 18 '21 at 23:08
@a_guest Yes. `np.allclose` has an option of having comparison return `True` but in order to call it you need to know the type of the values in an array. — Karlson, Mar 18 '21 at 23:10
@Karlson Well then, did you check how they realize it in [`isclose`](https://github.com/numpy/numpy/blob/fb215c76967739268de71aa4bda55dd1b062bc2e/numpy/core/numeric.py#L2375)? It's a plain comparison of the two arrays for NaN, so it's not any more efficient than if you did this manually. — a_guest, Mar 18 '21 at 23:13
@a_guest Yes I did. The issue with calling `isnan` is the same as with calling `allclose` or `isclose`. You have to be sure that you're calling it on a type that is a float or can be coerced into a float. If you try to call this and the field type is a string you will get an exception. The problem is that I get about 5 pairs of arrays that I need to compare and while each pair has the same `dtype` the dtypes between them are different. On top of this only a subset of fields within each dtype is NaN capable. — Karlson, Mar 18 '21 at 23:20
The only workaround I found is `if (isinstance(x, float) and np.isnan(x)) or (isinstance(y, float) and np.isnan(y)): print (False) elif x==y: print (True) else: print (False)` — Joe Ferndz, Mar 19 '21 at 00:37
@Karlson Well then, you have to chain another test for float, e.g. [`np.isreal`](https://numpy.org/doc/stable/reference/generated/numpy.isreal.html). — a_guest, Mar 19 '21 at 06:21
@a_guest `FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison` — Karlson, Mar 19 '21 at 11:49
@Karlson You need to show the full code that produces this error. Please update your question. — a_guest, Mar 19 '21 at 12:30

Joe Ferndz · Answer 1 · 2021-03-19T00:56:35.350

Here's the workaround I used to solve this. It's not pretty, but it checks every element, compares the pairs, and provides the answer. You can extend it to return True when both values are NaN.

import numpy as np
# np.nan is used instead of np.NaN, which was removed in NumPy 2.0
a = np.array([('2020-01-04', 'Test', 1, 1.0),
              ('2020-01-05', 'Test2', 2, np.nan)],
             dtype=[('Date', 'M8[D]'), ('Name', 'S8'), ('idx', 'i8'), ('value', 'f8')])
b = np.array([('2020-01-04', 'Test', 2, 1.0),
              ('2020-01-05', 'Test2', 2, np.nan)],
             dtype=[('Date', 'M8[D]'), ('Name', 'S8'), ('idx', 'i8'), ('value', 'f8')])
idx_ab = []
for i,j in zip(a,b):
    for x,y in zip(i,j):
        if (isinstance(x, float) and np.isnan(x)) or (isinstance(y, float) and np.isnan(y)):   
            idx_ab.append(False)
        elif x == y:
            idx_ab.append(True)
        else:
            idx_ab.append(False)

print (idx_ab)

Output of this will be:

[True, True, False, True, True, True, True, False]

Unfortunately, you cannot simply call np.isnan on x and y. np.isnan does not accept every type, and passing a string raises an error, so you need the isinstance check before testing for NaN.

An alternative is to use np.isclose() or np.allclose(), as described in the question I linked:

Comparing numpy arrays containing NaN (https://stackoverflow.com/questions/10710328/comparing-numpy-arrays-containing-nan)

If you want to check each element in a and b separately, grouped by record, you can use:

idx_ab = []
for i,j in zip(a,b):
    ab = []
    for x,y in zip(i,j):
        if (isinstance(x, float) and np.isnan(x)) or (isinstance(y, float) and np.isnan(y)):   
            ab.append(False)
        elif x == y:
            ab.append(True)
        else:
            ab.append(False)
    idx_ab.append(ab)
print (idx_ab)

The output of this will be:

[[True, True, False, True], [True, True, True, False]]

If you want np.nan == np.nan to count as True, add this as the first condition and change the existing `if` that follows it into an `elif`:

if (isinstance(x, float) and isinstance(y, float) and all(np.isnan([x,y]))): ab.append(True)

This will result in the above answer as:

[[True, True, False, True], [True, True, True, True]]

The last value is set to True because both a[1][3] and b[1][3] are NaN.
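The same per-element result can also be computed field by field without a Python loop over elements. This is only a sketch, not part of the original answer: it assumes np.issubdtype(dtype, np.floating) is an acceptable way to decide which fields can hold NaN, and the helper name equal_by_field is made up for illustration.

```python
import numpy as np

dt = [('Date', 'M8[D]'), ('Name', 'S8'), ('idx', 'i8'), ('value', 'f8')]
a = np.array([('2020-01-04', 'Test', 1, 1.0),
              ('2020-01-05', 'Test2', 2, np.nan)], dtype=dt)
b = np.array([('2020-01-04', 'Test', 2, 1.0),
              ('2020-01-05', 'Test2', 2, np.nan)], dtype=dt)

def equal_by_field(a, b):
    """Hypothetical helper: one boolean column per field, True where a and b agree."""
    columns = []
    for field in a.dtype.names:
        x, y = a[field], b[field]
        same = x == y
        # Only floating-point fields can hold NaN; count NaN pairs as equal there.
        if np.issubdtype(x.dtype, np.floating):
            same |= np.isnan(x) & np.isnan(y)
        columns.append(same)
    # Shape (n_records, n_fields), matching the nested list printed above.
    return np.column_stack(columns)

equal = equal_by_field(a, b)
print(equal.tolist())   # [[True, True, False, True], [True, True, True, True]]

# Records that differ in at least one field.
not_same = ~equal.all(axis=1)
print(not_same)         # [ True False]
```

Checking the dtype up front avoids both the isinstance test on every element and any exception handling. Datetime fields holding NaT would likely need an analogous check with np.isnat, which this sketch does not include.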

`arr2 = np.array([1, 2.1, 3.3, np.nan, 2]) arr1 = np.array([1, 2.0, 3.3, np.nan, 3]) arr1 == arr2 array([ True, False, True, False, False]) ` — Karlson, Mar 18 '21 at 22:34
Something is amiss here: `nan == nan` should be false: https://stackoverflow.com/questions/20320022/why-in-numpy-nan-nan-is-false-while-nan-in-nan-is-true — Karlson, Mar 18 '21 at 22:36
Agree. I tried it many times. It shows False without a few values. When I add more values to the array, it is showing True. Strange behavior. Checking it on my other mac to see if there is a different behavior — Joe Ferndz, Mar 18 '21 at 22:37
See my new response. Hopefully it addresses the problem. However, it is not the elegant way to do it. — Joe Ferndz, Mar 19 '21 at 00:41

score 0 · Answer 2 · answered Mar 19 '21 at 00:45

How about just trying isclose (or allclose) and catching the error? The errors that I see below occur early in the isclose code, so there shouldn't be much of a time penalty.

In [129]: for field in a.dtype.names:
     ...:     print(field, a[field], b[field])
     ...:     try:
     ...:         print("1st", np.isclose(a[field],b[field],equal_nan=True))
     ...:     except TypeError as f:
     ...:         print(f)
     ...:         print("2nd",a[field]==b[field])
     ...: 
     ...: 
Date ['2020-01-04' '2020-01-05'] ['2020-01-04' '2020-01-05']
The DTypes <class 'numpy.dtype[float16]'> and <class 'numpy.dtype[datetime64]'> do not have a common DType. For example they cannot be stored in a single array unless the dtype is `object`.
2nd [ True  True]
Name [b'Test' b'Test2'] [b'Test' b'Test2']
ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
2nd [ True  True]
idx [1 2] [2 2]
1st [False  True]
value [ 1. nan] [ 1. nan]
1st [ True  True]
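The try/except idea above could be wrapped into a function that returns the per-record mask the question asks for. This is a sketch under assumptions: the name rows_differ is made up, and rtol=0 and atol=0 are passed so that isclose acts as an exact comparison rather than a tolerance test, which the session above does not do.

```python
import numpy as np

def rows_differ(a, b):
    """Hypothetical helper: True for each record where a and b differ in any field."""
    not_same = np.zeros(a.shape, dtype=bool)
    for field in a.dtype.names:
        try:
            # Zero tolerances give exact equality; equal_nan=True treats NaN pairs as equal.
            same = np.isclose(a[field], b[field], rtol=0, atol=0, equal_nan=True)
        except TypeError:
            # Non-numeric fields (dates, byte strings) fall back to plain equality.
            same = a[field] == b[field]
        not_same |= ~same
    return not_same

dt = [('Date', 'M8[D]'), ('Name', 'S8'), ('idx', 'i8'), ('value', 'f8')]
a = np.array([('2020-01-04', 'Test', 1, 1.0),
              ('2020-01-05', 'Test2', 2, np.nan)], dtype=dt)
b = np.array([('2020-01-04', 'Test', 2, 1.0),
              ('2020-01-05', 'Test2', 2, np.nan)], dtype=dt)

mask = rows_differ(a, b)
print(mask)
```

With the sample arrays only the first record differs (idx is 1 in a and 2 in b), so the mask should flag that record alone.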
