How to compare two NumPy arrays, treating the numerical and non-numerical parts separately?

Question

import numpy as np

a = np.array([[1., 2., 3.], 
              [4., 5., 'a']], dtype=object)
b = np.array([[1.00000001, 2., 3.], 
              [4., 5., 'a']], dtype=object)
print(a == b)

actual output:

[[False  True  True]
 [ True  True  True]]

expected output (since 1.00000001 is close enough to 1):

[[True  True  True]
 [True  True  True]]

I cannot use numpy.isclose() because the arrays contain non-numerical elements.

You could strip off the string and convert to float: `a.ravel()[:-1].astype(float)`. `isclose` uses the difference, and also checks for floats like `inf`, which is why it needs the numeric dtype. Or you could leave the values `object` dtype, and do your own test of the differences `a.ravel()[:-1]` — hpaulj, Jan 06 '22 at 04:15
The non-numerical part can be in any place in the array, not only in the last place. — 吴慈霆, Jan 06 '22 at 04:53

score 0 · Answer 1 · answered Jan 06 '22 at 10:59

It's a bit hacky, but you could solve it with masks. For example, using the is_numeric_3 function from this answer (reproduced here for completeness):

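def is_float(val):
    # True if val can be converted to float, False otherwise
    try:
        float(val)
    except (TypeError, ValueError):  # TypeError covers values such as None
        return False
    else:
        return True

is_numeric_3 = np.vectorize(is_float, otypes=[bool])  # returns a boolean numpy array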

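mask_a = is_numeric_3(a)
mask_b = is_numeric_3(b)
mask = mask_a & mask_b  # positions that are numeric in both arrays

result = a == b  # exact comparison everywhere, including the strings
# np.isclose needs a numeric dtype, so cast the masked object values to float
result[mask] = np.isclose(a[mask].astype(float), b[mask].astype(float))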
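
For convenience, the mask approach can be wrapped in a single function. This is only a minimal sketch: the name isclose_mixed is made up for illustration, it assumes both arrays have the same shape, and it treats anything that float() accepts (including numeric strings such as '1.5') as numeric. Extra keyword arguments such as rtol and atol are passed through to np.isclose.

```python
import numpy as np


def is_float(val):
    """Return True if float(val) succeeds, False otherwise."""
    try:
        float(val)
    except (TypeError, ValueError):
        return False
    return True


def isclose_mixed(a, b, **kwargs):
    """Compare two same-shaped object arrays element by element.

    Hypothetical helper: positions where both entries convert to float
    are compared with np.isclose, all others with plain equality.
    """
    a = np.asarray(a, dtype=object)
    b = np.asarray(b, dtype=object)

    is_numeric = np.vectorize(is_float, otypes=[bool])
    mask = is_numeric(a) & is_numeric(b)  # numeric in both arrays

    # Exact comparison everywhere first (covers the string entries).
    result = np.asarray(a == b, dtype=bool)
    # Tolerant comparison where both sides are numeric; np.isclose
    # needs a numeric dtype, hence the cast to float.
    result[mask] = np.isclose(
        a[mask].astype(float), b[mask].astype(float), **kwargs
    )
    return result


a = np.array([[1., 2., 3.],
              [4., 5., 'a']], dtype=object)
b = np.array([[1.00000001, 2., 3.],
              [4., 5., 'a']], dtype=object)
print(isclose_mixed(a, b))
```

With the arrays from the question, every element of the result is True. Since the mask is computed per element, the non-numerical entries can sit anywhere in the array.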
