
I have two Numpy arrays of (x,y) coordinates. I want to find all points in the first array that are NOT in the second array. The coordinates are floating-point numbers. They should have few digits (e.g. 1.25, but not 1.123456). But they're the result of calculations, so floating-point imprecision is a factor.

Comments on this question say that an answer suitable for floating-point numbers can be found here. But after inspecting the answers, it's not clear to me that any of them account for floating-point imprecision.

Right now, my solution is this:

import numpy as np

a1 = np.array([[1.2, 2.3], [1.0, 1.1]])
a2 = np.array([[1.0, 1.1], [5.2, 2.2]])

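# Round a2 to 5 decimals and store its points as hashable tuples
a1_not_a2 = []
a2_set = set(tuple(point) for point in a2.round(decimals=5).tolist())
# Keep every (rounded) point of a1 that has no exact match in a2_set
for point in a1.round(decimals=5).tolist():
    if tuple(point) not in a2_set:
        a1_not_a2.append(point)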
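The round-then-compare idea in the loop above can also be expressed with whole-array NumPy operations, which avoids the per-point Python loop. This is a sketch, not a tested drop-in replacement: the helper name `setdiff_rows_rounded` and the `decimals=5` default are illustrative. It snaps coordinates to integers with `np.rint(value * 10**decimals)` (essentially what `np.round` does internally), gives every distinct (x, y) pair a shared integer label with `np.unique(..., axis=0, return_inverse=True)`, and tests membership with `np.isin`. Unlike the loop, it returns the original, unrounded rows of `a1`.

```python
import numpy as np


def setdiff_rows_rounded(a1, a2, decimals=5):
    """Return the rows of a1 whose rounded (x, y) pair does not occur in a2."""
    scale = 10.0 ** decimals
    # Snap both arrays to an integer grid; integers compare exactly.
    g1 = np.rint(a1 * scale).astype(np.int64)
    g2 = np.rint(a2 * scale).astype(np.int64)
    # Give every distinct (x, y) row one integer label shared by both arrays.
    _, labels = np.unique(np.concatenate([g1, g2]), axis=0, return_inverse=True)
    labels = labels.reshape(-1)  # force 1-D; the inverse's shape varied across NumPy versions
    in_a2 = np.isin(labels[:len(g1)], labels[len(g1):])
    return a1[~in_a2]


a1 = np.array([[1.2, 2.3], [1.0, 1.1]])
a2 = np.array([[1.0, 1.1], [5.2, 2.2]])
print(setdiff_rows_rounded(a1, a2))
```

Because it uses the same rounding as the loop, it has the same behaviour near rounding boundaries; it only addresses the speed problem.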

But I'm not sure if my solution always works, and it's slow. I have two questions:

(1) Is comparing floats after round(decimals=5) guaranteed to produce correct output?

(2) Is there a better way to get my result? My arrays are huge, so using nested for loops with np.allclose is slow.
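On question (1): rounding to 5 decimals is not guaranteed to be correct in general. It should be reliable only when the accumulated error is well below half a unit in the fifth decimal (0.000005) and the intended values lie on a coarser grid, such as the two-decimal values described above, so that no value ever lands near a rounding boundary. A minimal illustration of both sides, using only `np.round` (the specific numbers are made up for the example):

```python
import numpy as np

# Case 1: tiny representation error, value far from a rounding boundary.
x = 0.1 + 0.2                                   # stored as 0.30000000000000004
print(x == 0.3)                                 # False
print(np.round(x, 5) == np.round(0.3, 5))       # True

# Case 2: two values only about 2e-10 apart that straddle the boundary 1.000005.
a, b = 1.0000049999, 1.0000050001
print(abs(a - b) < 1e-9)                        # True
print(np.round(a, 5), np.round(b, 5))           # 1.0 1.00001
print(np.round(a, 5) == np.round(b, 5))         # False
```

The second case can only occur when a true value sits close to a 5 in the sixth decimal. With two-decimal inputs and errors on the order of 1e-12 that should not happen, but nothing in the rounding itself enforces it.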

Danny
Make a map of sets based on rounding to 2 digits (or similar, so that you get a subdivision of 10 to 100 in each direction). Then the test of the smaller sets should be faster. – Lutz Lehmann Apr 24 '23 at 07:35
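The bucketing idea in the comment above can be made tolerance-aware, which also removes the rounding-boundary problem. This is a sketch under stated assumptions: the helper name `setdiff_tol` and the default `tol=1e-6` are hypothetical, and "match" here means an absolute difference of at most `tol` in both x and y (unlike `np.allclose`, there is no relative tolerance). Points of `a2` are bucketed by grid cells of side `2 * tol`. Two points that match are then at most half a cell apart, so each `a1` point only has to be compared exactly against the `a2` points in its own cell and the 8 neighbouring cells.

```python
import numpy as np
from collections import defaultdict


def setdiff_tol(a1, a2, tol=1e-6):
    """Return the rows of a1 that have no row of a2 within tol in both x and y."""
    # Cells of side 2*tol: points within tol per coordinate are at most
    # half a cell apart, so they share a cell or sit in adjacent cells.
    cell = 2.0 * tol
    buckets = defaultdict(list)
    for j, key in enumerate(map(tuple, np.floor(a2 / cell).astype(np.int64).tolist())):
        buckets[key].append(j)

    keep = np.ones(len(a1), dtype=bool)
    cells1 = np.floor(a1 / cell).astype(np.int64).tolist()
    for i, (cx, cy) in enumerate(cells1):
        # Gather candidate a2 rows from the 3x3 block of neighbouring cells.
        cand = [j
                for dx in (-1, 0, 1)
                for dy in (-1, 0, 1)
                for j in buckets.get((cx + dx, cy + dy), ())]
        # Exact tolerance check only against those few candidates.
        if cand and np.any(np.all(np.abs(a2[cand] - a1[i]) <= tol, axis=1)):
            keep[i] = False
    return a1[keep]


a1 = np.array([[1.2, 2.3], [1.0, 1.1]])
a2 = np.array([[1.0, 1.1], [5.2, 2.2]])
print(setdiff_tol(a1, a2))
```

There is still a Python loop over the points of `a1`, but each iteration does a handful of dictionary lookups and a comparison against a few candidates rather than a scan of all of `a2`.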
