2

How do I do a one-to-one nearest neighbor lookup using scipy KDTree / cKDTree?

So I have 2 sets, set A and set B, of "pixels", a pixel being a collection of 3 coordinates (x,y,d). Each of the sets is same length. I need to match each pixel of set B with a unique associated pixel of set A, by nearest neighbor along these 3 dims. One-to-one being that each pixel in A has exactly 1 neighbor in B and vice-versa. If you are familiar with GIS tools, there is always a one-to-one option for nearest neighbor tools.

No unvectorized for-loops please, I need to scale this to millions of "pixels". So if I had to repeat the query for each pixel in B after removing the previous nearest neighbor, it couldn't sacrifice performance.

from scipy.spatial import KDTree

xydA = np.array([[ 33.255     , 265.59499898,   1.29999995],
       [ 33.255     , 265.60499898,   1.29999995],
       [ 33.255     , 265.61499898,   1.29999995],
       [ 33.255     , 265.62499898,   1.29999995],
       [ 33.255     , 265.63499898,   1.60000002],
       [ 33.255     , 265.64499898,   1.70000005],
       [ 33.255     , 265.65499898,   1.89999998],
       [ 33.255     , 265.66499898,   1.89999998],
       [ 33.265     , 265.59499898,   1.20000005],
       [ 33.265     , 265.60499898,   1.29999995],
       [ 33.265     , 265.61499898,   1.29999995],
       [ 33.265     , 265.62499898,   1.39999998],
       [ 33.265     , 265.63499898,   1.5       ],
       [ 33.265     , 265.64499898,   1.79999995],
       [ 33.265     , 265.65499898,   1.79999995],
       [ 33.265     , 265.66499898,   1.79999995],
       [ 33.275     , 265.59499898,   1.20000005],
       [ 33.275     , 265.60499898,   1.29999995],
       [ 33.275     , 265.61499898,   1.29999995],
       [ 33.275     , 265.62499898,   1.29999995],
       [ 33.275     , 265.63499898,   1.39999998],
       [ 33.275     , 265.64499898,   1.70000005],
       [ 33.275     , 265.65499898,   1.70000005],
       [ 33.275     , 265.66499898,   1.70000005],
       [ 33.285     , 265.59499898,   1.20000005],
       [ 33.285     , 265.60499898,   1.20000005],
       [ 33.285     , 265.61499898,   1.29999995],
       [ 33.285     , 265.62499898,   1.29999995],
       [ 33.285     , 265.63499898,   1.39999998],
       [ 33.285     , 265.64499898,   1.5       ],
       [ 33.285     , 265.65499898,   1.60000002],
       [ 33.285     , 265.66499898,   1.60000002],
       [ 33.295     , 265.59499898,   1.20000005],
       [ 33.295     , 265.60499898,   1.29999995],
       [ 33.295     , 265.61499898,   1.29999995],
       [ 33.295     , 265.62499898,   1.29999995],
       [ 33.295     , 265.63499898,   1.29999995],
       [ 33.295     , 265.64499898,   1.5       ],
       [ 33.295     , 265.65499898,   1.5       ],
       [ 33.295     , 265.66499898,   1.60000002],
       [ 33.305     , 265.59499898,   1.        ],
       [ 33.305     , 265.60499898,   1.29999995],
       [ 33.305     , 265.61499898,   1.29999995],
       [ 33.305     , 265.62499898,   1.29999995],
       [ 33.305     , 265.63499898,   1.29999995],
       [ 33.305     , 265.64499898,   1.39999998],
       [ 33.305     , 265.65499898,   1.39999998],
       [ 33.305     , 265.66499898,   1.70000005],
       [ 33.315     , 265.59499898,   1.20000005],
       [ 33.315     , 265.60499898,   1.29999995],
       [ 33.315     , 265.61499898,   1.20000005],
       [ 33.315     , 265.62499898,   1.20000005],
       [ 33.315     , 265.63499898,   1.20000005],
       [ 33.315     , 265.64499898,   1.39999998],
       [ 33.315     , 265.65499898,   1.39999998],
       [ 33.315     , 265.66499898,   1.60000002],
       [ 33.325     , 265.59499898,   1.20000005],
       [ 33.325     , 265.60499898,   1.20000005],
       [ 33.325     , 265.61499898,   1.10000002],
       [ 33.325     , 265.62499898,   1.20000005],
       [ 33.325     , 265.63499898,   1.29999995],
       [ 33.325     , 265.64499898,   1.29999995],
       [ 33.325     , 265.65499898,   1.29999995],
       [ 33.325     , 265.66499898,   1.5       ]])

xydB = np.array([[3.32550000e+01, 2.65594999e+02, 1.00000000e+00],
       [3.32550000e+01, 2.65604999e+02, 6.99999988e-01],
       [3.32550000e+01, 2.65614999e+02, 8.00000012e-01],
       [3.32550000e+01, 2.65624999e+02, 8.00000012e-01],
       [3.32550000e+01, 2.65634999e+02, 6.99999988e-01],
       [3.32550000e+01, 2.65644999e+02, 4.00000006e-01],
       [3.32550000e+01, 2.65654999e+02, 2.00000003e-01],
       [3.32550000e+01, 2.65664999e+02, 2.00000003e-01],
       [3.32650000e+01, 2.65594999e+02, 8.99999976e-01],
       [3.32650000e+01, 2.65604999e+02, 8.99999976e-01],
       [3.32650000e+01, 2.65614999e+02, 8.99999976e-01],
       [3.32650000e+01, 2.65624999e+02, 8.99999976e-01],
       [3.32650000e+01, 2.65634999e+02, 8.99999976e-01],
       [3.32650000e+01, 2.65644999e+02, 5.00000000e-01],
       [3.32650000e+01, 2.65654999e+02, 5.00000000e-01],
       [3.32650000e+01, 2.65664999e+02, 5.00000000e-01],
       [3.32750000e+01, 2.65594999e+02, 1.00000000e+00],
       [3.32750000e+01, 2.65604999e+02, 8.99999976e-01],
       [3.32750000e+01, 2.65614999e+02, 8.99999976e-01],
       [3.32750000e+01, 2.65624999e+02, 8.99999976e-01],
       [3.32750000e+01, 2.65634999e+02, 8.99999976e-01],
       [3.32750000e+01, 2.65644999e+02, 8.00000012e-01],
       [3.32750000e+01, 2.65654999e+02, 8.00000012e-01],
       [3.32750000e+01, 2.65664999e+02, 5.00000000e-01],
       [3.32850000e+01, 2.65594999e+02, 8.99999976e-01],
       [3.32850000e+01, 2.65604999e+02, 1.00000000e+00],
       [3.32850000e+01, 2.65614999e+02, 1.00000000e+00],
       [3.32850000e+01, 2.65624999e+02, 1.00000000e+00],
       [3.32850000e+01, 2.65634999e+02, 1.10000002e+00],
       [3.32850000e+01, 2.65644999e+02, 8.00000012e-01],
       [3.32850000e+01, 2.65654999e+02, 8.00000012e-01],
       [3.32850000e+01, 2.65664999e+02, 8.00000012e-01],
       [3.32950000e+01, 2.65594999e+02, 6.00000024e-01],
       [3.32950000e+01, 2.65604999e+02, 8.99999976e-01],
       [3.32950000e+01, 2.65614999e+02, 8.99999976e-01],
       [3.32950000e+01, 2.65624999e+02, 1.00000000e+00],
       [3.32950000e+01, 2.65634999e+02, 1.00000000e+00],
       [3.32950000e+01, 2.65644999e+02, 1.10000002e+00],
       [3.32950000e+01, 2.65654999e+02, 8.99999976e-01],
       [3.32950000e+01, 2.65664999e+02, 8.99999976e-01],
       [3.33050000e+01, 2.65594999e+02, 5.00000000e-01],
       [3.33050000e+01, 2.65604999e+02, 8.99999976e-01],
       [3.33050000e+01, 2.65614999e+02, 8.99999976e-01],
       [3.33050000e+01, 2.65624999e+02, 1.10000002e+00],
       [3.33050000e+01, 2.65634999e+02, 1.10000002e+00],
       [3.33050000e+01, 2.65644999e+02, 1.00000000e+00],
       [3.33050000e+01, 2.65654999e+02, 1.00000000e+00],
       [3.33050000e+01, 2.65664999e+02, 8.00000012e-01],
       [3.33150000e+01, 2.65594999e+02, 6.00000024e-01],
       [3.33150000e+01, 2.65604999e+02, 8.00000012e-01],
       [3.33150000e+01, 2.65614999e+02, 8.99999976e-01],
       [3.33150000e+01, 2.65624999e+02, 1.10000002e+00],
       [3.33150000e+01, 2.65634999e+02, 1.10000002e+00],
       [3.33150000e+01, 2.65644999e+02, 1.00000000e+00],
       [3.33150000e+01, 2.65654999e+02, 1.00000000e+00],
       [3.33150000e+01, 2.65664999e+02, 8.99999976e-01],
       [3.33250000e+01, 2.65594999e+02, 6.00000024e-01],
       [3.33250000e+01, 2.65604999e+02, 6.00000024e-01],
       [3.33250000e+01, 2.65614999e+02, 6.99999988e-01],
       [3.33250000e+01, 2.65624999e+02, 8.99999976e-01],
       [3.33250000e+01, 2.65634999e+02, 1.10000002e+00],
       [3.33250000e+01, 2.65644999e+02, 1.20000005e+00],
       [3.33250000e+01, 2.65654999e+02, 1.10000002e+00],
       [3.33250000e+01, 2.65664999e+02, 1.10000002e+00]])

btree = KDTree(xydA)
dist,p = btree.query(xydB,k=1)

assert len(np.unique(p)) == len(p)
Mathi
  • 312
  • 1
  • 8
  • [Does this solution help?](https://stackoverflow.com/a/15366296/3595907) – DrBwts Apr 19 '22 at 12:32
  • @DrBwts Hm all 3 of these raises an error for me, even if I try with 2 dimensions rather than 3. The answer is from 2013 so np/scipy may have had some changes since – openSourcerer Apr 20 '22 at 16:18
  • OK, removing the "[:, None]"s from the KDTree solution runs without errors. However, there's no way to avoid duplicates without running the whole thing. On the full dataset, I hit memory issues MemoryError: Unable to allocate 155. GiB for an array with shape (144008, 144008) and data type float64 – openSourcerer Apr 20 '22 at 16:37

0 Answers0