cKDTree vs dsearchn

Question

I have two arrays (A,B) containing: ID, x, y, z of the same number of points but slightly differents. I would like to have an array in where each row has the ID x y z of the two nearest points of the two arrays. At the moment I have this:

import numpy as np
from scipy.spatial import cKDTree
A = np.loadtxt('A.txt')
B = np.loadtxt('B.txt')
tree = cKDTree( B[:,[1,2,3]] )
d, inds = tree.query( A[:,[1,2,3]], k=1, p=np.inf, eps=0.0)
A_new = A[inds]
xyz_near = np.hstack(( B[:,0:4], A_new[:,0:4] ))

But the array xyz_near does not contain the right couple (IDB xB yB zB DIA xA yA zA):

12587 18.0445 0.0784 -7.4705 3422 18.0444 0.0403 -7.4639

12588 18.0447 0.0783 -7.2231 3421 18.0446 0.0402 -7.2164

12589 18.0450 0.0781 -6.9756 7764 18.0461 0.0400 -5.9785

12590 18.0452 0.0779 -6.7281 7765 18.0464 0.0399 -5.7310

12591 18.0454 0.0777 -6.4805 7766 18.0467 0.0399 -5.4835

12592 18.0457 0.0775 -6.2329 7767 18.0470 0.0398 -5.2359

12593 18.0459 0.0773 -5.9852 7768 18.0473 0.0398 -4.9884

As you can see the first two rows are right but not the next.

If I do the same thing in matlab with dsearchn (IDB xB yB zB DIA xA yA zA):

12587 18.0445 0.0784 -7.4705 3422 18.0444 0.0403 -7.4639

12588 18.0447 0.0783 -7.2231 3421 18.0446 0.0402 -7.2164

12589 18.0450 0.0781 -6.9756 3420 18.0448 0.0402 -6.9688

12590 18.0452 0.0779 -6.7281 3419 18.0450 0.0401 -6.7212

12591 18.0454 0.0777 -6.4805 3418 18.0453 0.0401 -6.4737

12592 18.0457 0.0775 -6.2329 3417 18.0455 0.0400 -6.2261

12593 18.0459 0.0773 -5.9852 3416 18.0458 0.0400 -5.9785

that is right. I have tried to change p to 1, 2 and np.inf but this gives the same result.

Files:

A.txt: http://pasted.co/8c5b6156

B.txt: http://pasted.co/28a228e6

Thanks

UPDATE: Even with the fix suggested by ergo_ I got:

12587 18.0445 0.0784 -7.4705 7758 18.0448 0.0403 -7.4639

12587 18.0445 0.0784 -7.4705 3422 18.0444 0.0403 -7.4639

12588 18.0447 0.0783 -7.2231 3421 18.0446 0.0402 -7.2164

12588 18.0447 0.0783 -7.2231 7759 18.0450 0.0402 -7.2163

12589 18.0450 0.0781 -6.9756 7760 18.0452 0.0402 -6.9688

12589 18.0450 0.0781 -6.9756 3420 18.0448 0.0402 -6.9688

12590 18.0452 0.0779 -6.7281 3419 18.0450 0.0401 -6.7212

12590 18.0452 0.0779 -6.7281 7761 18.0454 0.0401 -6.7212

12591 18.0454 0.0777 -6.4805 7762 18.0456 0.0401 -6.4736

12591 18.0454 0.0777 -6.4805 3418 18.0453 0.0401 -6.4737

So it considers multiple times the same points.

Copypaste from #scipy channel: ` instead of A_new = A[inds] you should be doing B_new = B[inds]` — pv., Jun 30 '15 at 18:18
Unfortunally even with B_new = B[inds] I have problems see the update. — yellowhat, Jul 01 '15 at 13:34
The answer you get is "for each point in A, which point in B is the closest", which naturally does not necessary list all points in B. If you want to ask the converse question, interchange A and B. — pv., Jul 01 '15 at 13:39
But I don't undestand why with 'dsearchn' I don't get this problem. Even if I swap A and B not I get multiple references for A. — yellowhat, Jul 01 '15 at 13:52
Using nearest_neighbors_kd_tree from http://stackoverflow.com/questions/15363419/finding-nearest-items-across-two-lists-arrays-in-python/15366296#15366296 seems to work. — yellowhat, Jul 01 '15 at 14:20
Very unclear at this point now what (i) your new Python code is, and (ii) what your Matlab code is, (iii) what do you actually want to do. Do you expect there to be 1-to-1 mapping? dsearchn does not guarantee that either, maybe you used different p-norm? `nearest_neighbors_kd_tree` answers a different question. — pv., Jul 01 '15 at 14:24
Basically I am doing the same thing with python and matlab. The plots that I get after this research of the nearest have more sense with matlab than with python. That's why I think that matlab is the reference. — yellowhat, Jul 01 '15 at 14:31

score 1 · Accepted Answer · edited May 23 '17 at 12:14

You can verify cKDTree gives the correct results. Here, for the question "for each point in A, which point in B is the closest":

import numpy as np
from scipy.spatial import cKDTree
A = np.loadtxt('A.txt')
B = np.loadtxt('B.txt')
tree = cKDTree( B[:,[1,2,3]] )
d, inds = tree.query( A[:,[1,2,3]], k=1, p=2)
B_new = B[inds]
xyz_near = np.hstack(( B_new[:,0:4], A[:,0:4] ))

for j, a in enumerate(A):
    # compute 2-norms from each point in B to a
    dd = np.sqrt(((a[1:] - B[:,1:])**2).sum(axis=1))
    # find closest point
    jx = np.argmin(dd)
    # check solution
    assert inds[j] == jx
    assert np.allclose(d[j], dd.min())
    # check it is unique
    assert (dd[jx+1:] > d[j]).all()
    assert (dd[:jx] > d[j]).all()

print("All OK")

The solution is also unique, as shown above.

If on the other hand you want 1-to-1 mapping, which is a different question, that is given in finding nearest items across two lists/arrays in Python However, I do not think dsearchn gives you this answer.

score 0 · Answer 2 · answered Feb 07 '19 at 07:59

Using dsearchn of Octave or Matlab without triangulation could be lead into this lines of numpy / python code:

def dsearchn(x,y):
"""
Implement Octave / Matlab dsearchn without triangulation
:param x: Search Points in
:param y: Were points are stored
:return: indices of points of x which have minimal distance to points of y
"""
IDX = []
for line in range(y.shape[0]):
    distances = np.sqrt(np.sum(np.power(x - y[line, :], 2), axis=1))
    found_min_dist_ind = (np.min(distances, axis=0) == distances)
    length = found_min_dist_ind.shape[0]
    IDX.append(np.array(range(length))[found_min_dist_ind][0])
return np.array(IDX)

ogidog · Answer 3 · 2020-04-14T10:37:51.900

Try this code. It produces the same result as the MATLAB method "dsearchn(P,T,PQ)" with triangulation.

# xy=[[x1,y1]...[xm,ym]]
# XY=[[X1,Y1]...[Xm,Ym]]

tree = cKDTree(xy[:, 1:])
dd, ii = tree.query(XY, k=2, p=2, eps=0.0)
Z = []
for i in range(len(dd)):
    min_dd = min(dd[i])
    min_dd_idx = np.where(dd[i] == min_dd)[0]
    if len(min_dd_idx) > 1:
        sorted_ii = np.sort(ii[i][min_dd_idx])
        Z.append(sorted_ii[len(min_dd_idx) - 1])
    else:
        Z.append(ii[i][0])

cKDTree vs dsearchn

3 Answers3