How to get nearest neighbor for every element of array A from array B

Question

I need to write a function nearest_neighbor(src, dst) that accepts two arrays of 2D points and, for every point in src (array A), returns the distance to and the index of its closest neighbor in dst (array B).

Example input:

src = np.array([[1,1], [2,2],[3,3],[4,4],[9,9]])
dst = np.array([[6,7],[10,10],[10,20]])

Example output:

(array([7.81024968, 6.40312424, 5.        , 3.60555128, 1.41421356]),
 array([0, 0, 0, 0, 1]))

With scikit-learn this can be done as follows:

from sklearn.neighbors import NearestNeighbors

def nearest_neighbor(src, dst):
    neigh = NearestNeighbors(n_neighbors=1)
    neigh.fit(dst)
    distances, indices = neigh.kneighbors(src, return_distance=True)
    return distances.ravel(), indices.ravel()

But I need to implement it using only NumPy. This is what I came up with:

import numpy as np

def nearest_neighbor(src, dst):
    distances = []
    indices = []

    for dot in src:
        # distances from this src point to every dst point
        dists = np.linalg.norm(dst - dot, axis=1)
        dist = np.min(dists)
        idx = np.argmin(dists)

        distances.append(dist)
        indices.append(idx)

    return np.array(distances), np.array(indices)

But it is slow because of the Python loop. How can I make it faster?

mozway · Answer 1 · 2022-06-14T13:54:22.320

2

You can use scipy.spatial.distance.cdist:

import numpy as np
from scipy.spatial.distance import cdist

# compute the matrix of pairwise distances, shape (len(src), len(dst))
dist = cdist(src, dst)

# index of the closest dst point for each src point
closest = dist.argmin(axis=1)
# array([0, 0, 0, 0, 1])

# distance to that closest point
distance = dist[np.arange(src.shape[0]), closest]
# array([7.81024968, 6.40312424, 5.        , 3.60555128, 1.41421356])

edited Jun 14 '22 at 13:54

answered Jun 14 '22 at 13:48

mozway


1

I think `cdist` will be the fastest of the proposed answers, though an equivalent numba implementation ([similar case](https://stackoverflow.com/a/72566612/13394817)) might be faster still. – Ali_Sh Jun 14 '22 at 14:25

score 1 · Answer 2 · answered Jun 14 '22 at 13:49

1

You should read up on NumPy broadcasting:

dist = np.square(src[:,None] - dst).sum(axis=-1) ** .5

idx = dist.argmin(axis=-1)
# array([0, 0, 0, 0, 1])

min_dist = dist[np.arange(len(dist)), idx]

answered Jun 14 '22 at 13:49

Quang Hoang


Note that if you compute the distance manually, you can take the square root only after selecting the minimum distance, which saves some CPU ;) – mozway Jun 14 '22 at 14:07
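
The suggestion in the comment above can be sketched as follows. This is a minimal NumPy-only illustration, assuming Euclidean distance: because the square root is monotonic, the argmin of the squared distances is the same as the argmin of the distances, so the root only needs to be taken on one value per row. The function name `nearest_neighbor_sq` is hypothetical and chosen only for this example.

```python
import numpy as np

def nearest_neighbor_sq(src, dst):
    # Hypothetical helper name, not taken from the answers above.
    # Pairwise differences via broadcasting: shape (len(src), len(dst), ndim)
    diff = src[:, None, :] - dst[None, :, :]
    # Squared Euclidean distances: shape (len(src), len(dst))
    sq_dist = (diff ** 2).sum(axis=-1)
    # Index of the closest dst point for each src point
    idx = sq_dist.argmin(axis=1)
    # Take the square root of the selected minima only
    min_dist = np.sqrt(sq_dist[np.arange(len(src)), idx])
    return min_dist, idx

src = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [9, 9]])
dst = np.array([[6, 7], [10, 10], [10, 20]])
distances, indices = nearest_neighbor_sq(src, dst)
print(distances, indices)
```

For the example input this should give the same distances and indices as the other answers; whether the saving is noticeable likely depends on the array sizes.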

score 0 · Accepted Answer · answered Jun 14 '22 at 13:50

Using broadcasting, src[:, None] - dst subtracts every row of dst from each row of src:

>>> def nearest_neighbor(src, dst):
...     dist = np.linalg.norm(src[:, None] - dst, axis=-1)
...     indices = dist.argmin(-1)
...     return dist[np.arange(len(dist)), indices], indices
...
>>> src = np.array([[1,1], [2,2],[3,3],[4,4],[9,9]])
>>> dst = np.array([[6,7],[10,10],[10,20]])
>>> nearest_neighbor(src, dst)
(array([7.81024968, 6.40312424, 5.        , 3.60555128, 1.41421356]),
 array([0, 0, 0, 0, 1], dtype=int64))
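
To make the broadcasting step more concrete, here is a small sketch that only inspects the intermediate shapes for the example input. The variable names `expanded` and `diff` are introduced here for illustration and do not appear in the answer.

```python
import numpy as np

src = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [9, 9]])
dst = np.array([[6, 7], [10, 10], [10, 20]])

# src[:, None] inserts a new axis: (5, 2) becomes (5, 1, 2)
expanded = src[:, None]

# Broadcasting (5, 1, 2) against (3, 2) gives (5, 3, 2),
# where diff[i, j] is src[i] - dst[j]
diff = expanded - dst

# Reducing over the last axis leaves one distance per (src, dst) pair
dist = np.linalg.norm(diff, axis=-1)

print(expanded.shape, diff.shape, dist.shape)
```

Note that the intermediate array holds len(src) * len(dst) * 2 values, so for very large inputs memory use may become the limiting factor.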
