Efficient way to find the shortest distance between two arrays?

Question

I am trying to find the shortest distance between two sets of arrays. The x- arrays are identical and just contain integers. Here is an example of what I am trying to do:

import numpy as np
x1 = x2 = np.linspace(-1000, 1000, 2001)
y1 = (lambda x, a, b: a*x + b)(x1, 2, 1)
y2 = (lambda x, a, b: a*(x-2)**2 + b)(x2, 2, 10)

def dis(x1, y1, x2, y2):
    return sqrt((y2-y1)**2+(x2-x1)**2)

min_distance = np.inf
for a, b in zip(x1, y1):
    for c, d in zip(x2, y2):
        if dis(a, b, c, d) < min_distance:
            min_distance = dis(a, b, c, d)

>>> min_distance
2.2360679774997898

This solution works, but the problem is runtime. If x has a length of ~10,000, the solution is infeasible because the program ha O(n^2) runtime. Now, I tried making some approximations to speed the program up:

for a, b in zip(x1, y1):
    cut = (x2 > a-20)*(x2 < a+20)
    for c, d in zip(x2, y2):
        if dis(a, b, c, d) < min_distance:
            min_distance = dis(a, b, c, d)

But the program is still taking longer than I'd like. Now, from my understanding, it is generally inefficient to loop through a numpy array, so I'm sure there is still room for improvement. Any ideas on how to speed this program up?

Possible duplicate of [How can the euclidean distance be calculated with numpy?](http://stackoverflow.com/questions/1401712/how-can-the-euclidean-distance-be-calculated-with-numpy) — tmthydvnprt, Jul 29 '16 at 22:55
Possible duplicate of [Efficiently checking Euclidean distance for a large number of objects in Python](http://stackoverflow.com/q/29885212/1461210) — ali_m, Jul 30 '16 at 11:24

score 1 · Answer 1 · answered Jul 29 '16 at 22:58

Your problem could also be represented as 2d collision detection, so a quadtree might help. Insertion and querying both run in O(log n) time, so the whole search would run in O(n log n).

One more suggestion, since sqrt is monotonic, you can compare the squares of distances instead of the distances themselves, which will save you n^2 square root calculations.

Mahdi · Answer 2 · 2016-07-30T05:50:58.613

scipy has a cdist function which calculates distance between all pairs of points:

from scipy.spatial.distance import cdist
import numpy as np

x1 = x2 = np.linspace(-1000, 1000, 2001)
y1 = (lambda x, a, b: a*x + b)(x1, 2, 1)
y2 = (lambda x, a, b: a*(x-2)**2 + b)(x2, 2, 10)

R1 = np.vstack((x1,y1)).T
R2 = np.vstack((x2,y2)).T

dists = cdist(R1,R2) # find all mutual distances

print (dists.min())
# output: 2.2360679774997898

This runs more than 250 times faster than the original for loop.

score 0 · Answer 3 · answered Jul 29 '16 at 22:51

0

This is a difficult problem, and it may help if you are willing to accept approximations. I would check out something like Spottify's annoy.

answered Jul 29 '16 at 22:51

homer

238
3
9

Efficient way to find the shortest distance between two arrays?

3 Answers3