0

I am trying to find the shortest distance between two sets of arrays. The x- arrays are identical and just contain integers. Here is an example of what I am trying to do:

import numpy as np
x1 = x2 = np.linspace(-1000, 1000, 2001)
y1 = (lambda x, a, b: a*x + b)(x1, 2, 1)
y2 = (lambda x, a, b: a*(x-2)**2 + b)(x2, 2, 10)

def dis(x1, y1, x2, y2):
    return sqrt((y2-y1)**2+(x2-x1)**2)

min_distance = np.inf
for a, b in zip(x1, y1):
    for c, d in zip(x2, y2):
        if dis(a, b, c, d) < min_distance:
            min_distance = dis(a, b, c, d)

>>> min_distance
2.2360679774997898

This solution works, but the problem is runtime. If x has a length of ~10,000, the solution is infeasible because the program ha O(n^2) runtime. Now, I tried making some approximations to speed the program up:

for a, b in zip(x1, y1):
    cut = (x2 > a-20)*(x2 < a+20)
    for c, d in zip(x2, y2):
        if dis(a, b, c, d) < min_distance:
            min_distance = dis(a, b, c, d)

But the program is still taking longer than I'd like. Now, from my understanding, it is generally inefficient to loop through a numpy array, so I'm sure there is still room for improvement. Any ideas on how to speed this program up?

Josh Tik
  • 1
  • 1
  • Possible duplicate of [How can the euclidean distance be calculated with numpy?](http://stackoverflow.com/questions/1401712/how-can-the-euclidean-distance-be-calculated-with-numpy) – tmthydvnprt Jul 29 '16 at 22:55
  • Possible duplicate of [Efficiently checking Euclidean distance for a large number of objects in Python](http://stackoverflow.com/q/29885212/1461210) – ali_m Jul 30 '16 at 11:24

3 Answers3

1

Your problem could also be represented as 2d collision detection, so a quadtree might help. Insertion and querying both run in O(log n) time, so the whole search would run in O(n log n).

One more suggestion, since sqrt is monotonic, you can compare the squares of distances instead of the distances themselves, which will save you n^2 square root calculations.

c2huc2hu
  • 2,447
  • 17
  • 26
1

scipy has a cdist function which calculates distance between all pairs of points:

from scipy.spatial.distance import cdist
import numpy as np

x1 = x2 = np.linspace(-1000, 1000, 2001)
y1 = (lambda x, a, b: a*x + b)(x1, 2, 1)
y2 = (lambda x, a, b: a*(x-2)**2 + b)(x2, 2, 10)

R1 = np.vstack((x1,y1)).T
R2 = np.vstack((x2,y2)).T

dists = cdist(R1,R2) # find all mutual distances

print (dists.min())
# output: 2.2360679774997898

This runs more than 250 times faster than the original for loop.

Mahdi
  • 3,188
  • 2
  • 20
  • 33
0

This is a difficult problem, and it may help if you are willing to accept approximations. I would check out something like Spottify's annoy.

homer
  • 238
  • 3
  • 9