3

I have 2 sets of 2D points (A and B), each set have about 540 points. I need to find the points in set B that are farther than a defined distance alpha from all the points in A.

I have a solution, but is not fast enough

# find the closest point of each of the new point to the target set
def find_closest_point( self, A, B):
    outliers = []
    for i in range(len(B)):
        # find all the euclidean distances
        temp = distance.cdist([B[i]],A)
        minimum = numpy.min(temp)
        # if point is too far away from the rest is consider outlier
        if minimum > self.alpha :
            outliers.append([i, B[i]])
        else:
            continue
    return outliers

I am using python 2.7 with numpy and scipy. Is there another way to do this that I may gain a considerable increase in speed?

Thanks in advance for the answers

api55
  • 11,070
  • 4
  • 41
  • 57
  • possible duplicate of [Euclidean distance between points in two different Numpy arrays, not within](http://stackoverflow.com/questions/1871536/euclidean-distance-between-points-in-two-different-numpy-arrays-not-within) or [calculate euclidean distance with numpy](http://stackoverflow.com/questions/1401712/calculate-euclidean-distance-with-numpy) – Inbar Rose Oct 20 '13 at 09:21
  • 1
    Since you seem to look for outliers only, which is a nearest neighbour search. I think you can (and should) use `scipy.spatial.KDTree` (the old cKDTree probably does not support that, so use a newer scipy where cKDTree does). – seberg Oct 20 '13 at 10:09
  • also: http://stackoverflow.com/questions/19277244/fast-weighted-euclidean-distance-between-points-in-arrays/19285289#19285289 – ali_m Oct 20 '13 at 11:07

2 Answers2

5
>>> from scipy.spatial.distance import cdist
>>> A = np.random.randn(540, 2)
>>> B = np.random.randn(540, 2)
>>> alpha = 1.
>>> ind = np.all(cdist(A, B) > alpha, axis=0)
>>> outliers = B[ind]

gives you the points you want.

Fred Foo
  • 355,277
  • 75
  • 744
  • 836
  • Actually this gives me a set of True and False, though I think that I got an idea of what you wanted to do, and actually is a huge speed improvement. – api55 Oct 20 '13 at 09:58
  • now it works fine, and i got a big increase in speed aprox. from 9.5 to 0.6 for 10 tries of the whole algorithm, thank you – api55 Oct 20 '13 at 10:20
0

If you have a very large set of points you could calculate x & y bounds of a add & subtract aplha then eliminate all the points in b from specific consideration that lay outside of that boundary.

Steve Barnes
  • 27,618
  • 6
  • 63
  • 73