Compute numpy array pairwise Euclidean distance except with self

Question

edit: this question is not specifically about calculating distances, rather the most efficient way to loop through a numpy array, specifying that for index i all comparisons should be made with the rest of the array, as long as the second index is not i.

I have a numpy array with columns (X, Y, ID) and want to compare each element to each other element, but not itself. So, for each X, Y coordinate, I want to calculate the distance to each other X, Y coordinate, but not itself (where distance = 0).

Here is what I have - there must be a more "numpy" way to write this.

import math, arcpy
# Point feature class
fc = "MY_FEATURE_CLASS"
# Load points to numpy array: (X, Y, ID)
npArray = arcpy.da.FeatureClassToNumPyArray(fc,["SHAPE@X","SHAPE@Y","OID@"])
for row in npArray:
    for row2 in npArray:
        if row[2] != row2[2]:
            # Pythagoras's theorem
            distance = math.sqrt(math.pow((row[0]-row2[0]),2)+math.pow((row[1]-row2[1]),2))

Obviously, I'm a numpy newbie. I will not be surprised to find this a duplicate, but I don't have the numpy vocabulary to search out the answer. Any help appreciated!

You may want to look at [`scipy.spatial.distance.pdist`](http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html). — user2357112, Mar 18 '15 at 20:57
possible duplicate of [Computing Euclidean distance for numpy in python](http://stackoverflow.com/questions/28687321/computing-euclidean-distance-for-numpy-in-python) — Oliver W., Mar 18 '15 at 21:39
Note that in many geometries d(x, y) = 0 iff x = y, so you may want to skip that check, and deal with the zeros later on. Disregard this comment if you're dealing computing distance under a non-injective mapping - I just noticed you've discussing "features" in your code, which suggests this may be the case. — Patrick McLaren, Mar 18 '15 at 22:19
@PatrickMcLaren That may be. The main thrust behind the question is to find the nearest other geometry to a given geometry. If I leave in the zeros, then I'm looking for the second-nearest geometry to each geometry, which seems more complicated. — phloem7, Mar 18 '15 at 22:25
Well, if you are willing to use scipy, then the comment by @user2357112 should do the trick. If not, you could use a list comprehension, say `distances = [dist(x, y) for x in npArray for y in npArray if x != y]`, where `dist` is some metric. — Patrick McLaren, Mar 18 '15 at 22:36
In terms of math you could build a 3d tensor with sizes NxNx2 from two matrices of sized Nx2, by so-called `broadcasting` Nx2 matrix up to NxNx2 tensor and getting it's transposed version and subtracting them and then computing L2-norm over third axis. Than, I guess, you'd have exactly matrix of pair-wise distances. — Ben Usman, Mar 19 '15 at 00:00
@user2312518 I've been [using cKDTree](http://stackoverflow.com/a/25551131/832621) that comes in SciPy to do this kind of task very efficiently, it seems much faster than calculating all the distances — Saullo G. P. Castro, Mar 19 '15 at 00:35

score 0 · Answer 1 · answered Jun 14 '19 at 20:51

Using SciPy's pdist, you could write something like

from scipy.spatial.distance import pdist, squareform

distances = squareform(pdist(npArray, lambda a,b: np.sqrt((a[0]-b[0])**2 + (a[1]-b[1])**2)))

pdist will compute the pair-wise distances using the custom metric that ignores the 3rd coordinate (which is your ID in this case). squareform turns this into a more readable matrix such that distances[0,1] gives the distance between the 0th and 1st rows.

score 0 · Answer 2 · edited Dec 29 '19 at 16:03

0

Each row of X is a 3 dimensional data instance or point. The output pairwisedist[i, j] is distance of X[i, :] and X[j, :]

X = np.array([[6,1,7],[10,9,4],[13,9,3],[10,8,15],[14,4,1]])
a = np.sum(X*X,1)
b = np.repeat( a[:,np.newaxis],5,axis=1)
pairwisedist = b + b.T -2* X.dot(X.T)

edited Dec 29 '19 at 16:03

Andronicus

25,419
17
47
88

answered Dec 29 '19 at 15:40

Mohammad R. Moosavi

1

score 0 · Answer 3 · answered Sep 18 '21 at 11:32

I wanted to point out that custom written sqrt of sum of squares are prone to overflow and underflow. Bultin math.hypot, np.hypot are way safer for no compromise on performance

from scipy.spatial.distance import pdist, squareform

distances = squareform(pdist(npArray, lambda a,b: math.hypot(*(a-b))

Refer

Compute numpy array pairwise Euclidean distance except with self

3 Answers3