I have this numpy array with points, something like

[(x1,y1), (x2,y2), (x3,y3), (x4,y4), (x5,y5)]

What I would like to do is get an array of all minimum distances: for point 1 (x1, y1), the distance to the point closest to it, the same for point 2 (x2, y2), and so on. Distance here is the Euclidean distance, sqrt((x1-x2)^2 + (y1-y2)^2).

This will obviously be an array with the same length as my array of points (in this case: 5 points -> 5 minimum distances).

Is there a concise way of doing this without resorting to loops?

Cory Kramer
Rudy
  • How much does performance matter? – cel Jun 11 '15 at 16:58
  • http://stackoverflow.com/q/10818546/2823755 looks promising – wwii Jun 11 '15 at 17:18
  • Or http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.spatial.distance.cdist.html – wwii Jun 11 '15 at 17:25
  • scipy.spatial.distance.cdist that is – wwii Jun 11 '15 at 17:31
  • What is the equivalent pure numpy solution if scipy can't be loaded for various reasons? I have been using the norm approach, but I have nothing to compare timings against to see whether pdist or cdist gives a really drastic speed-up for point sets of < 1000 or so –  Jun 11 '15 at 22:06
  • Performance doesn't matter much, since this calculation is done before the action starts. What you suggested, cel, does the trick nicely. Thank you. – Rudy Jun 12 '15 at 10:01

1 Answer

This solution favors readability over performance: it explicitly calculates and stores the whole n x n distance matrix, so memory use grows as n^2 and it cannot be considered efficient for large n.

But it is very concise and readable.

import numpy as np
from scipy.spatial.distance import pdist, squareform

# create an n x d matrix (n=observations, d=dimensions)
A = np.array([[1,9,2,4], [1,2,3,1]]).T

# explicitly calculate the whole n x n distance matrix
dist_mat = squareform(pdist(A, metric="euclidean"))

# mask the diagonal so each point's zero distance to itself is ignored
np.fill_diagonal(dist_mat, np.nan)

# and calculate the minimum of each row (or column), skipping the NaNs
np.nanmin(dist_mat, axis=1)
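
If the n x n matrix ever gets too large to store, a k-d tree can find each point's nearest neighbour without building it. This is only a minimal sketch, assuming SciPy's cKDTree is available; the names tree and min_dists_tree are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

# same sample points as above: n x d (n=observations, d=dimensions)
A = np.array([[1, 9, 2, 4], [1, 2, 3, 1]], dtype=float).T

# build the tree once over all points
tree = cKDTree(A)

# ask for the 2 nearest neighbours of every point:
# column 0 is the point itself (distance 0), column 1 is its closest other point
dists, idx = tree.query(A, k=2)

min_dists_tree = dists[:, 1]
```

As with the matrix version, duplicated points give a minimum distance of 0.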
cel
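
For the pure numpy question raised in the comments, one possible sketch uses broadcasting in place of pdist and squareform. It still builds the full n x n matrix, so it has the same memory cost. The names diff and min_dists are illustrative, and np.inf is used for masking instead of np.nan so that a plain min works.

```python
import numpy as np

# same sample points as in the answer: n x d (n=observations, d=dimensions)
A = np.array([[1, 9, 2, 4], [1, 2, 3, 1]], dtype=float).T

# pairwise coordinate differences via broadcasting: shape (n, n, d)
diff = A[:, None, :] - A[None, :, :]

# Euclidean distance between every pair of points: shape (n, n)
dist_mat = np.sqrt((diff ** 2).sum(axis=-1))

# mask the diagonal so a point is never its own nearest neighbour
np.fill_diagonal(dist_mat, np.inf)

# minimum of each row = distance to the closest other point
min_dists = dist_mat.min(axis=1)
```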