from scipy.spatial.distance import cdist
from sklearn.datasets import make_moons

X, y = make_moons()
cdist(X,X).min(axis=1)

gave me

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

This is not what I want. I want the minimum distance between points X[i] and X[j] where i is not equal to j; when i = j the distance is trivially 0, which is why every entry above is 0. How can I do that using cdist?

Sergey Bushmanov
David
    You basically want to ignore the diagonal of the distance matrix, right? This might not be the most elegant solution, but by adding `inf` (or any other sufficiently large number) to the diagonal, `.min(axis=1)` should work. `(cdist(X, X) + np.diag([np.inf]*len(X))).min(axis=1)` – Niklas Mertsch Oct 25 '20 at 23:46
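The idea in the comment above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the random points are a stand-in for `make_moons()` (an assumption, to keep the example dependent on NumPy and SciPy only), and `np.fill_diagonal` is used in place of adding `np.diag(...)`, which has the same effect without building a second matrix.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Stand-in data (assumption): 100 random 2-D points instead of make_moons().
rng = np.random.default_rng(0)
X = rng.random((100, 2))

D = cdist(X, X)              # full (100, 100) distance matrix, zeros on the diagonal
np.fill_diagonal(D, np.inf)  # mask self-distances so they can never be the minimum
nearest = D.min(axis=1)      # distance from each point to its nearest *other* point
```

Here `nearest` holds one value per point, which is what `.min(axis=1)` in the question was aiming for.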

1 Answer


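cdist is overkill for computing pairwise distances within a single array: cdist(X, X) computes every distance twice (once as (i, j) and once as (j, i)) plus the zero self-distances on the diagonal. For a single array, the upper triangle of the distance matrix is the minimal meaningful representation of all pairwise distances, and it excludes the zero distances from each point to itself. pdist computes exactly that: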

from scipy.spatial.distance import pdist
from sklearn.datasets import make_moons

X, y = make_moons()
# desired output: the smallest distance between any two distinct points (a single scalar)
pdist(X).min()

pdist returns the upper triangle as a flat ndarray, which the documentation describes as:

Y : ndarray. Returns a condensed distance matrix Y. For each i and j (where i < j < m), where m is the number of original observations, the metric dist(u=X[i], v=X[j]) is computed and stored in entry ij.

You may read more about the condensed distance matrix here
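To make the condensed layout concrete, here is a hedged sketch that recovers which pair attains the minimum and, if per-point minima are wanted, expands the condensed vector back to a square matrix with `squareform`. The random points are again a stand-in for `make_moons()` (an assumption), and the variable names are illustrative only.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Stand-in data (assumption): 100 random 2-D points instead of make_moons().
rng = np.random.default_rng(0)
X = rng.random((100, 2))
m = len(X)

condensed = pdist(X)           # 1-D array of length m * (m - 1) / 2
global_min = condensed.min()   # smallest distance between two distinct points

# The condensed entries follow the row-major order of the upper triangle,
# which is the same order np.triu_indices(m, k=1) produces.
i_idx, j_idx = np.triu_indices(m, k=1)
k = condensed.argmin()
i, j = i_idx[k], j_idx[k]      # the closest pair, with i < j

# Per-point nearest-neighbour distances: expand to a square matrix
# and mask the zero diagonal before taking the row-wise minimum.
D = squareform(condensed)
np.fill_diagonal(D, np.inf)
nearest = D.min(axis=1)
```

Note that the `squareform` step allocates the full m-by-m matrix again, so it gives up the memory advantage of the condensed form; it is only worth doing when one value per point is needed rather than a single global minimum.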

Time comparison:

%timeit pdist(X)
73 µs ± 825 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit cdist(X,X)
112 µs ± 315 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Sergey Bushmanov