
Apologies in advance for the basic question!

Given:

import numpy as np
from scipy.spatial.distance import cdist
a = np.random.rand(6, 3)
b = np.random.rand(6, 3)

Using scipy.spatial.distance.cdist, d = cdist(a, b, 'euclidean') results in:

[[0.8625803  0.29814357 0.97548993 0.84368212 0.66530478 0.95367553]
 [0.67858887 0.27603821 0.76236585 0.80857596 0.48560167 0.84517836]
 [0.53097997 0.41061975 0.66475479 0.54243987 0.47469843 0.70178229]
 [0.37678898 0.7855905  0.25492161 0.79870147 0.37795642 0.58136674]
 [0.73515058 0.90614048 0.88997676 0.15126486 0.82601188 0.63733843]
 [0.34345477 0.7927319  0.52963369 0.27127254 0.64808932 0.66528862]]

But d = np.linalg.norm(a - b, axis=1) returns only the diagonal of the scipy result:

[0.8625803  0.27603821 0.66475479 0.79870147 0.82601188 0.66528862]
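This happens because `a - b` is an elementwise subtraction: row `i` of `a` is only ever paired with row `i` of `b`, so the row-wise norm yields only the `i == j` entries. A minimal pure-NumPy check, using an explicit double loop as an illustrative stand-in for `cdist` (the loop and the fixed seed are not from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((6, 3))
b = rng.random((6, 3))

# Full pairwise matrix, computed the slow way: full[i, j] = ||a[i] - b[j]||
full = np.empty((a.shape[0], b.shape[0]))
for i in range(a.shape[0]):
    for j in range(b.shape[0]):
        full[i, j] = np.linalg.norm(a[i] - b[j])

# a - b pairs row i of a with row i of b only, so the row-wise norm
# reproduces just the diagonal (i == j) of the full matrix
rowwise = np.linalg.norm(a - b, axis=1)

print(np.allclose(rowwise, np.diag(full)))  # True
```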

Question:

Is it possible to get the result of scipy.spatial.distance.cdist using only np.linalg.norm or numpy?

Farid Alijani
  • I think `cdist` is probably the best way to calculate this, unless you have a specific reason for wanting to use numpy? – Josmoor98 Mar 29 '20 at 15:28
  • You're right! But `scipy` does not have a GPU implementation, and I found that `cupy` has `linalg.norm` for computing distances between huge vectors/matrices! – Farid Alijani Mar 29 '20 at 15:49
  • Might also be worth checking out [Fastest numba code for computing euclideandistance between a vector and every row of a matrix](https://numba.discourse.group/t/fastest-numba-code-for-computing-euclideandistance-between-a-vector-and-every-row-of-a-matrix/190/12?u=sgbaird) and [How to calculate pairwise distance matrix on the GPU](https://stackoverflow.com/questions/46655878/how-to-calculate-pairwise-distance-matrix-on-the-gpu) – Sterling Aug 18 '21 at 08:08

1 Answer


You can use numpy broadcasting as follows:

d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
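To make the broadcasting step concrete, the sketch below prints the intermediate shapes. It uses `b` with 4 rows instead of 6 (an illustrative choice, not from the post) to show that the two inputs only need the same number of columns:

```python
import numpy as np

a = np.random.rand(6, 3)   # m = 6 points in 3-D
b = np.random.rand(4, 3)   # n = 4 points; m and n need not match

a_exp = a[:, None, :]      # shape (6, 1, 3)
b_exp = b[None, :, :]      # shape (1, 4, 3)
diff = a_exp - b_exp       # broadcasts to (6, 4, 3): diff[i, j] = a[i] - b[j]
d = np.linalg.norm(diff, axis=2)  # reduce over the coordinate axis -> (6, 4)

print(a_exp.shape, b_exp.shape, diff.shape, d.shape)
# (6, 1, 3) (1, 4, 3) (6, 4, 3) (6, 4)
```

Note that `diff` holds all `m * n * k` coordinate differences at once, which is what makes this approach memory-hungry for large inputs.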

For these small 6×3 inputs, performance is similar to scipy.spatial.distance.cdist on my local machine:

%timeit np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
13.5 µs ± 1.71 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit cdist(a,b)
15 µs ± 236 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
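For the large GPU-sized inputs mentioned in the comments, the `(m, n, k)` temporary can become the bottleneck. A commonly used alternative (a different technique from the answer above) expands the squared distance as ||x − y||² = ||x||² + ||y||² − 2·x·y, so the main work is a single matrix product and the temporaries are only `(m, n)`. This is a minimal sketch; the function name `pairwise_euclidean_gram` and the `xp` parameter are hypothetical, and passing `cupy` as `xp` is an untested assumption:

```python
import numpy as np

def pairwise_euclidean_gram(a, b, xp=np):
    """Pairwise Euclidean distances via ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y.

    Avoids the (m, n, k) temporary of the broadcasting approach; extra
    memory is O(m * n). xp is the array module (numpy by default;
    using cupy here is an assumption, not tested).
    """
    a_sq = xp.sum(a * a, axis=1)[:, None]   # (m, 1) squared row norms of a
    b_sq = xp.sum(b * b, axis=1)[None, :]   # (1, n) squared row norms of b
    sq = a_sq + b_sq - 2.0 * (a @ b.T)      # (m, n) squared distances
    # Rounding can push tiny values slightly below zero; clamp before sqrt
    return xp.sqrt(xp.maximum(sq, 0.0))

rng = np.random.default_rng(1)
a = rng.random((6, 3))
b = rng.random((5, 3))

reference = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
print(np.allclose(pairwise_euclidean_gram(a, b), reference))  # True
```

The trade-off is precision: for nearly coincident points the subtraction cancels catastrophically, so distances close to zero are less accurate than with the direct broadcasting version.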
FBruzzesi