The documentation of scipy.spatial.distance.euclidean
states that only 1-D vectors are allowed as inputs. Thus you have to loop over the rows of your array:
import numpy as np
import scipy.spatial.distance

distances = np.empty(b.shape[0])
for i in range(b.shape[0]):
    distances[i] = scipy.spatial.distance.euclidean(a, b[i])
If you want a vectorized implementation, you need to write your own function. Using np.vectorize
with a matching signature would also work, but np.vectorize is in fact just shorthand for a for-loop and will therefore perform no better than an explicit loop.
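A minimal sketch of the np.vectorize approach (the name vec_euclidean is only illustrative):
vec_euclidean = np.vectorize(scipy.spatial.distance.euclidean,
                             signature='(n),(n)->()')
distances = vec_euclidean(a, b)  # a is broadcast against the rows of b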
As stated in my comment on hannes wittingham's solution, here is a one-liner focused on performance:
distances = ((b - a)**2).sum(axis=1)**0.5
Writing out all the calculations reduces the number of separate function calls and thus the number of intermediate results assigned to new arrays. This makes it about 22% faster than hannes wittingham's solution for an array of shape b.shape == (20, 3), and about 5% faster for an array of shape b.shape == (20000, 3):
a = np.array([1, 1, 1])
b = np.random.rand(20, 3)
%timeit ((b - a)**2).sum(axis=1)**0.5
# 5.37 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit euclidean_distances(a, b)
# 6.89 µs ± 345 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
b = np.random.rand(20000, 3)
%timeit ((b - a)**2).sum(axis=1)**0.5
# 588 µs ± 43.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit euclidean_distances(a, b)
# 616 µs ± 36.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
However, you lose the flexibility of easily switching the distance calculation routine. With the scipy.spatial.distance
module, you can change the calculation routine by simply calling another method.
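For example, scipy.spatial.distance.cdist computes all the distances in a single vectorized call and takes the metric as an argument (a has to be reshaped to 2-D because cdist expects 2-D inputs):
from scipy.spatial.distance import cdist

distances = cdist(a.reshape(1, -1), b, metric='euclidean')[0]
manhattan = cdist(a.reshape(1, -1), b, metric='cityblock')[0]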
To improve the calculation performance even further, you can use a JIT (just-in-time) compiler like numba
for your function:
import numba as nb

@nb.njit
def euc(a, b):
    return ((b - a)**2).sum(axis=1)**0.5
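Note that the first call to euc triggers the compilation, so exclude it when timing the function.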
This reduces the time needed for the calculation by about 70% for small arrays and by about 60% for large arrays. Unfortunately, the axis
keyword of np.linalg.norm
is not yet supported by numba.
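If you want to avoid the axis keyword entirely, a sketch of an explicit-loop variant (the name euc_loop is only illustrative; numba compiles plain Python loops to fast machine code):
import numpy as np
import numba as nb

@nb.njit
def euc_loop(a, b):
    out = np.empty(b.shape[0])
    for i in range(b.shape[0]):
        s = 0.0
        for j in range(b.shape[1]):
            d = b[i, j] - a[j]  # elementwise difference
            s += d * d          # accumulate squared distance
        out[i] = s**0.5
    return out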