A norm is a function that takes a vector as an input and returns a scalar value that can be interpreted as the "size", "length" or "magnitude" of that vector. More formally, norms are defined as having the following mathematical properties:
- They scale multiplicatively, i.e. Norm(a·v) = |a|·Norm(v) for any scalar a
- They satisfy the triangle inequality, i.e. Norm(u + v) ≤ Norm(u) + Norm(v)
- The norm of a vector is zero if and only if it is the zero vector, i.e. Norm(v) = 0 ⇔ v = 0
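As a quick sanity check, the three properties above can be verified numerically for the Euclidean norm. This is only a sketch: the vectors, the scalar `a` and the random seed are arbitrary choices for illustration, and passing on one sample is of course not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
u = rng.standard_normal(5)      # illustrative vectors, not from the question
v = rng.standard_normal(5)
a = -2.5                        # arbitrary scalar

norm = np.linalg.norm           # Euclidean norm by default for 1-D input

# Multiplicative scaling: Norm(a*v) == |a| * Norm(v)
scales = np.isclose(norm(a * v), abs(a) * norm(v))

# Triangle inequality: Norm(u + v) <= Norm(u) + Norm(v)
triangle = norm(u + v) <= norm(u) + norm(v)

# The zero vector has norm zero, a nonzero vector has positive norm
zero_norm = norm(np.zeros(5))
nonzero_norm = norm(v)
```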
The Euclidean norm (also known as the L² norm) is just one of many different norms; others include the max norm and the Manhattan norm. The L² norm of a single vector is the Euclidean distance from the point it represents to the origin, and the L² norm of the difference between two vectors is the Euclidean distance between the two points.
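To make the different norms concrete, here is a small sketch using the `ord` parameter of `np.linalg.norm` on a 1-D vector. The vector `v` is just an example value I picked so the results are easy to check by hand.

```python
import numpy as np

v = np.array([3.0, -4.0])  # example vector, chosen for easy arithmetic

l2 = np.linalg.norm(v)                # Euclidean: sqrt(3**2 + 4**2)
l1 = np.linalg.norm(v, ord=1)         # Manhattan: |3| + |-4|
linf = np.linalg.norm(v, ord=np.inf)  # max norm: max(|3|, |-4|)

print(l2, l1, linf)
```

All three satisfy the properties listed above, but they assign different "sizes" to the same vector.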
As @nobar's answer says, np.linalg.norm(x - y, ord=2) (or just np.linalg.norm(x - y)) will give you Euclidean distance between the vectors x and y.
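For example, with two made-up 2-D points (not taken from the question), both spellings give the same distance, since `ord=2` is what 1-D input gets by default:

```python
import numpy as np

x = np.array([1.0, 2.0])  # illustrative points
y = np.array([4.0, 6.0])

d_default = np.linalg.norm(x - y)          # ord omitted
d_explicit = np.linalg.norm(x - y, ord=2)  # ord given explicitly

# Same quantity written out by hand: sqrt((1-4)**2 + (2-6)**2)
d_manual = np.sqrt(np.sum((x - y) ** 2))
```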
Since you want to compute the Euclidean distance between a[1, :] and every other row in a, you could do this a lot faster by eliminating the for loop and broadcasting over the rows of a:
dist = np.linalg.norm(a[1:2] - a, axis=1)
It's also easy to compute the Euclidean distance yourself using broadcasting:
dist = np.sqrt(((a[1:2] - a) ** 2).sum(1))
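In case the broadcasting step is not obvious, here is the same computation on a tiny array with the intermediate shapes spelled out. The 3×2 array is a stand-in I made up; the names `row` and `diff` are only for illustration.

```python
import numpy as np

a = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])  # small stand-in for the real array

row = a[1:2]    # shape (1, 2): slicing keeps the row as a 2-D array
diff = row - a  # (1, 2) broadcasts against (3, 2), giving shape (3, 2)

# Sum the squared differences along each row, then take the square root
dist = np.sqrt((diff ** 2).sum(axis=1))  # shape (3,), one distance per row

print(row.shape, diff.shape, dist)
```

The entry at index 1 is the distance from the row to itself, so it is zero.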
The fastest method is probably scipy.spatial.distance.cdist:
from scipy.spatial.distance import cdist
dist = cdist(a[1:2], a)[0]
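The trailing `[0]` is there because `cdist` returns a 2-D matrix of distances between every row of its first argument and every row of its second. A minimal sketch on a made-up 3×2 array (the name `pairwise` is just for illustration):

```python
import numpy as np
from scipy.spatial.distance import cdist

a = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])  # small stand-in for the real array

# One row in the first argument, three in the second: result has shape (1, 3)
pairwise = cdist(a[1:2], a)

# Take the single row to get a 1-D array of distances
dist = pairwise[0]
```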
Some timings for a (1000, 1000) array:
import numpy as np

a = np.random.randn(1000, 1000)
%timeit np.linalg.norm(a[1:2] - a, axis=1)
# 100 loops, best of 3: 5.43 ms per loop
%timeit np.sqrt(((a[1:2] - a) ** 2).sum(1))
# 100 loops, best of 3: 5.5 ms per loop
%timeit cdist(a[1:2], a)[0]
# 1000 loops, best of 3: 1.38 ms per loop
# check that all 3 methods return the same result
d1 = np.linalg.norm(a[1:2] - a, axis=1)
d2 = np.sqrt(((a[1:2] - a) ** 2).sum(1))
d3 = cdist(a[1:2], a)[0]
assert np.allclose(d1, d2) and np.allclose(d1, d3)