To normalize the rows of a matrix X
to unit length, I usually use:
X /= np.linalg.norm(X, axis=1, keepdims=True)
While trying to optimize this operation for an algorithm, I was quite surprised to find that writing out the normalization explicitly is about 40% faster on my machine:
X /= np.sqrt(X[:,0]**2+X[:,1]**2+X[:,2]**2)[:,np.newaxis]
X /= np.sqrt(sum(X[:,i]**2 for i in range(X.shape[1])))[:,np.newaxis]
How come? Where is the performance lost in np.linalg.norm()?
import numpy as np
X = np.random.randn(10000,3)
%timeit X/np.linalg.norm(X,axis=1, keepdims=True)
# 276 µs ± 4.55 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit X/np.sqrt(X[:,0]**2+X[:,1]**2+X[:,2]**2)[:,np.newaxis]
# 169 µs ± 1.38 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit X/np.sqrt(sum(X[:,i]**2 for i in range(X.shape[1])))[:,np.newaxis]
# 185 µs ± 4.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
I observe this for (1) Python 3.6 + NumPy v1.17.2
and (2) Python 3.9 + NumPy v1.19.3
on a 2015 MacBook Pro with OpenBLAS support.
I don't think this is a duplicate of this post, which addresses matrix norms; my question is about the L2 norm of row vectors.
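For completeness, here is a sketch of a further variant I have seen suggested for this kind of row-wise reduction (not part of my original timings): using np.einsum to compute the per-row squared sums in a single pass, which avoids the intermediate arrays that both np.linalg.norm and the explicit X[:,0]**2 + ... expression allocate.

```python
import numpy as np

X = np.random.randn(10000, 3)

# 'ij,ij->i' multiplies X elementwise with itself and sums over j,
# yielding the squared L2 norm of each row in one fused reduction.
norms = np.sqrt(np.einsum('ij,ij->i', X, X))[:, np.newaxis]
X_normalized = X / norms

# Sanity check: agrees with np.linalg.norm up to floating-point error.
assert np.allclose(
    X_normalized,
    X / np.linalg.norm(X, axis=1, keepdims=True),
)
```

Whether this beats the hand-written sum on a given machine will depend on the NumPy version and array shape, so it would need to be timed alongside the variants above.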