This has to be the most-asked numpy question (I've answered it myself twice in the last week), but since it can be phrased a million ways, here it is again:
import numpy as np
from scipy.spatial.distance import cdist
# both return, for each point in X, the index of the nearest point in Y
def withScipy(X, Y):  # faster
    return np.argmin(cdist(X, Y, 'sqeuclidean'), axis=1)

def withoutScipy(X, Y):  # slower, using broadcasting
    return np.argmin(np.sum((X[None, :, :] - Y[:, None, :])**2, axis=-1), axis=0)
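For scale, a quick sketch with random data (the shapes here are arbitrary; X and Y just need the same number of columns), which should give identical index arrays from both functions:

rng = np.random.default_rng(0)
X = rng.random((1000, 3))   # 1000 query points in 3-D
Y = rng.random((2000, 3))   # 2000 candidate points
assert np.array_equal(withScipy(X, Y), withoutScipy(X, Y))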
There's also a numpy-only method using einsum that's faster than my function (but not cdist), but I don't understand it well enough to explain it.
EDIT += 21 months:
The best way to do this algorithmically, though, is with a KDTree.
from sklearn.neighbors import KDTree

# using sklearn's KDTree since it allows return_distance=False, saving memory
y_tree = KDTree(Y)
y_index_of_closest = y_tree.query(X, k=1, return_distance=False)
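Note that query returns an (n, 1) array when k=1. A minimal, self-contained sketch (shapes arbitrary again) that flattens it to a plain index vector:

import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
X = rng.random((1000, 3))
Y = rng.random((2000, 3))
# query returns shape (n_queries, k); ravel() gives a flat array of Y-indices
idx = KDTree(Y).query(X, k=1, return_distance=False).ravel()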
@HansMusgrave has a pretty good speedup for KDTree below.
And for completeness' sake, the np.einsum answer, which I now understand:
np.argmin(                                 # (X - Y) ** 2 expands to:
    np.einsum('ij,ij->i', X, X)[:, None]   #   X ** 2
    + np.einsum('ij,ij->i', Y, Y)          # + Y ** 2
    - 2 * X.dot(Y.T),                      # - 2 * X * Y
    axis=1)
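A quick sanity check of that expansion (a minimal sketch; allclose rather than exact equality, since reordering the floating-point operations perturbs the last bits):

import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
X = rng.random((5, 3))
Y = rng.random((7, 3))
# ||x - y||^2 == ||x||^2 + ||y||^2 - 2 x.y for every pair of rows
expanded = (np.einsum('ij,ij->i', X, X)[:, None]
            + np.einsum('ij,ij->i', Y, Y)
            - 2 * X.dot(Y.T))
assert np.allclose(expanded, cdist(X, Y, 'sqeuclidean'))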
@Divakar explains this method well on the wiki page of his eucl_dist package.