I would like to calculate the Euclidean distance between each row vector of a square matrix X and each row vector of a square matrix Y.
import numpy as np
from tqdm import tqdm

def euclidean_dist(x, y) -> float:
    return np.linalg.norm(x - y)

def dist(X, Y):
    def calc(y):
        def calc2(x):
            return euclidean_dist(x, y)
        return calc2

    distances = [np.apply_along_axis(calc(y), 1, X) for y in tqdm(Y)]
    return np.asarray(distances)
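To make the expected output concrete, here is a tiny reference sketch of the same computation written as a plain double loop. The helper name `dist_reference` and the two small matrices are made up for illustration; they are not part of my real code. The result has one row per row of Y and one column per row of X, so entry `[j, i]` is the distance between `Y[j]` and `X[i]`.

```python
import numpy as np

def dist_reference(X, Y):
    """Plain double loop: D[j, i] is the Euclidean distance between Y[j] and X[i]."""
    D = np.empty((len(Y), len(X)))
    for j, y in enumerate(Y):
        for i, x in enumerate(X):
            D[j, i] = np.linalg.norm(x - y)
    return D

# Hypothetical 2 x 2 inputs, chosen so the distances are easy to check by hand.
X_small = np.array([[0.0, 0.0], [3.0, 4.0]])
Y_small = np.array([[0.0, 0.0], [6.0, 8.0]])

D_small = dist_reference(X_small, Y_small)
print(D_small)
# [[ 0.  5.]
#  [10.  5.]]
```

Any faster version should reproduce this layout, not its transpose.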
While `dist` works fine for small matrices, it is terribly slow for large ones. For instance, for two 14000 x 14000 matrices, tqdm estimates a running time of about 2 hours.
size = 14000
X = np.random.rand(size,size)
Y = np.random.rand(size,size)
D = dist(X, Y)
How can I make it faster?