I have a very large matrix of size (10000000, 250). I want to calculate the Euclidean distance between every pair of rows of the matrix. If I try to use the scipy.spatial.distance functions, they try to create a distance matrix of size (10000000, 10000000), which obviously does not fit in memory.
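To put a number on the problem, here is a rough back-of-the-envelope sketch. It assumes the distances are stored as 8-byte float64 values; the helper names `condensed_bytes` and `square_bytes` are made up for this illustration and are not part of SciPy.

```python
def condensed_bytes(n_rows, bytes_per_value=8):
    """Bytes needed for one distance per unordered pair of rows (i > j)."""
    n_pairs = n_rows * (n_rows - 1) // 2
    return n_pairs * bytes_per_value


def square_bytes(n_rows, bytes_per_value=8):
    """Bytes needed for the full n_rows x n_rows distance matrix."""
    return n_rows * n_rows * bytes_per_value


n_rows = 10_000_000
print(f"pairs only:  {condensed_bytes(n_rows) / 1e12:.0f} TB")
print(f"full square: {square_bytes(n_rows) / 1e12:.0f} TB")
```

Under those assumptions, even storing each pair only once needs hundreds of terabytes, so the size of the result is a limit regardless of how fast each distance is computed.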
To bypass that, I decided to write my own function, but it takes forever to run. Any way to optimize this would be highly appreciated:
import numpy as np
import scipy.spatial as scsp

def calcDist(mat):
    dis = []
    for i in range(mat.shape[0]):
        for j in range(i):
            dis.append(scsp.distance.euclidean(mat[i, :], mat[j, :]))
    return np.array(dis)

Q = np.random.random([10000000, 250])
distQ = calcDist(Q)
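For comparison, a blockwise variant of the same double loop might look like the sketch below. It is only an illustration under stated assumptions: `iter_distance_blocks` and `block_size` are hypothetical names chosen here, and the only SciPy call used is `scipy.spatial.distance.cdist`. It replaces the per-pair Python calls with one vectorized call per block and yields each block instead of collecting everything in a list.

```python
import numpy as np
from scipy.spatial.distance import cdist


def iter_distance_blocks(mat, block_size=1000):
    """Yield (row_start, col_start, block) for blocks on or below the diagonal.

    block[a, b] is the Euclidean distance between mat[row_start + a]
    and mat[col_start + b]. block_size is an arbitrary illustrative default.
    """
    n = mat.shape[0]
    for i in range(0, n, block_size):
        rows = mat[i:i + block_size]
        # Only visit column blocks up to the diagonal; the rest follow by symmetry.
        for j in range(0, i + 1, block_size):
            cols = mat[j:j + block_size]
            yield i, j, cdist(rows, cols, metric="euclidean")


# Usage on a small matrix: reduce each block as it arrives instead of storing it.
small = np.random.random([2500, 250])
largest = 0.0
for i, j, block in iter_distance_blocks(small, block_size=1000):
    largest = max(largest, block.max())
print(largest)
```

This only removes the Python-level overhead per pair. The number of distances is unchanged, so each block still has to be reduced or written to disk as it is produced, and the diagonal blocks contain every pair in both orders plus the zero self-distances.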