I have to calculate the pairwise weighted distance of a potentially large number of vectors. Let A be the m x n matrix of m vectors with n attributes, and W a 1 x n weight vector.
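That is, for every pair of rows i and j I want dist[i, j] = sum_k W[k] * (A[i, k] - A[j, k])**2, i.e. the squared Euclidean distance with each attribute scaled by its weight.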
My first function calculates the pairwise distance the naive way:
import numpy as np

def weightedDistance(A, W):
    # m x m matrix of pairwise weighted squared distances
    aff = np.zeros((len(A), len(A)))
    for i in range(len(A)):
        for j in range(len(A)):
            aff[i][j] = np.sum(((A[i] - A[j])**2) * W)
    return aff
This takes over twenty seconds for a 1000x2 matrix.
Then I tried a faster version, using matrix multiplication, based on this blog post:
def weightedDistanceFast(A, W):
    # weighted squared norm of each row
    AA = np.sum((A**2) * W, 1)
    # -2 * weighted dot products between every pair of rows
    A2 = -2 * ((A * W).dot(A.T))
    # broadcast AA along rows and columns to complete the expansion
    return (AA + A2).T + AA
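As far as I understand, this uses the expansion sum_k W[k]*(a[k] - b[k])**2 = sum_k W[k]*a[k]**2 + sum_k W[k]*b[k]**2 - 2*sum_k W[k]*a[k]*b[k], so AA holds the weighted squared norms of the rows and the broadcasting adds them along rows and columns around the -2*(A*W).dot(A.T) term.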
This is "instant" for the same 1000x2 matrix. However, it introduces a bunch of errors at the order of e-17.
>>> dist = desc.weightedDistanceFast(attr,(1,1))
>>> dist2 = desc.weightedDistance(attr,(1,1))
>>> np.sum(dist-dist2)
6.9155863612166586e-11
>>> len(np.where((dist-dist2 > 0))[0])
380824
Checking other answers around here, I also tried the wminkowski distance from scipy.spatial.distance, but it yielded different results from the above two functions depending on the value of W.
from scipy.spatial import distance as sp

def weightedDistanceMinkowski(A, W):
    # squared weighted Minkowski (p=2) distances
    aff = sp.squareform(sp.pdist(A, metric='wminkowski', p=2, w=W))
    return aff**2
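If I read the scipy docs correctly, wminkowski multiplies the weights into the coordinate differences before raising them to the power p, so squaring the p=2 result effectively weights by W**2 rather than W. Assuming that reading is right, this is the call I would expect to match the first two functions (the function name is just my own):

def weightedDistanceMinkowski2(A, W):
    # hypothetical variant: pass sqrt(W) so that squaring the distances
    # reproduces sum(W * (a - b)**2) rather than sum(W**2 * (a - b)**2)
    aff = sp.squareform(sp.pdist(A, metric='wminkowski', p=2, w=np.sqrt(W)))
    return aff**2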
Anyway, I know that a certain degree of error is inevitable, but since I'm going to do further calculations on this distance matrix, I wonder if there is a way to reduce this error (or, alternatively, a less error-prone way to make function 1 faster).
Thank you!