For a project, I need an efficient Python function that solves the following task:
Given a very large list X of long sparse vectors (i.e. a big sparse matrix) and another matrix Y that contains a single vector y, I want a list of "distances" between y and every element of X. The "distance" is defined like this:
Compare the two vectors element by element, always take the smaller of the two values, and sum them up.
Example:
X = [[0,0,2],
     [1,0,0],
     [3,1,0]]
Y = [[1,0,2]]
The function should return dist = [2,1,1]
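To pin down the expected result, the definition can be written as a short dense NumPy sketch. It is only meant as a reference for small inputs like this example, not as a solution to the performance problem; the name `reference_distances` is made up for illustration.

```python
import numpy as np

def reference_distances(X, Y):
    """Dense reference: for each row of X, sum the element-wise minima with the single row of Y."""
    X = np.asarray(X)
    y = np.asarray(Y)[0]  # Y holds exactly one vector
    # Broadcasting compares y against every row of X at once.
    return np.minimum(X, y).sum(axis=1).tolist()

X = [[0, 0, 2],
     [1, 0, 0],
     [3, 1, 0]]
Y = [[1, 0, 2]]
print(reference_distances(X, Y))  # [2, 1, 1]
```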
In my project, both X and Y contain a lot of zeros and come in as an instance of:
<class 'scipy.sparse.csr.csr_matrix'>
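To reproduce the setup, the example above could be built in that format roughly as follows. This is a sketch that assumes the inputs start out as small dense arrays; the names `X_sparse` and `Y_sparse` are illustrative, and the exact class path printed by `type` depends on the SciPy version.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Build the example inputs as CSR matrices; only the non-zero entries are stored.
X_sparse = csr_matrix(np.array([[0, 0, 2],
                                [1, 0, 0],
                                [3, 1, 0]]))
Y_sparse = csr_matrix(np.array([[1, 0, 2]]))

print(type(X_sparse))                  # class path varies between SciPy versions
print(X_sparse.shape, Y_sparse.shape)  # (3, 3) (1, 3)
print(X_sparse.nnz, Y_sparse.nnz)      # number of stored non-zero entries: 4 2
```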
So far so good. I managed to write a function that solves this task, but it is very slow and horribly inefficient. I need some tips on how to efficiently process/iterate over the sparse matrices. This is my function:
    def get_distances(X, Y):
        ret = []
        rows, cols = X.shape
        for i in range(0, rows):
            dist = 0
            # Convert the current row of X and the single row of Y to dense matrices
            sample = X.getrow(i).todense()
            test = Y.getrow(0).todense()
            rows_s, cols_s = sample.shape
            rows_t, cols_t = test.shape
            # Sum the element-wise minima of the two rows
            for s, t in zip(range(0, cols_s), range(0, cols_t)):
                dist += min(sample[0, s], test[0, t])
            ret.append([dist])
        return ret
To do my operations, I convert the sparse matrices to dense matrices, which is of course horrible, but I did not know how to do it better. Do you know how to improve my code and make the function faster?
Thanks a lot!