Suppose I have a sparse matrix representing a document collection, where each row is a vector representing one document (generated by scikit-learn's TfidfTransformer, for example).
tfidf_matrix = tfidf_transformer.fit_transform(posting)
Now a query comes in:
query = tfidf_transformer.transform(vectorizer.transform(['I am a sample query']))
I want to compare this query to each document (each row of the matrix) using scipy.spatial.distance.cosine (which returns the cosine distance, i.e. 1 minus the cosine similarity). So I do a map as follows:
result = map(lambda document: cosine(document.toarray().ravel(), query[0].toarray().ravel()), tfidf_matrix)
It could be done with a loop as well:
result = []
for row in tfidf_matrix:
    result.append(cosine(row.toarray().ravel(), query[0].toarray().ravel()))
However, it is slow (out of frustration I even tried gevent.threadpool.map, with the same result). I am pretty sure this is not the right way to do it (mapping a function over each row of a sparse matrix), but I can't seem to find the proper way.
So the question is: what is the proper way to map a function over each row of a sparse matrix (scipy.sparse.csr_matrix)?
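For this particular function, one direction that might avoid the per-row toarray() calls altogether is to compute every dot product in a single sparse matrix product. Below is a minimal sketch under that assumption. The helper name cosine_distances_to_query and the toy matrices are made up for illustration (they stand in for tfidf_matrix and query), and this only covers cosine distance, not mapping an arbitrary function over rows.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial.distance import cosine


def cosine_distances_to_query(matrix, query):
    """Cosine distance from every row of `matrix` to the single-row `query`.

    Hypothetical helper: both arguments are scipy sparse matrices
    with the same number of columns.
    """
    matrix = csr_matrix(matrix)
    query = csr_matrix(query)
    # One sparse product yields all row-by-query dot products at once.
    dots = (matrix @ query.T).toarray().ravel()
    # Norms are computed on the sparse data, so no row is densified.
    row_norms = np.sqrt(np.asarray(matrix.multiply(matrix).sum(axis=1)).ravel())
    query_norm = np.sqrt(query.multiply(query).sum())
    denom = row_norms * query_norm
    # All-zero rows (or an all-zero query) have no defined similarity;
    # this sketch reports similarity 0 (distance 1) for them.
    sims = np.divide(dots, denom, out=np.zeros(dots.shape, dtype=float), where=denom != 0)
    return 1.0 - sims


# Made-up stand-ins for tfidf_matrix and query
docs = csr_matrix(np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0], [1.0, 1.0, 1.0]]))
q = csr_matrix(np.array([[1.0, 0.0, 1.0]]))

fast = cosine_distances_to_query(docs, q)

# Row-by-row version, kept only to cross-check the vectorized result.
q_dense = q.toarray().ravel()
slow = np.array([cosine(docs[i].toarray().ravel(), q_dense) for i in range(docs.shape[0])])
```

If the rows are already L2-normalized (which I believe is TfidfTransformer's default, norm='l2'), the norm computation could presumably be dropped and the sparse product alone would give the similarities.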