I want to calculate the cosine similarity between articles, but my current implementation would take far too long on the amount of data I need to process.
from scipy import spatial
import numpy as np
from numpy import array
import sklearn
from sklearn.metrics.pairwise import cosine_similarity
I = [[3, 45, 7, 2],[2, 54, 13, 15], [2, 54, 1, 13]]
II = [2, 54, 13, 15]
print(cosine_similarity([II], I))  # wrap II in a list: cosine_similarity expects 2D inputs
With the example above, computing the similarity between I and II already took 1.0 s, and my real data has dimensions of roughly (100K, 2K).
Are there other packages I could use to handle a matrix this large?
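One direction that might work is sketched below, assuming the articles fit in memory as a dense NumPy array. Each row is normalized once, and the similarities are then computed block by block with a matrix product, so the full result never has to be held at once. The function name `chunked_cosine_similarity` and the `chunk_size` parameter are made up for illustration and are not part of any library.

```python
import numpy as np


def chunked_cosine_similarity(X, chunk_size=1000):
    """Yield (start, block) pairs, where block[i, j] is the cosine
    similarity between row start + i of X and row j of X.

    Hypothetical helper for illustration only.
    """
    X = np.asarray(X, dtype=np.float64)
    # Normalize every row to unit length once, up front.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero; all-zero rows get similarity 0
    X_unit = X / norms
    # For unit vectors the dot product equals the cosine similarity,
    # so each block is a single matrix product.
    for start in range(0, X_unit.shape[0], chunk_size):
        block = np.dot(X_unit[start:start + chunk_size], X_unit.T)
        yield start, block


I = [[3, 45, 7, 2], [2, 54, 13, 15], [2, 54, 1, 13]]
for start, block in chunked_cosine_similarity(I, chunk_size=2):
    print(start, block.shape)
```

Each block would be consumed as it is produced (for example, keeping only the top matches per row). A full 100K x 100K result in float64 would need about 80 GB (100,000 x 100,000 x 8 bytes), so storing it whole is unlikely to be practical.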