Difference between the scipy, numpy, sklearn, and torch modules when used to calculate cosine similarity?

Question

The spatial.distance.cosine() function from the scipy module calculates the cosine distance rather than the cosine similarity, so to get the similarity I subtract the distance from 1.

from scipy import spatial
List1 = [4, 47, 8, 3]
List2 = [3, 52, 12, 16]
result = 1 - spatial.distance.cosine(List1, List2)
print(result)
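
As far as I can tell, spatial.distance.cosine() only accepts 1-D vectors. For 2D input, a minimal sketch, assuming each row is one vector and using illustrative arrays a and b, could rely on scipy.spatial.distance.cdist() instead:

```python
import numpy as np
from scipy.spatial.distance import cdist

# Illustrative (m, n) inputs: each row is treated as one vector.
a = np.array([[1, 2, 3, 4], [4, 3, 2, 1]], dtype=float)
b = np.array([[3, 4, 5, 6], [4, 5, 6, 7]], dtype=float)

# cdist returns the cosine distance for every pair of rows, shape (2, 2).
pairwise_similarity = 1 - cdist(a, b, metric="cosine")
# The diagonal compares row i of a with row i of b.
rowwise_similarity = np.diag(pairwise_similarity)

print(pairwise_similarity)
print(rowwise_similarity)
```

This computes all m x m pairs even when only the matching rows are needed, which may matter for large inputs.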

The numpy.dot() function calculates the dot product of the two vectors passed as parameters. The numpy.linalg.norm() function returns the vector norm.

I can use these functions with the correct formula to calculate the cosine similarity.

from numpy import dot
from numpy.linalg import norm
List1 = [4, 47, 8, 3]
List2 = [3, 52, 12, 16]
result = dot(List1, List2)/(norm(List1)*norm(List2))
print(result)
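
The same formula can be applied to 2D arrays one row at a time. The following is only a sketch: the helper name rowwise_cosine_similarity and the arrays a and b are made up for illustration, and it assumes both inputs have the same shape (m, n) with no all-zero rows.

```python
import numpy as np

def rowwise_cosine_similarity(a, b):
    """Cosine similarity between matching rows of two (m, n) arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Dot product of each row pair, computed along the column axis.
    dots = np.sum(a * b, axis=1)
    # Product of the Euclidean norms of each row pair.
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return dots / norms

a = [[1, 2, 3, 4], [4, 3, 2, 1]]
b = [[3, 4, 5, 6], [4, 5, 6, 7]]
print(rowwise_cosine_similarity(a, b))  # one value per row, shape (2,)
```

A row of zeros would make the denominator zero and produce nan, so real data may need a small epsilon or an explicit check.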

In the sklearn module, there is an in-built function called cosine_similarity() to calculate the cosine similarity.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
A = np.array([10, 3])
B = np.array([8, 7])
# cosine_similarity expects 2D input, so each vector is reshaped to (1, n)
result = cosine_similarity(A.reshape(1, -1), B.reshape(1, -1))
print(result)
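
With 2D arrays of shape (m, n), cosine_similarity() treats every row as a sample and returns a matrix of all row pairs rather than one value per row. A small sketch with illustrative arrays a and b:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative (m, n) inputs: each row is one sample.
a = np.array([[1, 2, 3, 4], [4, 3, 2, 1]])
b = np.array([[3, 4, 5, 6], [4, 5, 6, 7]])

# Entry [i, j] compares row i of a with row j of b, so the shape is (2, 2).
pairwise = cosine_similarity(a, b)
# If only matching rows are of interest, they sit on the diagonal.
rowwise = np.diag(pairwise)

print(pairwise)
print(rowwise)
```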

When I deal with tensors, for example of shape (m, n), I can use the cosine_similarity() function from the torch module to find the cosine similarity.

import torch
import torch.nn.functional as F
t1 = [3,45,6,8]
a = torch.FloatTensor(t1)

t2 = [4,54,3,7]
b = torch.FloatTensor(t2)
result = F.cosine_similarity(a, b, dim=0)

print(result)
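
For tensors that really have shape (m, n), the dim argument decides which axis is reduced. A minimal sketch with illustrative tensors, assuming the rows are the vectors to compare:

```python
import torch
import torch.nn.functional as F

# Illustrative (m, n) = (2, 4) tensors.
a = torch.tensor([[1, 2, 3, 4], [4, 3, 2, 1]], dtype=torch.float32)
b = torch.tensor([[3, 4, 5, 6], [4, 5, 6, 7]], dtype=torch.float32)

# dim=1 reduces over the n columns: one similarity per row, shape (2,).
rowwise = F.cosine_similarity(a, b, dim=1)
# dim=0 reduces over the m rows instead: one similarity per column, shape (4,).
columnwise = F.cosine_similarity(a, b, dim=0)

print(rowwise)
print(columnwise)
```

Unlike the sklearn function, this compares matching rows (or columns) only and does not build the full pairwise matrix.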

Which of the above modules should I use to get the best cosine similarity results when the input is a 2D array in Python?

They all compute the same thing. There could be a numerical difference, but I did not notice any. The speed differs a bit: numpy is the fastest of these four for me, but a manual implementation (https://stackoverflow.com/a/18424953/3219777) may be even faster in many cases. sklearn is significantly slower than the others. — Askold Ilvento, Jun 10 '22 at 07:45
@AskoldIlvento Thanks! I do not really understand the differences between the above modules, which confuses me when choosing a solution to my problem. — Cyrus, Jun 10 '22 at 07:55
But when I pass 2D arrays to the function from that answer (stackoverflow.com/a/18424953/3219777), it does not work. My input is `a = ([1,2,3,4],[4,3,2,1])` and `b = ([3,4,5,6],[4,5,6,7])`, and I expect a result like `[0.8 0.7 0.5 0.4]`. — Cyrus, Jun 10 '22 at 08:34

0 Answers