Cosine distance of vector to matrix

Question

In Python, is there an efficient, vectorized way to calculate the cosine distance from a sparse array u to each row of a sparse matrix v, resulting in an array of n elements corresponding to cosine(u, v[0]), cosine(u, v[1]), ..., cosine(u, v[n-1])?

Might solve your case: [`Find minimum cosine distance between two matrices`](http://stackoverflow.com/questions/32688866/find-minimum-cosine-distance-between-two-matrices). – Divakar, Apr 29 '16 at 09:20

score 1 · Accepted Answer · answered Apr 28 '16 at 16:28

Not natively. However, the SciPy library can compute the cosine distance between two vectors for you: http://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.spatial.distance.cosine.html. You can use this as a stepping stone to build a version that takes a matrix.
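As a rough sketch of what such a matrix version could look like for dense inputs (not part of the original answer): SciPy also provides `scipy.spatial.distance.cdist`, which computes distances between two sets of row vectors without an explicit Python loop. The helper name `cosine_to_rows` and the random test data are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist, cosine

def cosine_to_rows(u, m):
    """Cosine distance from 1-D vector u to every row of the 2-D array m (dense inputs)."""
    # cdist expects two 2-D arrays, so u is turned into a single-row matrix.
    return cdist(u[None, :], m, metric="cosine")[0]

# Illustrative data only
rng = np.random.default_rng(0)
u = rng.random(5)
m = rng.random((4, 5))

distances = cosine_to_rows(u, m)  # shape (4,)
```

Note that `cdist` works on dense arrays, so a sparse matrix would have to be converted first, which may not be practical for large data.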

the blizz


Yeah, but that would require iterating over rows. Wanted to avoid that if I could since I have lots of rows. Will give it a shot though. – David Apr 28 '16 at 16:32
@David: Did you find a way to avoid iterating over rows? I am facing the same problem – Luk Mar 20 '20 at 09:32
@Luk: never solved it unfortunately, ended up iterating over all rows - took forever. – David Mar 22 '20 at 14:43

hank · Answer 2 · 2021-01-15T13:06:14.823

Add the vector onto the end of the matrix, calculate a pairwise distance matrix using sklearn.metrics.pairwise_distances() and then extract the relevant column/row.

So for vector v (with shape (D,)) and matrix m (with shape (N,D)) do:

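import numpy as np
from sklearn.metrics import pairwise_distances

# append v as the last row of m, giving shape (N+1, D)
new_m = np.concatenate([m, v[None, :]], axis=0)
distance_matrix = pairwise_distances(new_m, metric="cosine")
# the last row holds the distances from v to every row; drop the final entry (v to itself)
distances = distance_matrix[-1, :-1]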

Not ideal, since it computes the full (N+1) x (N+1) distance matrix just to read off one row, but better than iterating!

This method can be extended if you are querying more than one vector. To do this, a list of vectors can be concatenated instead.
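A possible variation, sketched here under the assumption that scikit-learn is available: `pairwise_distances` also accepts a second argument `Y`, so the query vectors can be passed separately instead of being concatenated onto the matrix. With `metric="cosine"` it accepts SciPy sparse matrices as well. The shapes and random data below are made up for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.metrics import pairwise_distances

# Illustrative sparse data: 6 rows to search, 2 query vectors, 10 features each
m = sparse.random(6, 10, density=0.5, format="csr", random_state=0)
queries = sparse.random(2, 10, density=0.5, format="csr", random_state=1)

# distances[i, j] is the cosine distance between queries[i] and m[j]
distances = pairwise_distances(queries, m, metric="cosine")  # shape (2, 6)
```

For a single query vector, passing it as a matrix with one row gives a result with one row, which can be read off as `distances[0]`.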

Filipe · Answer 3 · 2021-01-31T15:20:22.587

I think there is a way using the definition and the numpy library:

Definition:

import numpy as np

#just creating random data
u = np.random.random(100)
v = np.random.random((100,100))

#dot product: for every row in v, multiply u and sum the elements
u_dot_v = np.sum(u*v,axis = 1)

#find the norm of u and each row of v
mod_u = np.sqrt(np.sum(u*u))
mod_v = np.sqrt(np.sum(v*v,axis = 1))

#just apply the definition
final = 1 - u_dot_v/(mod_u*mod_v)

#verify with the cosine function from scipy
from scipy.spatial.distance import cosine
final2 = np.array([cosine(u,i) for i in v])
print(np.allclose(final, final2))

I found the definition of cosine distance here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cosine.html#scipy.spatial.distance.cosine
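Since the question asks about sparse inputs, the same definition can be sketched with `scipy.sparse` operations so that nothing is converted to a dense matrix. This is only an illustration: the function name `sparse_cosine_distances` is made up, and returning a distance of 1.0 for all-zero rows (where the cosine is undefined) is an assumed convention.

```python
import numpy as np
from scipy import sparse

def sparse_cosine_distances(u, v):
    """Cosine distance from a (1, D) sparse row u to each row of an (N, D) sparse matrix v."""
    u = sparse.csr_matrix(u)
    v = sparse.csr_matrix(v)
    # dot product of every row of v with u: sparse (N, 1) result, flattened to shape (N,)
    dots = np.asarray((v @ u.T).toarray()).ravel()
    # Euclidean norms of u and of each row of v
    norm_u = np.sqrt(u.multiply(u).sum())
    norm_v = np.sqrt(np.asarray(v.multiply(v).sum(axis=1)).ravel())
    denom = norm_u * norm_v
    # assumed convention: similarity 0 (distance 1) wherever a norm is zero
    sim = np.divide(dots, denom, out=np.zeros_like(dots, dtype=float), where=denom > 0)
    return 1.0 - sim

# Small hand-checkable example (the last row is all zeros)
dense_v = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0], [0.0, 0.0, 0.0]])
dense_u = np.array([[1.0, 0.0, 0.0]])
result = sparse_cosine_distances(dense_u, dense_v)
```

The only dense arrays created here have length N (one value per row of v), which is also the size of the result.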

score 0 · Answer 4 · answered Apr 28 '16 at 16:28

Use scipy.spatial.distance.cosine():

http://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.spatial.distance.cosine.html

kingledion


score 0 · Answer 5 · answered Jan 31 '21 at 13:46

The code below worked for me; you have to provide the correct signature to np.vectorize.

import numpy as np
from scipy.spatial.distance import cosine

def cosine_distances(embedding_matrix, extracted_embedding):
  return cosine(embedding_matrix, extracted_embedding)
cosine_distances = np.vectorize(cosine_distances, signature='(m),(d)->()')

cosine_distances(corpus_embeddings, extracted_embedding)

In my case
corpus_embeddings is a (10000,128) matrix
extracted_embedding is a 128-dimensional vector
