Questions tagged [cosine-similarity]

Cosine similarity measures how similar two vectors of an inner product space are by taking the cosine of the angle between them. It is popular because it is simply a normalized dot product of the two vectors, which can be computed with basic arithmetic.

From Wikipedia:

Cosine similarity is a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. The cosine of 0 degrees is 1, and it is less than 1 for any other angle. It is thus a judgement of orientation and not magnitude: two vectors with the same orientation have a cosine similarity of 1, two vectors at 90 degrees have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.

Cosine similarity is a popular similarity measure between two vectors a and b because it can be computed efficiently by dividing the dot product of the two vectors by the product of their Euclidean norms (the Euclidean norm being the square root of the sum of the squared terms). For instance, vectors (0, 3, 4) and (-3, 4, 0) have dot product 12 and each has norm 5, so their cosine similarity is 12/(5*5) = 0.48.
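A minimal sketch of that calculation in plain Python, with no external libraries. The function name `cosine_similarity` and the choice to raise an error for an all-zero vector are assumptions made for illustration; they are not taken from any particular library.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Dot product: sum of element-wise products.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean norm: square root of the sum of squared terms.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption: treat a zero vector as an error, since the ratio is undefined.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# The worked example from the text: dot product 12, both norms 5.
print(cosine_similarity((0, 3, 4), (-3, 4, 0)))  # 0.48
```

Because the result depends only on orientation, scaling either input by a positive constant leaves the value unchanged.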

1004 questions
235
votes
18 answers

Cosine Similarity between 2 Number Lists

I want to calculate the cosine similarity between two lists, let's say for example list 1 which is dataSetI and list 2 which is dataSetII. Let's say dataSetI is [3, 45, 7, 2] and dataSetII is [2, 54, 13, 15]. The length of the lists are always…
Rob Alsod
  • 2,635
  • 3
  • 19
  • 18
209
votes
12 answers

Can someone give an example of cosine similarity, in a very simple, graphical way?

Cosine Similarity article on Wikipedia. Can you show the vectors here (in a list or something) and then do the math, and let us see how it works?
TIMEX
  • 259,804
  • 351
  • 777
  • 1,080
89
votes
10 answers

What's the fastest way in Python to calculate cosine similarity given sparse matrix data?

Given a sparse matrix listing, what's the best way to calculate the cosine similarity between each of the columns (or rows) in the matrix? I would rather not iterate n-choose-two times. Say the input matrix is: A= [0 1 0 0 1 0 0 1 1 1 1 1 0 1…
zbinsd
  • 4,084
  • 6
  • 33
  • 40
86
votes
8 answers

Calculate cosine similarity given 2 sentence strings

From Python: tf-idf-cosine: to find document similarity, it is possible to calculate document similarity using tf-idf cosine. Without importing external libraries, are there any ways to calculate cosine similarity between 2 strings? s1 = "This is a…
alvas
  • 115,346
  • 109
  • 446
  • 738
42
votes
6 answers

Cosine similarity and tf-idf

I am confused by the following comment about TF-IDF and Cosine Similarity. I was reading up on both and then on wiki under Cosine Similarity I find this sentence "In case of information retrieval, the cosine similarity of two documents will…
N00programmer
  • 1,111
  • 4
  • 13
  • 17
26
votes
5 answers

How to compare sentence similarities using embeddings from BERT

I am using the HuggingFace Transformers package to access pretrained models. As my use case needs functionality for both English and Arabic, I am using the bert-base-multilingual-cased pretrained model. I need to be able to compare the similarity of…
KOB
  • 4,084
  • 9
  • 44
  • 88
23
votes
3 answers

Using K-means with cosine similarity - Python

I am trying to implement the K-means algorithm in Python using cosine distance instead of Euclidean distance as the distance metric. I understand that using a different distance function can be fatal and should be done carefully. Using cosine distance…
ise372
  • 231
  • 1
  • 2
  • 5
19
votes
1 answer

Difference between cosine similarity and cosine distance

It looks like the scipy.spatial.distance.cdist cosine distance, 1 - u*v/(||u|| ||v||), is different from sklearn.metrics.pairwise.cosine_similarity, which is u*v/(||u|| ||v||). Does anybody know…
user1700890
  • 7,144
  • 18
  • 87
  • 183
17
votes
2 answers

Cosine similarity when one of vectors is all zeros

How to express the cosine similarity ( http://en.wikipedia.org/wiki/Cosine_similarity ) when one of the vectors is all zeros? v1 = [1, 1, 1, 1, 1] v2 = [0, 0, 0, 0, 0] When we calculate according to the classic formula we get division by zero: Let…
14
votes
3 answers

cosine similarity on large sparse matrix with numpy

The code below causes my system to run out of memory before it completes. Can you suggest a more efficient means of computing the cosine similarity on a large matrix, such as the one below? I would like to have the cosine similarity computed for…
Sal
  • 277
  • 2
  • 3
  • 9
13
votes
2 answers

Calculating the cosine similarity between all the rows of a dataframe in pyspark

I have a dataset containing workers with their demographic information, like age, gender, address, etc., and their work locations. I created an RDD from the dataset and converted it into a DataFrame. There are multiple entries for each ID. Hence, I…
Abhinav Choudhury
  • 319
  • 2
  • 3
  • 15
13
votes
1 answer

Apache Spark Python Cosine Similarity over DataFrames

For a Recommender System, I need to compute the cosine similarity between all the columns of a whole Spark DataFrame. In Pandas I used to do this: import sklearn.metrics as metrics import pandas as pd df= pd.DataFrame(...some dataframe over here :D…
13
votes
3 answers

Cosine distance as vector distance function for k-means

I have a graph of N vertices where each vertex represents a place. Also I have vectors, one per user, each one of N coefficients where the coefficient's value is the duration in seconds spent at the corresponding place or 0 if that place was not…
Thalis K.
  • 7,363
  • 6
  • 39
  • 54
11
votes
3 answers

How to get cosine distance between two vectors in postgres?

I am wondering if there is a way to get cosine distance of two vectors in postgres. For storing vectors I am using CUBE data type. Below is my table definition: test=# \d vectors …
11
votes
2 answers

Postgres: index on cosine similarity of float arrays for one-to-many search

Cosine similarity between two equally-sized vectors (of reals) is defined as the dot product divided by the product of the norms. To represent vectors, I have a large table of float arrays, e.g. CREATE TABLE foo(vec float[]). Given a certain float…
sudo
  • 5,604
  • 5
  • 40
  • 78