
How do you express the cosine similarity ( http://en.wikipedia.org/wiki/Cosine_similarity ) when one of the vectors is all zeros?

v1 = [1, 1, 1, 1, 1]

v2 = [0, 0, 0, 0, 0]

When we calculate according to the classic formula, we get division by zero:

Let d1 = (0, 0, 0, 0, 0)
Let d2 = (1, 1, 1, 1, 1)

Cosine Similarity (d1, d2) = dot(d1, d2) / (||d1|| ||d2||)

dot(d1, d2) = (0)*(1) + (0)*(1) + (0)*(1) + (0)*(1) + (0)*(1) = 0

||d1|| = sqrt((0)^2 + (0)^2 + (0)^2 + (0)^2 + (0)^2) = 0

||d2|| = sqrt((1)^2 + (1)^2 + (1)^2 + (1)^2 + (1)^2) = sqrt(5) = 2.2360679...

Cosine Similarity (d1, d2) = 0 / ((0) * (2.2360679...))
                           = 0 / 0
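
The same dead end is easy to reproduce in code (a minimal numpy sketch; the helper name cosine_similarity is my own):

```python
import numpy as np

def cosine_similarity(a, b):
    """Classic formula: dot(a, b) / (||a|| * ||b||)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

v1 = np.array([1, 1, 1, 1, 1])
v2 = np.array([0, 0, 0, 0, 0])

print(cosine_similarity(v1, v2))  # 0 / 0.0 -> nan (with a RuntimeWarning)
```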

I want to use this similarity measure in a clustering application, and I will often need to compare such vectors, including [0, 0, 0, 0, 0] vs. [0, 0, 0, 0, 0].

Do you have any experience with this? Since this is a similarity (not a distance) measure, should I special-case it (see the sketch below), e.g.

sim( [1, 1, 1, 1, 1]; [0, 0, 0, 0, 0] ) = 0

sim( [0, 0, 0, 0, 0]; [0, 0, 0, 0, 0] ) = 1

and what about

sim( [1, 1, 1, 0, 0]; [0, 0, 0, 0, 0] ) = ? etc.
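
A minimal sketch of what that special-casing could look like (the convention sim(0, 0) = 1 and sim(x, 0) = 0 for nonzero x is the one proposed above, not an established definition; the helper name is mine):

```python
import numpy as np

def cosine_similarity_special(a, b):
    """Cosine similarity with an ad-hoc convention for zero vectors:
    two zero vectors count as identical (1.0); a zero vector and a
    nonzero vector count as maximally dissimilar (0.0)."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 and nb == 0:
        return 1.0
    if na == 0 or nb == 0:
        return 0.0
    return np.dot(a, b) / (na * nb)
```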

Sebastian Widz

2 Answers


If you have all-zero vectors, cosine is the wrong similarity function for your application.

Cosine distance is essentially equivalent to squared Euclidean distance on L_2-normalized data. That is, you normalize every vector to unit length, then compute squared Euclidean distance.
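
For unit-length vectors this follows from the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 dot(x, y) = 2 - 2 cos(x, y); a quick numerical check (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10)
y = rng.normal(size=10)

# L_2-normalize both vectors to unit length
x /= np.linalg.norm(x)
y /= np.linalg.norm(y)

cos_xy = np.dot(x, y)             # cosine similarity of unit vectors
sq_euclid = np.sum((x - y) ** 2)  # squared Euclidean distance

print(np.isclose(sq_euclid, 2 - 2 * cos_xy))  # True
```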

The other benefit of cosine is performance: computing it on very sparse, high-dimensional data is faster than Euclidean distance, because the dot product only touches dimensions where both vectors are nonzero (their intersection), whereas Euclidean distance has to visit every dimension where either vector is nonzero (their union). The benefit from sparsity is thus quadratic, not just linear.
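
To make the point concrete (a sketch, assuming sparse vectors stored as {index: value} dicts rather than any particular library's format):

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {index: value} dicts.
    Only indices that are nonzero in *both* vectors contribute, so the
    cost is proportional to the overlap, not the full dimensionality."""
    if len(b) < len(a):
        a, b = b, a  # iterate over the vector with fewer nonzeros
    return sum(v * b[i] for i, v in a.items() if i in b)

# Two sparse vectors in a nominally 1,000,000-dimensional space:
a = {3: 1.0, 17: 2.0, 999999: 0.5}
b = {17: 4.0, 42: 1.0}
print(sparse_dot(a, b))  # 8.0 -- only index 17 overlaps
```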

While you obviously can try to hack the similarity to be 0 when exactly one is zero, and maximal when they are identical, it won't really solve the underlying problems.

Don't choose the distance by what you can easily compute.

Instead, choose the distance such that the result has a meaning on your data. If the value is undefined, you don't have a meaning...

Sometimes, it may work to discard constant-0 data as meaningless data anyway (e.g. analyzing Twitter noise, and seeing a Tweet that is all numbers, no words). Sometimes it doesn't.

Has QUIT--Anony-Mousse
  • What would a more appropriate similarity measure be in this case then? Hamming distance? – Roy Sep 23 '19 at 08:22
  • There is no context given. Euclidean distance could also be "more appropriate". – Has QUIT--Anony-Mousse Sep 24 '19 at 05:52
  • I don't see how cosine similarity can be equivalent to squared Euclidean distance. Squared Euclidean distance is strictly non-negative, while cosine similarity can be either positive or negative, ranging from -1 to +1. Cosine similarity can express correlation (+1) or anti-correlation (-1), which squared Euclidean distance can not. So how can it be equivalent to it? – bluenote10 Jun 17 '23 at 11:38

It is undefined.

Suppose you put a nonzero vector C in place of your zero vector. Multiply it by epsilon > 0 and let epsilon tend to zero. Since cosine is invariant to scaling, the result stays equal to the cosine of C itself, so the limit depends on the direction of C; the function therefore has no continuous extension when one of the vectors is zero.
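
A numeric illustration of this argument (my own sketch): scaling C by epsilon does not change the cosine at all, so the "limit" at the zero vector depends entirely on the direction of C:

```python
import numpy as np

def cos_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

d2 = np.ones(5)

for C in (np.array([1.0, 0, 0, 0, 0]), np.array([-1.0, 0, 0, 0, 0])):
    print([round(cos_sim(eps * C, d2), 6) for eps in (1.0, 1e-3, 1e-9)])

# [0.447214, 0.447214, 0.447214]    for C = ( 1, 0, 0, 0, 0)
# [-0.447214, -0.447214, -0.447214] for C = (-1, 0, 0, 0, 0)
# The value is independent of eps but flips with C: no single limit exists.
```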

Gyro Gearloose