
I have a Python program that calculates the cosine similarity score between two vectors. It does this for a large number of vector pairs, then sorts the results by cosine similarity score. I am writing system tests for this program and am trying to generate vectors that produce known, expected cosine similarity scores.

For instance, let's say I have the following vectors:

vectorA = [10, 100, 1000]
vectorB = [200, 200, 200]

The cosine similarity score between these vectors is approximately 0.638. But suppose I started with vectorA and the score, and wanted to find a vectorB that produces that score. There are infinitely many such vectors, and I'm looking for any single one.

I've been exploring a few options, including running what I'm looking for through ChatGPT. It seems the common process looks something like the following:

  1. Find the Euclidean norm (magnitude) of vectorA.
  2. Calculate the expected Euclidean norm (magnitude) of vectorB using the cosine similarity score.
  3. Create a vectorB with the same dimensions as vectorA using the calculated norm of vectorB.

However, I have been unable to put together a consistent algorithm that actually calculates what I'm looking for. Linear algebra is not my strong suit. Does anyone have any suggestions?

For reference, below is a function that would calculate the cosine similarity score given 2 vectors:

def cosine_score(vectorA, vectorB):
    # Dot product of the two vectors
    AB = sum(a * b for a, b in zip(vectorA, vectorB))
    # Squared Euclidean norm of each vector
    A = sum(a ** 2 for a in vectorA)
    B = sum(b ** 2 for b in vectorB)
    # Cosine similarity: dot product divided by the product of the norms
    expectedScore = AB / (A**0.5 * B**0.5)
    return expectedScore
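
For illustration, here is a minimal sketch of one way such a vectorB could be generated. It does not follow the norm-based steps listed earlier. Instead it mixes the unit vector along vectorA with a unit vector orthogonal to it, weighted by the target score and the square root of one minus the score squared. The names `vector_with_score`, `scale`, and `seed` are hypothetical, and the sketch assumes vectorA is non-zero with at least two components.

```python
import math
import random


def cosine_score(vector_a, vector_b):
    # Local reimplementation so this sketch is self-contained.
    dot = sum(a * b for a, b in zip(vector_a, vector_b))
    norm_a = math.sqrt(sum(a * a for a in vector_a))
    norm_b = math.sqrt(sum(b * b for b in vector_b))
    return dot / (norm_a * norm_b)


def vector_with_score(vector_a, score, scale=1.0, seed=0):
    """Return one vector whose cosine similarity with vector_a is `score`.

    Hypothetical helper: `scale` (must be positive) sets the length of the
    result, and `seed` picks which of the many valid vectors is returned.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    if len(vector_a) < 2:
        raise ValueError("need at least two dimensions")
    if scale <= 0:
        raise ValueError("scale must be positive")
    norm_a = math.sqrt(sum(a * a for a in vector_a))
    if norm_a == 0:
        raise ValueError("vector_a must be non-zero")
    unit_a = [a / norm_a for a in vector_a]

    rng = random.Random(seed)
    while True:
        # Draw a random vector and remove its component along vector_a,
        # leaving something orthogonal to vector_a.
        rand = [rng.gauss(0.0, 1.0) for _ in vector_a]
        proj = sum(r * u for r, u in zip(rand, unit_a))
        ortho = [r - proj * u for r, u in zip(rand, unit_a)]
        norm_o = math.sqrt(sum(o * o for o in ortho))
        if norm_o > 1e-12:  # retry in the unlikely case rand was parallel
            break
    unit_o = [o / norm_o for o in ortho]

    # cos(theta) along vector_a, sin(theta) along the orthogonal direction.
    sin_part = math.sqrt(1.0 - score * score)
    return [scale * (score * u + sin_part * o) for u, o in zip(unit_a, unit_o)]


vector_a = [10, 100, 1000]
vector_b = vector_with_score(vector_a, 0.638)
print(cosine_score(vector_a, vector_b))
```

Changing `seed` yields a different vectorB with the same score, which may be handy for generating several test fixtures from one vectorA.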
CCranney
  • Would it not be easier to generate some random `vectorB`s, and pass them through some known good cosine similarity engine (Matlab?), and record the value you get? – Tim Roberts Aug 17 '23 at 16:58
  • For a given pair of vector directions, cosine similarity is independent of the Euclidean norms (in fact, that's one of its desirable properties), so there isn't really an "expected Euclidean norm" of B. – slothrop Aug 17 '23 at 16:58
  • Yes that does! Thank you so much! – CCranney Aug 17 '23 at 19:18
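
The point raised in the comments, that cosine similarity does not depend on the Euclidean norms, can be checked with a short sketch. It reuses the example vectors from the question and rescales vectorB by a few arbitrary positive factors. The compact `cosine_score` below is a local reimplementation, and the scale factors are illustrative assumptions.

```python
import math


def cosine_score(vector_a, vector_b):
    # Dot product divided by the product of the Euclidean norms.
    dot = sum(a * b for a, b in zip(vector_a, vector_b))
    norm_a = math.sqrt(sum(a * a for a in vector_a))
    norm_b = math.sqrt(sum(b * b for b in vector_b))
    return dot / (norm_a * norm_b)


vector_a = [10, 100, 1000]
vector_b = [200, 200, 200]

base = cosine_score(vector_a, vector_b)

# Rescaling vector_b by a positive factor changes its norm but not its direction.
scaled = [cosine_score(vector_a, [k * x for x in vector_b]) for k in (0.5, 3, 1000)]

print(round(base, 3))  # 0.638
print(all(math.isclose(s, base) for s in scaled))  # True
```

Because every positive rescaling gives the same score, the target score alone cannot determine a norm for vectorB. Only the direction of vectorB relative to vectorA matters.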

0 Answers