I have a Python program that calculates the cosine similarity score between two vectors. The program does this for a high volume of vector pairs, then sorts the results by cosine similarity score. I am writing system tests for this program, and am trying to generate vectors that produce known, expected cosine similarity scores.
For instance, let's say I have the following vectors:
vectorA = [10, 100, 1000]
vectorB = [200, 200, 200]
The cosine similarity score between these vectors is approximately 0.638. But let's say that I started with vectorA and the cosine score, and was actually trying to find a vectorB that would create said score. There are an infinite number of vectorB values that could do this, and I'm looking for any single one.
I've been exploring a few options, including running what I'm looking for through ChatGPT. It seems the common process looks something like the following:
- Find the Euclidean norm (magnitude) of vectorA.
- Calculate the expected Euclidean norm (magnitude) of vectorB using the cosine similarity score.
- Create a vectorB with the same dimensions as vectorA using the calculated norm of vectorB.
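To make the first step concrete, here is a small sketch that computes the norm of vectorA and also checks what happens to the score when vectorB is rescaled. The helper names `norm` and `cosine` are made up for this illustration. Since cosine similarity only depends on direction, the score should not change with the magnitude of vectorB, which suggests the norm by itself cannot pin down vectorB.

```python
import math

def norm(v):
    # Hypothetical helper: Euclidean norm (magnitude) of a vector
    return math.sqrt(sum(x * x for x in v))

def cosine(a, b):
    # Hypothetical helper: dot product divided by the product of the norms
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

vectorA = [10, 100, 1000]
vectorB = [200, 200, 200]

print("norm of vectorA:", norm(vectorA))

# Rescale vectorB and compare: the norm changes, the score does not
for scale in (0.5, 1, 10):
    scaled = [scale * x for x in vectorB]
    print(scale, norm(scaled), round(cosine(vectorA, scaled), 3))
```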
However, I have been unable to put together a consistent algorithm that actually calculates what I'm looking for. Linear algebra is not my strong suit. Does anyone have any suggestions?
For reference, below is a function that would calculate the cosine similarity score given 2 vectors:
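def cosine_score(vectorA, vectorB):
    # Dot product of the two vectors
    AB = sum([vectorA[i] * vectorB[i] for i in range(len(vectorA))])
    # Squared Euclidean norms of each vector
    A = sum([vectorA[i] ** 2 for i in range(len(vectorA))])
    B = sum([vectorB[i] ** 2 for i in range(len(vectorB))])
    # Cosine similarity: dot product divided by the product of the norms
    expectedScore = AB / (A**0.5 * B**0.5)
    return expectedScore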
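For illustration, here is a sketch of one direction such a construction might take, offered under stated assumptions rather than as a verified solution. The idea is to split vectorB into a part along vectorA and a part perpendicular to it: with u the unit vector of vectorA and w any unit vector orthogonal to u, the vector score * u + sqrt(1 - score^2) * w has cosine similarity `score` with vectorA. The function name `make_vector_with_score`, the `norm_b` and `seed` parameters, and the random choice of the perpendicular direction are all assumptions made for this example. It also assumes vectorA is non-zero and has at least 2 dimensions.

```python
import math
import random

def cosine_score(vectorA, vectorB):
    # Same computation as the function above, repeated so this sketch runs alone
    AB = sum(a * b for a, b in zip(vectorA, vectorB))
    A = sum(a * a for a in vectorA)
    B = sum(b * b for b in vectorB)
    return AB / (A**0.5 * B**0.5)

def make_vector_with_score(vectorA, score, norm_b=1.0, seed=0):
    # Hypothetical helper: build one vectorB with the requested cosine score
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1 and 1")
    if len(vectorA) < 2:
        raise ValueError("need at least 2 dimensions")
    norm_a = math.sqrt(sum(x * x for x in vectorA))
    if norm_a == 0:
        raise ValueError("vectorA must be non-zero")

    # Unit vector pointing in the direction of vectorA
    u = [x / norm_a for x in vectorA]

    # Pick a random direction and remove its component along u
    # (one Gram-Schmidt step), retrying if the remainder is degenerate
    rng = random.Random(seed)
    while True:
        r = [rng.gauss(0, 1) for _ in u]
        proj = sum(ri * ui for ri, ui in zip(r, u))
        w = [ri - proj * ui for ri, ui in zip(r, u)]
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w > 1e-9:
            break
    w = [x / norm_w for x in w]

    # Combine the parallel and perpendicular parts; norm_b only rescales
    perp = math.sqrt(1.0 - score * score)
    return [norm_b * (score * ui + perp * wi) for ui, wi in zip(u, w)]

vectorA = [10, 100, 1000]
vectorB = make_vector_with_score(vectorA, 0.638, norm_b=346.41)
print(vectorB, cosine_score(vectorA, vectorB))
```

Changing `seed` gives a different vectorB with the same score, which matches the observation that infinitely many such vectors exist. Note that the components will generally not be integers.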