6

Given some vector v, I want to generate another random vector w that has a specified cosine similarity with v. Is there a way to do this in Python?

Example: for simplicity, take the 2D vector v = [3, -4]. I want a random vector w whose cosine similarity with v is +0.6 (60%). This should generate w = [-0.875, -3], or any other vector with the same cosine similarity. I hope this is clear enough.

eugen
  • 1
    Brute force: In a loop; generate random vectors; calculate the cosine similarity; save the vector if it meets your criteria; stop when you get ten (100, 1000,?) of them. | Look at these vectors and see if there is a pattern you can use to bound the vector generation. – wwii Oct 21 '18 at 15:45
  • 1
    I'm voting to reopen this question because it is trivial to fix (delete last line). – Paul Panzer Oct 21 '18 at 16:09
  • 1
    I have deleted last line, so now it does not ask for a specific library recommendation. – eugen Oct 22 '18 at 01:19
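
The brute-force idea from the first comment can be sketched roughly as below. A random vector will essentially never hit the target similarity exactly, so this sketch assumes a tolerance `tol`; the function name `brute_force_cos_sim` and its parameters `tol`, `n_wanted`, `max_tries` and `seed` are illustrative choices, not part of the original comment.

```python
import numpy as np


def brute_force_cos_sim(v, target, tol=0.01, n_wanted=10,
                        max_tries=1_000_000, seed=None):
    """Collect random vectors whose cosine similarity with v is within
    tol of target (rejection sampling; hypothetical helper)."""
    rng = np.random.default_rng(seed)
    v = np.asarray(v, dtype=float)
    v_norm = np.linalg.norm(v)
    found = []
    for _ in range(max_tries):
        # Candidate with a radially symmetric distribution.
        w = rng.standard_normal(v.shape)
        sim = v.dot(w) / (v_norm * np.linalg.norm(w))
        # Keep the candidate only if it meets the criterion.
        if abs(sim - target) <= tol:
            found.append(w)
            if len(found) == n_wanted:
                break
    return found
```

Because most candidates are rejected, this gets slow quickly as `tol` shrinks or the dimension grows; it is mainly useful for exploring what the acceptable vectors look like.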

2 Answers

11

Given the vector v and cosine similarity costheta (a scalar between -1 and 1), compute w as in the function rand_cos_sim(v, costheta):

import numpy as np


def rand_cos_sim(v, costheta):
    # Form the unit vector parallel to v:
    u = v / np.linalg.norm(v)

    # Pick a random vector:
    r = np.random.multivariate_normal(np.zeros_like(v), np.eye(len(v)))

    # Form a vector perpendicular to v:
    uperp = r - r.dot(u)*u

    # Make it a unit vector:
    uperp = uperp / np.linalg.norm(uperp)

    # w is the linear combination of u and uperp with coefficients costheta
    # and sin(theta) = sqrt(1 - costheta**2), respectively:
    w = costheta*u + np.sqrt(1 - costheta**2)*uperp

    return w
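
A short check of why this construction gives the requested similarity, writing $\hat{u}$ for `u`, $\hat{u}_\perp$ for `uperp` and $c$ for `costheta` (this notation is introduced here for illustration only):

```latex
\begin{aligned}
w &= c\,\hat{u} + \sqrt{1 - c^{2}}\;\hat{u}_\perp,
  \qquad \hat{u}\cdot\hat{u}_\perp = 0,\quad
  \lVert\hat{u}\rVert = \lVert\hat{u}_\perp\rVert = 1, \\
\lVert w\rVert^{2} &= c^{2} + (1 - c^{2}) = 1, \\
\frac{v\cdot w}{\lVert v\rVert\,\lVert w\rVert}
  &= \frac{\lVert v\rVert\,\hat{u}\cdot w}{\lVert v\rVert\cdot 1}
   = \hat{u}\cdot w = c .
\end{aligned}
```

Multiplying `w` by any positive scalar leaves the cosine similarity unchanged, so the result can be rescaled if a non-unit vector is wanted.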

For example,

In [17]: v = np.array([3, -4])

In [18]: w = rand_cos_sim(v, 0.6)

In [19]: w
Out[19]: array([-0.28, -0.96])

Verify the cosine similarity:

In [20]: v.dot(w)/(np.linalg.norm(v)*np.linalg.norm(w))
Out[20]: 0.6000000000000015

In [21]: w = rand_cos_sim(v, 0.6)

In [22]: w
Out[22]: array([1., 0.])

In [23]: v.dot(w)/(np.linalg.norm(v)*np.linalg.norm(w))
Out[23]: 0.6

The return value always has magnitude 1, so in the above example, there are only two possible random vectors, [1, 0] and [-0.28, -0.96].
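
In 2-d the two candidates can also be written down directly, without any randomness. The following is a minimal sketch under that assumption; the helper name `both_solutions_2d` is hypothetical and not part of the answer's code.

```python
import numpy as np


def both_solutions_2d(v, costheta):
    """Return both unit vectors in the plane whose cosine similarity
    with v equals costheta (illustrative helper, 2-d only)."""
    v = np.asarray(v, dtype=float)
    u = v / np.linalg.norm(v)
    # In 2-d the perpendicular direction is unique up to sign:
    # rotate u by 90 degrees counterclockwise.
    uperp = np.array([-u[1], u[0]])
    sintheta = np.sqrt(1 - costheta**2)
    return costheta*u + sintheta*uperp, costheta*u - sintheta*uperp


w_plus, w_minus = both_solutions_2d([3, -4], 0.6)
# Up to floating-point rounding, w_plus is [1, 0] and w_minus is [-0.28, -0.96].
```

`rand_cos_sim` effectively picks one of these two at random in the 2-d case, since the random perpendicular direction can only be `+uperp` or `-uperp`.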

Another example, this one in 3-d:

In [24]: v = np.array([3, -4, 6])

In [25]: w = rand_cos_sim(v, -0.75)

In [26]: w
Out[26]: array([ 0.3194265 ,  0.46814873, -0.82389531])

In [27]: v.dot(w)/(np.linalg.norm(v)*np.linalg.norm(w))
Out[27]: -0.75

In [28]: w = rand_cos_sim(v, -0.75)

In [29]: w
Out[29]: array([-0.48830063,  0.85783797, -0.16023891])

In [30]: v.dot(w)/(np.linalg.norm(v)*np.linalg.norm(w))
Out[30]: -0.75
Warren Weckesser
  • 1
    @Warren why did you use `np.random.multivariate_normal`? I was trying to use your code and found `np.random.normal` to be much faster. Is there any specific reason behind using multivariate distribution? – ashutosh singh Oct 17 '19 at 16:49
  • 2
    @ashutoshsingh, I used `multivariate_normal` because that's the abstraction that I thought of when I implemented this. I wanted a random vector whose distribution was radially symmetric. But with a covariance matrix that is the identity, one can just as well use `np.random.normal(size=len(v))`, or even `np.random.randn(len(v))`. If those are faster, then go ahead and use one of them! – Warren Weckesser Oct 17 '19 at 17:34
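
Following the exchange above, a variant that draws independent standard normals instead of calling `multivariate_normal` might look like the sketch below. The name `rand_cos_sim_fast` and the optional `rng` argument are assumptions made for illustration; it uses NumPy's `Generator.standard_normal`, which samples the same distribution as `np.random.randn`.

```python
import numpy as np


def rand_cos_sim_fast(v, costheta, rng=None):
    """Same construction as rand_cos_sim, but with independent standard
    normals as the random vector (illustrative variant)."""
    rng = np.random.default_rng() if rng is None else rng
    v = np.asarray(v, dtype=float)

    # Unit vector parallel to v.
    u = v / np.linalg.norm(v)

    # Radially symmetric random vector: i.i.d. standard normal components.
    r = rng.standard_normal(len(v))

    # Remove the component along u, then normalize.
    uperp = r - r.dot(u)*u
    uperp /= np.linalg.norm(uperp)

    return costheta*u + np.sqrt(1 - costheta**2)*uperp
```

Passing a seeded generator, for example `rng=np.random.default_rng(0)`, makes the output reproducible.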
-2

SciPy cosine distance: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.distance.cosine.html

from scipy.spatial.distance import cosine

v = [3, -4]
w = [0.875, 3]

# Note: cosine() returns the cosine *distance*, which is 1 - cosine similarity.
similarity = 1 - cosine(v, w)

As for working backwards, you can do that yourself using dot products.

Alexis Drakopoulos
  • 1
    How does this answer the question above? What I want to get is the vector w, and I do not see how I can work it backwards... – eugen Oct 21 '18 at 15:29
  • 2
    I thought you were asking for a library that implements cosine distances. To work it backwards, sit down and rewrite the formula to solve for the vector you want, then implement that; SciPy also has dot products. – Alexis Drakopoulos Oct 21 '18 at 15:35