
I am calculating a score based on the cosine similarity between an array of ideal values and an array of collected data (code below). However, when I run the following code, the result is 99.4, which seems weird to me because 150 is very different from the ideal value of 300.

import numpy as np

def cos_sim(speechrate, pitch):  #speechrate and pitch are the data collected
    v1 = np.array([300, 25]) #array of ideal values
    v2 = np.array([speechrate, pitch]) #array of data   
    similarity = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)) 
    print("{:.1f}".format(similarity*100))

cos_sim(150, 23)  # call at module level, outside the function body

Does anyone have an idea how to calculate a score based on the difference between the values? (It does not have to use cosine similarity.)

sttc1998

1 Answer


Your formula for similarity calculates $\cos\theta$, the cosine of the angle between the vectors (300, 25) and (150, 23). If you look at the following graph, there isn't much of an angle between the two vectors. In fact, $\cos^{-1}(0.994) \approx 6.27^\circ$, which is not far from 0 degrees, where cos has its highest value of 1.

[Figure: vectors (300, 25) and (150, 23)]
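To make the point about angles concrete, here is a minimal sketch showing that cosine similarity ignores magnitude: a vector and a scaled copy of it point in the same direction, so the similarity is essentially 1 even though every value is halved. The helper name `cosine_similarity` and the sample vector `half` are illustrative assumptions, not part of the question's code.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

ideal = [300, 25]
half = [150, 12.5]  # hypothetical data: every value is half of the ideal

# Same direction, different length: the similarity is 1 up to rounding.
print(cosine_similarity(ideal, half))
```

This is why cosine similarity is a poor fit when the size of the values matters and not only their ratio.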

The metric you use here should depend on your definition of similarity. A trivial metric you can use is the Euclidean distance between the two points.

The Euclidean distance between these two points is d = 150.01. Between (300, 25) and (280, 23), for instance, it is d = 20.10, which gives you an idea of how far apart the points are in the 2D plane.
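As a rough sketch, the Euclidean distance can be computed with `np.linalg.norm` on the difference of the two arrays. The function name `euclidean_distance` is an assumption made for illustration.

```python
import numpy as np

def euclidean_distance(a, b):
    """Straight-line distance between points a and b (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))

ideal = [300, 25]

# A far-off sample gives a large distance, a nearby one a small distance.
print("{:.2f}".format(euclidean_distance(ideal, [150, 23])))  # roughly 150
print("{:.2f}".format(euclidean_distance(ideal, [280, 23])))  # roughly 20
```

Note that the speech-rate axis dominates here because its values are much larger than the pitch values; whether that is acceptable depends on your definition of similarity.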

Unni
  • Thank you for your reply. Now I understand what the problem is. However, I still haven't figured out a way to calculate the score based on how near the data is to the ideal values (e.g. the ideal speech rate is 300 words per second and the user's speech rate is 200 words per second). With Euclidean distance, the score would be 100.0199 if the data is `(200, 23)`, which cannot be right, because what I want is: the nearer the value is to 300, the better the score. – sttc1998 Dec 19 '18 at 07:47
  • I am not sure I understand the requirement. 280 is nearer to 300 than 200 is, and it gives a smaller Euclidean distance. Isn't that what you want? Or do you want the value to be larger when the points are closer? In that case, just negate the distance (`-d`). – Unni Dec 19 '18 at 09:20
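The suggestion in the last comment can be sketched as follows. `negated_distance` implements the `-d` idea directly. `relative_score` is a hypothetical alternative that is not from the thread: it assumes a 0 to 100 score is wanted and that each value should count in proportion to its own ideal, so it averages the relative deviations and clips the result at 0.

```python
import numpy as np

IDEAL = np.array([300.0, 25.0])  # ideal speech rate and pitch from the question

def negated_distance(speechrate, pitch):
    """Negated Euclidean distance: larger (closer to 0) means nearer the ideal."""
    data = np.array([speechrate, pitch], dtype=float)
    return -float(np.linalg.norm(IDEAL - data))

def relative_score(speechrate, pitch):
    """Hypothetical 0-100 score: 100 at the ideal, lower as values deviate."""
    data = np.array([speechrate, pitch], dtype=float)
    rel_err = np.abs(data - IDEAL) / IDEAL  # per-value relative deviation
    return float(100.0 * max(0.0, 1.0 - rel_err.mean()))

for sample in [(300, 25), (280, 23), (200, 23), (150, 23)]:
    print(sample, negated_distance(*sample), relative_score(*sample))
```

With either function, `(280, 23)` ranks above `(200, 23)`, which ranks above `(150, 23)`. Which one suits you depends on whether you need a bounded score and on how the two measurements should be weighted against each other.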