
I have data in this format:

[0.266465 0.9203907 1.007363 ... 0. 0.09623989 0.39632136]

This is the value in the first row, first column.

This is the value in the first row, second column:

[0.9042176 1.135085 1.2988662 ... 0. 0.13614458 0.28000486]

I have 2200 such rows, and I want to train a classifier to decide whether the two sets of values in each row are similar or not.

P.S. These are extracted feature vectors.

VaibhavSka


If you assume the relationship between the two extracted feature vectors is linear, you could try Pearson correlation:

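import numpy as np
from scipy.stats import pearsonr

# two random vectors standing in for the extracted features
list1 = np.random.random(100)
list2 = np.random.random(100)

# returns the correlation coefficient and its two-sided p-value
pearsonr(list1, list2)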

An example output is (the numbers change on every run because the inputs are random):

(0.0746901299996632, 0.4601843257734832)

The first value is the correlation coefficient (about 0.07), and the second is its p-value. With p > 0.05 you fail to reject the null hypothesis that the correlation is zero, at significance level alpha = 5%. If the vectors are correlated, they are, in a way, similar. More about the method here.
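
To turn the correlation and p-value into a yes/no answer, one could wrap `pearsonr` in a small helper. This is a minimal sketch: the name `looks_similar`, the default `alpha=0.05`, and the rule "significant and positive means similar" are my own assumptions, and the test vectors are synthetic.

```python
import numpy as np
from scipy.stats import pearsonr


def looks_similar(a, b, alpha=0.05):
    """Return (r, p, similar); `similar` means a significant positive correlation.

    Hypothetical helper for illustration; not part of SciPy.
    """
    r, p = pearsonr(a, b)
    return float(r), float(p), bool(p < alpha and r > 0)


rng = np.random.default_rng(0)
x = rng.random(100)
y_related = x + 0.1 * rng.standard_normal(100)  # x plus a little noise
y_unrelated = rng.random(100)                    # drawn independently of x

print(looks_similar(x, y_related))
print(looks_similar(x, y_unrelated))
```

Keep in mind that with 100 independent pairs, roughly 1 in 20 unrelated vectors will still come out "significant" at alpha = 5%.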

I also came across normalized cross-correlation, which is used to measure similarity between images (I am not an expert, so double-check this).
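
A minimal sketch of zero-normalized cross-correlation for two equal-length feature vectors, assuming NumPy. For images, NCC is usually computed over many shifts or windows; at a single zero shift with mean subtraction it reduces to the Pearson correlation, so it only adds something if you also compare shifted vectors. The function name `zncc` is hypothetical.

```python
import numpy as np


def zncc(a, b):
    """Zero-normalized cross-correlation of two equal-length vectors at zero shift."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # subtract the mean and divide by the (population) standard deviation
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    # average of the element-wise products; equals Pearson r for these inputs
    return float(np.mean(a * b))


rng = np.random.default_rng(1)
u = rng.random(50)
v = 2.0 * u + 3.0  # a positive linear rescaling of u
print(zncc(u, v))  # close to 1.0
```

For a constant vector the standard deviation is zero, so the result is `nan` (NumPy emits a warning).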

shaimar
  • Thanks, this idea helps. Another approach I found is to subtract the two vectors, store the difference, and train an SVM on the result. – VaibhavSka Oct 28 '18 at 14:23
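
The subtraction idea from the comment can be sketched with scikit-learn, assuming each row gives two equal-length vectors plus a 0/1 "similar" label. The data below is synthetic (the feature length of 64 and the way similar pairs are generated are made up). I use the absolute difference so that swapping the two vectors in a pair gives the same feature; a plain difference also works but depends on the order of the pair.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(42)
n_pairs, dim = 2200, 64  # 2200 pairs as in the question; dim=64 is made up

# Synthetic stand-in data: "similar" pairs are small perturbations of each
# other, "dissimilar" pairs are independent random draws.
first = rng.random((n_pairs, dim))
labels = rng.integers(0, 2, size=n_pairs)  # 1 = similar, 0 = not similar
noise = 0.05 * rng.standard_normal((n_pairs, dim))
second = np.where(labels[:, None] == 1, first + noise, rng.random((n_pairs, dim)))

# One feature vector per pair: element-wise absolute difference.
X = np.abs(first - second)

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels
)

clf = SVC(kernel="rbf")
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

On real features the separation will be much weaker than on this toy data, so scaling the features (e.g. with `StandardScaler`) and tuning `C` and `gamma` would likely matter.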