How can I use any classifier to classify my data with each data point consisting of a set of floating values?

Question

I have data in the following format. This is the value in the first column of the first row:

[0.266465 0.9203907 1.007363 ... 0. 0.09623989 0.39632136]

And this is the value in the second column of the first row:

[0.9042176 1.135085 1.2988662 ... 0. 0.13614458 0.28000486]

I have 2200 such rows, and I want to train a classifier that decides whether the two sets of values in a row are similar or not.

P.S. These values are extracted feature vectors.

score 1 · Accepted Answer · answered Oct 28 '18 at 12:50

If you assume the relation between the two extracted feature vectors is linear, you could try the Pearson correlation:

import numpy as np
from scipy.stats import pearsonr

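# Two random vectors standing in for a pair of feature vectors
list1 = np.random.random(100)
list2 = np.random.random(100)

# Returns the correlation coefficient and its two-sided p-value
pearsonr(list1, list2)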

An example output is:

(0.0746901299996632, 0.4601843257734832)

The first value is the correlation coefficient (about 0.07) and the second is its p-value. With a p-value above 0.05, you fail to reject the null hypothesis that the two vectors are uncorrelated at significance level alpha = 5%, so this correlation is not significant. If the vectors are correlated, they are similar in that sense. More about the method here.
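As a rough sketch of how that reading of the output could be turned into a yes/no decision, the snippet below wraps pearsonr in a small helper. The helper name are_correlated, the alpha default, and the synthetic vectors are assumptions made for illustration, not part of the original data.

```python
import numpy as np
from scipy.stats import pearsonr


def are_correlated(vec_a, vec_b, alpha=0.05):
    """Hypothetical helper: Pearson r, p-value, and whether p < alpha."""
    r, p_value = pearsonr(vec_a, vec_b)
    return float(r), float(p_value), bool(p_value < alpha)


# Synthetic stand-ins for two feature vectors (assumed, not real data)
rng = np.random.default_rng(0)
base = rng.random(100)
noisy_copy = base + rng.normal(scale=0.05, size=100)  # nearly the same vector
unrelated = rng.random(100)                            # independent vector

r_sim, p_sim, sig_sim = are_correlated(base, noisy_copy)
r_unrel, p_unrel, sig_unrel = are_correlated(base, unrelated)

print("noisy copy:", r_sim, p_sim, sig_sim)
print("unrelated: ", r_unrel, p_unrel, sig_unrel)
```

Keep in mind that a significant correlation only says the vectors move together linearly. Two vectors can be strongly correlated and still differ a lot in scale or offset.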

I also came across normalized cross-correlation, which is used to measure similarity between images (I'm not an expert, so rather check this).
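For two 1-D vectors of equal length, one common definition of normalized cross-correlation (the zero-mean variant, evaluated at zero lag) can be sketched as below. Under that definition it coincides with the Pearson coefficient. The function name and the test vectors are assumptions for illustration, and image-matching libraries may define the quantity differently.

```python
import numpy as np


def normalized_cross_correlation(vec_a, vec_b):
    """Zero-mean normalized cross-correlation at zero lag (assumed definition)."""
    a = np.asarray(vec_a, dtype=float)
    b = np.asarray(vec_b, dtype=float)
    a_centered = a - a.mean()
    b_centered = b - b.mean()
    denom = np.linalg.norm(a_centered) * np.linalg.norm(b_centered)
    if denom == 0.0:
        # A constant vector has zero variance, so the ratio is undefined
        raise ValueError("normalized cross-correlation is undefined for a constant vector")
    return float(np.dot(a_centered, b_centered) / denom)


rng = np.random.default_rng(1)
v = rng.random(50)

# A positively scaled and shifted copy scores about 1, a sign-flipped copy about -1
ncc_same = normalized_cross_correlation(v, 2.0 * v + 3.0)
ncc_flipped = normalized_cross_correlation(v, -v)
print(ncc_same, ncc_flipped)
```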

Thanks, this idea helps. Another way of doing it, which I found, is to subtract the two vectors, store the result, and then train an SVM on it. — VaibhavSka, Oct 28 '18 at 14:23
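The difference-plus-SVM idea from the comment could look roughly like the sketch below, assuming scikit-learn is available. It needs a similar/not-similar label for each pair, which the question does not mention, so the labels and all data here are synthetic assumptions. The sketch also uses the element-wise absolute difference rather than the plain difference, so that the order of the two vectors does not matter. That is a choice made here, not something stated in the comment.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_pairs, n_features = 400, 20  # small synthetic stand-in for the 2200 rows

# First vector of each pair, plus an assumed label: 1 = similar, 0 = not similar
first = rng.random((n_pairs, n_features))
labels = rng.integers(0, 2, size=n_pairs)

# Similar pairs: second vector is a noisy copy; otherwise an unrelated vector
noisy = first + rng.normal(scale=0.05, size=first.shape)
unrelated = rng.random((n_pairs, n_features))
second = np.where(labels[:, None] == 1, noisy, unrelated)

# One feature row per pair: element-wise absolute difference
features = np.abs(first - second)

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)

clf = SVC(kernel="rbf")
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print("held-out accuracy:", accuracy)
```

The synthetic classes are easy to separate, so the accuracy printed here says nothing about real feature vectors. On real data, evaluate on held-out labelled pairs.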
