How to find the percentage of similarity between two arrays

Question

I have two data arrays x and y:

x = array([  0.,   0.,  84.,  80.,  59.,  22.,   0.,   0.,   0.,   0.,  52.,
       122., 117.,   1.,  10.,   0.,   0.,   0.,   0.,   0.,   0.,  92.,
        90.,  74.,  46.,   0.,   0.,   0.,   0.,  28., 121., 117.,  90.,
        54.,   0.,   0.,   0.,   0.,   0.,   0.,  47.,  62.,  54.,  57.,
        23.,  63.,  26.,  62.,  52., 138., 126.,  98.,   0.,   0.,   0.,
         0.,   0.,   0.,   0.,   0.,   0.,  19.,  44.,  74.,  89., 119.,
        77., 141., 137., 119.,   0.,   0.,   0.,   0.,  91., 115.,  89.,
       143., 146.,  45.,   0.,   0.,   0.,  65.,  89.,   1.,   0.,   0.,
         0.])

y = array([  0.,   0.,  79.,  90.,  64.,   3.,   0.,   0.,   0.,   0.,  19.,
       113., 109.,   1.,  25.,   0.,   0.,   0.,   0.,   0.,   0.,  90.,
        99.,  73.,  35.,   0.,   0.,   0.,   0.,  46., 106., 113., 105.,
        52.,   0.,   0.,   0.,   0.,   0.,   0.,  57.,  68.,  47.,  20.,
         0.,  17.,   1.,  14.,  48., 120., 118., 105.,   0.,   0.,   0.,
         0.,   0.,   0.,   4.,   1.,   0.,   0.,   0.,  42.,  47.,  80.,
        86., 125., 121., 111.,  16.,   0.,   0.,   0.,  47.,  72., 112.,
       123., 129.,  82.,   0.,   0.,   0.,  87.,  80.,   0.,   0.,   5.,
         0.])

I want to check the similarity between x and y in code. I've tried using SequenceMatcher(), but I'm not sure about the similarity percentage it reports: on the graph the two arrays look very similar, yet the reported similarity is only 39.33%, which seems odd to me. Is there another way to check the similarity between x and y? If so, how, and what mathematical formula is it based on? Thank you.

My code for checking similarity using SequenceMatcher():

from difflib import SequenceMatcher

# x and y are the NumPy arrays shown above
sm = SequenceMatcher(None, x, y)
a = sm.ratio() * 100
print('Similarity x and Testing y : ', round(a, 2), '%')

x and y graph:

What kind of _similarity_ do you want to check for? What should the number represent? — Markus Weninger, Jan 08 '23 at 07:08
To check how similar the two data sets are: the greater the percentage value, the more closely the data resemble each other. — stack offer, Jan 08 '23 at 07:13
“… the greater the percentage value …”. What percentage value? — Wilf Rosenbaum, Jan 08 '23 at 07:29
If x and y are arrays, and the elements of y are just the elements of x shifted to the right by 3 positions, do you expect x and y to be classified as “similar”? — Wilf Rosenbaum, Jan 08 '23 at 07:32
@WilfRosenbaum In my opinion, judging by the graph above, the similarity percentage should be above 85%. If the resulting percentage is above 85%, the arrays will be declared similar. But using SequenceMatcher() I only get a similarity of 39.33%. — stack offer, Jan 08 '23 at 07:36
Then I guess SequenceMatcher is not computing the right measure of similarity for your purposes. — Wilf Rosenbaum, Jan 08 '23 at 07:42
Yes, so this is my question: I'm asking for advice on which method is suitable for measuring the similarity between the x and y data. — stack offer, Jan 08 '23 at 07:45
SequenceMatcher is computing the longest contiguous matching sequence between the two input sequences. — Wilf Rosenbaum, Jan 08 '23 at 07:47
Why not try the average squared difference between the x and y entries? — Wilf Rosenbaum, Jan 08 '23 at 07:54
Can you give me an example or some references (including an explanation of the formula)? — stack offer, Jan 08 '23 at 07:55
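A minimal sketch of the average squared difference suggested in the comments. The mean squared error is MSE = (1/n) * sum((x_i - y_i)^2), and its square root (RMSE) is in the same units as the data. Turning that into a percentage requires choosing a scale. Dividing by the combined value range of the two arrays is only one possible assumption, and the function name below is hypothetical, not part of NumPy.

```python
import numpy as np

def rmse_similarity_percent(x, y):
    """Hypothetical helper: 100 * (1 - RMSE / combined value range)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("x and y must have the same shape")

    # Mean squared error and its square root (same units as the data).
    mse = np.mean((x - y) ** 2)
    rmse = np.sqrt(mse)

    # Assumed normalisation: the range spanned by both arrays together.
    span = max(x.max(), y.max()) - min(x.min(), y.min())
    if span == 0:
        return 100.0  # both arrays hold the same constant value

    return 100.0 * (1.0 - rmse / span)
```

Calling rmse_similarity_percent(x, y) returns 100 for identical arrays and 0 when every pair of elements differs by the full value range. Unlike SequenceMatcher, values that are close but not equal still count as similar.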

score 1 · Answer 1 · answered Jan 08 '23 at 07:09

Consider taking the Cross-Correlation function: https://en.wikipedia.org/wiki/Cross-correlation

Discussion: Computing cross-correlation function?

Numpy implementation: https://numpy.org/doc/stable/reference/generated/numpy.correlate.html
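A rough sketch of how numpy.correlate could be used here, under a few assumptions: both arrays have the same length, the zero-lag value is scaled by each signal's correlation with itself so that identical signals score 1.0, and the helper names are hypothetical. The second helper looks for the shift at which the two signals line up best, which matters if one array may be a delayed copy of the other.

```python
import numpy as np

def normalized_correlation(x, y):
    """Zero-lag cross-correlation scaled by each signal's self-correlation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # For equal-length inputs the default mode ('valid') returns one value.
    cross = np.correlate(x, y)[0]
    auto_x = np.correlate(x, x)[0]
    auto_y = np.correlate(y, y)[0]
    denom = np.sqrt(auto_x * auto_y)
    return cross / denom if denom > 0 else 0.0

def best_lag(x, y):
    """Lag k (in samples) at which sum(x[n + k] * y[n]) is largest."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    full = np.correlate(x, y, mode="full")
    # Index i of the 'full' output corresponds to lag i - (len(y) - 1).
    lags = np.arange(-(len(y) - 1), len(x))
    return int(lags[np.argmax(full)])
```

Multiplying normalized_correlation(x, y) by 100 gives a percentage for non-negative data. At zero lag this normalised value is the same quantity as the cosine similarity of the two arrays.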

cavalcantelucas

I want to find the similarity between the two data sets above, not the correlation between them – stack offer Jan 08 '23 at 07:24
Correlation is a measure of similarity. If you want an absolute value representing that, you can get the power of the resulting correlation `np.abs(signal)**2` – cavalcantelucas Jan 08 '23 at 07:27
Can we convert it to a percentage from 0% to 100%? https://postimg.cc/XGWGDJYH – stack offer Jan 08 '23 at 07:38
You can calculate the correlation of the signal with itself for 100% similarity – cavalcantelucas Jan 08 '23 at 08:41
Do you mean b*100? – stack offer Jan 08 '23 at 08:57

score 0 · Answer 2 · answered Jul 02 '23 at 20:45

You can use this:

Cosine similarity: cosine similarity measures the cosine of the angle between two vectors. You can treat the arrays as flattened vectors and calculate the cosine similarity between them.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

x = np.array([  0.,   0.,  84.,  80.,  59.,  22.,   0.,   0.,   0.,   0.,  52.,
       122., 117.,   1.,  10.,   0.,   0.,   0.,   0.,   0.,   0.,  92.,
        90.,  74.,  46.,   0.,   0.,   0.,   0.,  28., 121., 117.,  90.,
        54.,   0.,   0.,   0.,   0.,   0.,   0.,  47.,  62.,  54.,  57.,
        23.,  63.,  26.,  62.,  52., 138., 126.,  98.,   0.,   0.,   0.,
         0.,   0.,   0.,   0.,   0.,   0.,  19.,  44.,  74.,  89., 119.,
        77., 141., 137., 119.,   0.,   0.,   0.,   0.,  91., 115.,  89.,
       143., 146.,  45.,   0.,   0.,   0.,  65.,  89.,   1.,   0.,   0.,
         0.])

y = np.array([  0.,   0.,  79.,  90.,  64.,   3.,   0.,   0.,   0.,   0.,  19.,
       113., 109.,   1.,  25.,   0.,   0.,   0.,   0.,   0.,   0.,  90.,
        99.,  73.,  35.,   0.,   0.,   0.,   0.,  46., 106., 113., 105.,
        52.,   0.,   0.,   0.,   0.,   0.,   0.,  57.,  68.,  47.,  20.,
         0.,  17.,   1.,  14.,  48., 120., 118., 105.,   0.,   0.,   0.,
         0.,   0.,   0.,   4.,   1.,   0.,   0.,   0.,  42.,  47.,  80.,
        86., 125., 121., 111.,  16.,   0.,   0.,   0.,  47.,  72., 112.,
       123., 129.,  82.,   0.,   0.,   0.,  87.,  80.,   0.,   0.,   5.,
         0.])

matrix1_flat = x.flatten()
matrix2_flat = y.flatten()

similarity_ratio = cosine_similarity([matrix1_flat], [matrix2_flat])[0][0]
print(similarity_ratio)

output: 0.9657650274258939
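For reference, the formula behind this is cos(theta) = (x . y) / (||x|| * ||y||), the dot product divided by the product of the Euclidean norms. Below is a minimal sketch of the same calculation in plain NumPy, without scikit-learn. The function name is hypothetical, and multiplying by 100 to get a percentage assumes non-negative data (as in the question), where the value lies between 0 and 1.

```python
import numpy as np

def cosine_similarity_percent(x, y):
    """Hypothetical helper: cosine similarity expressed as a percentage."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()

    # Product of the Euclidean norms ||x|| * ||y||.
    norm_product = np.linalg.norm(x) * np.linalg.norm(y)
    if norm_product == 0:
        raise ValueError("cosine similarity is undefined for an all-zero array")

    # Dot product divided by the norms, scaled to percent.
    return 100.0 * float(np.dot(x, y) / norm_product)
```

Note that cosine similarity ignores overall scale: an array and the same array multiplied by a positive constant score 100%. If differences in amplitude should count against the score, a distance-based measure may fit better.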
