
I'm a student who just started deep learning with Python.

First of all, English is not my native language and I am using a translator, so please excuse any awkward wording.

I trained a deep learning model on time series data to predict the likelihood of certain situations in the future. I have also already visualized the results with graphs.

Rather than relying only on graphs, I now want to measure numerically how similar the train data and test data are, so that I can express the accuracy as a number.

The two arrays look like this:

In [51] : train_r
Out[51] : array([[0., 0., 0., ..., 0., 0., 0.],
   [0., 0., 0., ..., 0., 0., 0.],
   [0., 0., 0., ..., 0., 0., 0.],

Note: train_r contains only 0s and 1s.

In [52] : test_r
Out[52] : array([[0.        , 0.        , 0.        , ..., 0.03657577, 0.06709877,
    0.0569071 ],
   [0.        , 0.        , 0.        , ..., 0.04707848, 0.07826   ,
    0.0819832 ],
   [0.        , 0.        , 0.        , ..., 0.04467918, 0.07355513,
    0.08117414],

I tried to use cosine similarity to compare these two arrays, but I get an error:

from numpy import dot
from numpy.linalg import norm
cos_sim = dot(train_r, test_r)/(norm(train_r)*norm(test_r))

ValueError: shapes (100,24) and (100,24) not aligned: 24 (dim 1) != 100 (dim 0)

I searched online for other approaches, but most of the results were about comparing strings, so they didn't help.

How can I calculate the similarity between these two arrays and express it as a number?

J.jun
  • FYI, you're getting the error because when you use `numpy.dot()` with 2-D arrays, it does matrix multiplication. So if you have two matrices with shape `(100,24)`, the operation is undefined because matrix multiplication is only defined for an `m x n` and a `q x r` matrix when `n==q`. – J. Taylor Feb 27 '19 at 04:26
  • You might want to look into this answer: https://stackoverflow.com/questions/52030945/python-cosine-similarity-between-two-large-numpy-arrays – Bhushan Pant Feb 27 '19 at 04:34
  • Also, this: https://stackoverflow.com/questions/43493235/cosine-distance-computation-between-two-arrays-python – Bhushan Pant Feb 27 '19 at 04:43
  • Thank you for your comments. Both arrays are two-dimensional: one contains only 0s and 1s, and the other contains decimal values between 0 and 1. So it seems the cosine similarity error is caused by the dimensions of these two arrays. How can I solve this dimension problem and calculate the similarity between the two arrays to show the accuracy of the deep learning model? I'd like to compare them, but I can't figure out how. – J.jun Mar 04 '19 at 03:35
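The point about matrix multiplication can be checked directly. The sketch below uses random stand-in arrays (hypothetical; the real train_r and test_r are only shown truncated) with the same (100, 24) shape. `np.dot` on the two arrays raises the same ValueError, while transposing the first operand gives a valid (24, 24) product whose diagonal holds the column-by-column dot products.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins with the same shape as train_r and test_r
train_r = rng.integers(0, 2, size=(100, 24)).astype(float)  # only 0s and 1s
test_r = rng.random((100, 24))                               # values in [0, 1)

# (100, 24) x (100, 24): inner dimensions 24 and 100 differ, so this fails
try:
    np.dot(train_r, test_r)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# (24, 100) x (100, 24) is valid and yields a (24, 24) matrix
gram = np.dot(train_r.T, test_r)
print(gram.shape)

# Entry [j, j] is the dot product of column j of train_r with column j of test_r
col_dots = np.diag(gram)
print(np.allclose(col_dots, (train_r * test_r).sum(axis=0)))
```

Only the diagonal of `gram` pairs matching columns; the off-diagonal entries compare different columns and are usually not what you want here.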

1 Answer


I found the cause.

The error occurred because train_r and test_r are 2-D arrays of shape (100, 24), so each one holds 24 columns rather than a single vector.

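Passing both whole arrays to dot() at once makes NumPy attempt a matrix multiplication of (100, 24) by (100, 24), which fails because the inner dimensions (24 and 100) do not match.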
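If a single number for the whole arrays is wanted instead of one column, one option (a different approach from the column selection used in this answer) is to flatten both arrays into 1-D vectors, e.g. of length 2400 for (100, 24), and apply the same cosine formula. A minimal sketch; the function name `overall_cosine_similarity` and the small example arrays are hypothetical:

```python
import numpy as np
from numpy.linalg import norm

def overall_cosine_similarity(x, y):
    """Cosine similarity between two same-shape arrays, treated as flat vectors.

    If either array is all zeros, the norm is 0 and the result is nan.
    """
    a = np.ravel(x)  # flatten to 1-D, e.g. (100, 24) -> (2400,)
    b = np.ravel(y)
    return np.dot(a, b) / (norm(a) * norm(b))

# Hypothetical 2x2 example (real data would be (100, 24))
x = np.array([[3., 0.],
              [0., 4.]])
print(overall_cosine_similarity(x, 2 * x))                         # 1.0: same direction, different scale
print(overall_cosine_similarity(x, np.array([[0., 4.], [3., 0.]])))  # 0.0: no overlapping nonzeros
```

Flattening treats every cell equally, so one column with large values can dominate the overall score.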

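The fix is simple: select one column from train_r and test_r and compute the cosine similarity between those two 1-D vectors. Here column 12 is used, and the result is multiplied by 100 to express it as a percentage.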

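from numpy import dot
from numpy.linalg import norm

# Take column 12 from each (100, 24) array -> two 1-D vectors of length 100
a = train_r[:, 12]
b = test_r[:, 12]

# Cosine similarity, scaled by 100 to read as a percentage
cos_sim = (dot(a, b) / (norm(a) * norm(b))) * 100
print(cos_sim)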

95.18094658851624
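Rather than picking one column at a time, the same formula can be applied to every column in one step by summing along axis 0. A minimal sketch, assuming train_r and test_r are NumPy arrays of equal shape; the function name `columnwise_cosine_similarity` and the example arrays are hypothetical. A column that is all zeros in either array has no defined cosine similarity, so it comes back as nan.

```python
import numpy as np

def columnwise_cosine_similarity(x, y):
    """Cosine similarity between matching columns of two same-shape 2-D arrays.

    Returns a 1-D array with one value per column; columns where either
    input is all zeros give 0/0, which becomes nan.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dots = (x * y).sum(axis=0)  # dot product of each column pair
    norms = np.linalg.norm(x, axis=0) * np.linalg.norm(y, axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return dots / norms  # suppress the 0/0 warning; result is nan

# Hypothetical example: 3 rows, 2 columns
x = np.array([[3., 0.],
              [0., 0.],
              [4., 0.]])
y = np.array([[6., 5.],
              [0., 1.],
              [8., 0.]])
print(columnwise_cosine_similarity(x, y))  # 1.0 for column 0, nan for the all-zero column 1
```

With the question's arrays, `columnwise_cosine_similarity(train_r, test_r) * 100` would give 24 percentages, and the value at index 12 should match the result printed above up to floating-point rounding. `np.nanmean` can then summarize them while skipping undefined columns.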
J.jun