I have two signals that I want to compare for similarity. One is shorter (in time) than the other. When I use correlation to find the point of highest similarity, the maximum lands at a position where I wouldn't expect it.

Could anyone tell me whether I am just thinking about this the wrong way, or whether correlation is the wrong tool for this kind of problem?

My setup:

import numpy
import matplotlib.pyplot as plt

signal_a = numpy.array([10, 20, 10, 30, 20, 10, 28, 22, 10])
signal_b = numpy.array([28, 22])
correlations = numpy.correlate(signal_a, signal_b, mode = "full")

print(correlations)
plt.plot(correlations)

Outputs this chart and correlations array

The highest correlation with [28, 22] is found at the position [..., 30, 20, ...]. I understand the formula and why the value there is 1280. But the position I actually want is [..., 28, 22, ...], since in this case it matches Signal B exactly.
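The arithmetic behind those two positions can be checked with a short sketch. It uses mode="valid" instead of "full" (an assumption made only for readability), so that index k lines up with signal_a[k:k+2]; the variable names valid, peak and exact are illustrative.

```python
import numpy as np

signal_a = np.array([10, 20, 10, 30, 20, 10, 28, 22, 10])
signal_b = np.array([28, 22])

# mode="valid" keeps only positions where signal_b fully overlaps signal_a,
# so valid[k] is the dot product of signal_a[k:k+2] with signal_b.
valid = np.correlate(signal_a, signal_b, mode="valid")

for k, value in enumerate(valid):
    print(k, signal_a[k:k + 2], value)

peak = int(valid.argmax())   # position of the largest dot product
exact = 6                    # where [28, 22] actually sits in signal_a

# 30*28 + 20*22 = 1280 beats 28*28 + 22*22 = 1268
print(peak, valid[peak], valid[exact])
```

The window [30, 20] wins simply because its values are slightly larger overall, not because its shape is closer to [28, 22].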

Is correlation the right thing to use? I have found many sources where correlation is used to detect similarity. Shouldn't identical values be more similar than any others?

dest

2 Answers

Instead of correlation, you could use the difference in values to detect similarity. For example, take every run of 2 consecutive elements of a (since b has length 2) and sum the absolute values of the differences:

import numpy as np
import matplotlib.pyplot as plt

signal_a = np.array([10, 20, 10, 30, 20, 10, 28, 22, 10])
signal_b = np.array([28, 22])
N2 = len(signal_b)

# Sum of absolute differences for every window of signal_a that is as long as signal_b
diffs = []
for i in range(len(signal_a) - N2 + 1):
    diff_ab = signal_a[i:i+N2] - signal_b
    diffs.append(np.abs(diff_ab).sum())

print(diffs)
plt.plot(diffs)
plt.show()

Then find the minimum of the diffs array; its index is the best-matching position. Instead of abs() you could also use the squared difference.
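The same idea can be written without an explicit loop. This is a minimal sketch, assuming NumPy 1.20 or newer (for `sliding_window_view`); the names `windows`, `sad` and `ssd` are illustrative and not part of the code above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

signal_a = np.array([10, 20, 10, 30, 20, 10, 28, 22, 10])
signal_b = np.array([28, 22])

# One row per window of signal_a with the same length as signal_b
windows = sliding_window_view(signal_a, len(signal_b))

# Sum of absolute differences and sum of squared differences per window
sad = np.abs(windows - signal_b).sum(axis=1)
ssd = ((windows - signal_b) ** 2).sum(axis=1)

best_sad = int(sad.argmin())
best_ssd = int(ssd.argmin())
print(sad, best_sad)
print(ssd, best_ssd)
```

Both measures are zero only where the window equals signal_b, which here is index 6.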

One possible solution to your problem is the Mean Squared Error (MSE). Given two signals a and b of the same length, the MSE is the mean of the element-wise squared differences between a and b. The code would look as follows (based on this):

import numpy as np
import matplotlib.pyplot as plt

a = np.array([10, 20, 10, 30, 20, 10, 28, 22, 10])
b = np.array([28, 22])
# One MSE value per position where b fits entirely inside a
mse = np.empty(len(a) - len(b) + 1)

for i in range(mse.size):
    mse[i] = np.square(np.subtract(a[i:i+len(b)], b)).mean()

print(mse.argmin())
plt.plot(mse)
mahesh
  • Thank you for directing me to another possible solution. We will try that solution, but do you know _why_ cross correlation is "wrong" for our problem? – dest Nov 21 '18 at 10:35
  • @dest : Because `numpy.correlate` doesn't really calculate the cross-correlation as understood by statisticians. For that you need to use `numpy.corrcoef` (https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.corrcoef.html). But this function seems to need some more parameters that I didn't really explore. You can use that method too if you wish. For most purposes, MSE is good enough. – mahesh Nov 21 '18 at 12:28
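On the "why" raised in the comments: `numpy.correlate` returns an unnormalized sliding dot product, so windows with larger values score higher regardless of shape. One common remedy is to divide by the norms of the window and the template (cosine similarity). The sketch below is an illustration under that assumption, not something proposed in the thread; the names `dots`, `window_norms` and `cosine` are made up for the example. Note that the mean-subtracted variant that `numpy.corrcoef` computes would be of little use for a 2-sample template, because any two non-constant 2-element vectors have a Pearson coefficient of exactly +1 or -1.

```python
import numpy as np

a = np.array([10, 20, 10, 30, 20, 10, 28, 22, 10], dtype=float)
b = np.array([28, 22], dtype=float)

# Raw sliding dot product: what np.correlate computes
dots = np.correlate(a, b, mode="valid")

# Euclidean norm of every window of a (sliding sum of squares, then sqrt).
# Assumes no window is all zeros, otherwise this would divide by zero.
window_norms = np.sqrt(np.correlate(a ** 2, np.ones(len(b)), mode="valid"))

# Cosine similarity between each window and b, in [-1, 1]
cosine = dots / (window_norms * np.linalg.norm(b))

print(int(dots.argmax()))    # raw correlation peaks at the [30, 20] window
print(int(cosine.argmax()))  # normalized score peaks at the exact match
best = int(cosine.argmax())
```

Cosine similarity ignores overall scale, so a window such as [14, 11] would score as high as [28, 22]; if absolute values matter, the difference-based measures from the answers are the safer choice.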