I have two data sets that I'm trying to cross-correlate. They look similar to the arctan
function, so I've been using it as a model to work out how to do my signal processing.
x = linspace(-15, 15, 2**13)
f1 = arctan(x)
f2 = arctan(x + 2)
The question I need to answer is, how much do I need to shift the green signal to get it to (mostly) overlap with the blue one? I thought it would be as simple as finding the maximum in the cross-correlation function of f1
and f2
, and I broadly followed the advice here: How to correlate two time series with gaps and different time bases?. This is what I've been trying
c = correlate(f1, f2, 'full')
s = arange(1-2**13, 2**13)
dx = 30/2**13
shift = s[c.argmax()]*dx
I would expect shift
to equal more or less exactly 2, but in fact it's only 0.234
. This doesn't make any sense to me; I've found the x-coordinate of the maximum in cross-correlation, which should be found where there is maximal overlap of the two signals.
Any ideas on how to calculate this quantity for this kind of function?
EDIT: I should add, for my real data, all of the values will be between zero and one
EDIT EDIT: The following functions are actually more like my real data:
x = linspace(-15, 15, 400)
f1 = (arctan(-x) + pi/2) / pi
f2 = (arctan(-x + 2) + pi/2) / pi
So using the formula given here: http://paulbourke.net/miscellaneous/correlate/ I can write a cross-correlation function that pads the data to add ones to the left and zeros to the right:
def xcorr(x, y);
mx = x.mean()
my = y.mean()
sx = x.std()
sy = y.std()
r = zeros(2*len(x))
for d in range(-len(x), len(x)):
csum = 0
for i in range(0, len(x):
yindx = i - d
if i - d < 0:
yval = 1
elif i - d >= len(x):
yval = 0
else:
yval = y[yindx]
csum += (x[i] - mx) * (yval - my)
r[d + len(x)] = csum / (sx * sy)
return r
With this function, I can now do
c = xcorr(f1, f2)
s = arange(-400, 400)
dx = 30/400
shift = s[c.argmax()] * dx
Which comes out to 2.025, which is as close as you can get to 2 with this precision. So it looks like Jamie was correct, the issue is in how numpy correlate
does the padding of signals.
So, obviously my xcorr
function is really slow as it stands. The question now is, is there a way I can make NumPy do a similar thing, or should I just proceed to write my algorithm in C using ctypes
?