Use of pandas.shift() to align datasets based on scipy.signal.correlate

Question

I have datasets that look like the following: data0, data1, data2 (analogous to time versus voltage data)

If I load and plot the datasets using code like:

import pandas as pd
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

data0 = pd.read_csv('data0.csv')
data1 = pd.read_csv('data1.csv')
data2 = pd.read_csv('data2.csv')

plt.plot(data0.x, data0.y, data1.x, data1.y, data2.x, data2.y)

I get something like:

plotting all three datasets

Now I try to correlate data0 with data1:

shft01 = np.argmax(signal.correlate(data0.y, data1.y)) - len(data1.y)
print(shft01)
plt.figure()
plt.plot(data0.x, data0.y,
         data1.x.shift(-shft01), data1.y)
fig = plt.gcf()

with output:

-99

and

shifted version of data1

which works just as expected! But if I try the same thing with data2, I get a plot that looks like:

shifted version of data2

with a positive shift of 410. I think I am just not understanding how pd.shift() works, but I was hoping that I could use pd.shift() to align my datasets. As far as I understand, the peak of the return value of correlate() tells me how far apart my datasets are, so I should be able to use shift to overlap them.

score 8 · Accepted Answer · answered Oct 29 '13 at 03:41

pandas shift() is not the right tool for moving a curve along the x-axis, because it moves values by a number of index positions. Instead, adjust the x values of the points:

plt.plot(data0.x, data0.y)
for target in [data1, data2]:
    dx = np.mean(np.diff(data0.x.values))
    shift = (np.argmax(signal.correlate(data0.y, target.y)) - len(target.y)) * dx
    plt.plot(target.x + shift, target.y)

here is the output:

[image: data1 and data2 plotted with shifted x values over data0]
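To illustrate the difference, here is a minimal sketch (not part of the original answer) using a made-up five-point x column with an assumed spacing of 0.5. It shows that Series.shift() moves values to other rows and pads with NaN, while adding an offset moves every point along the x-axis and keeps all of them:

```python
import numpy as np
import pandas as pd

# Hypothetical x column with a uniform spacing of 0.5 (illustration only).
x = pd.Series([0.0, 0.5, 1.0, 1.5, 2.0])

# shift(2) moves each value two rows down the index and pads the gap with NaN,
# so two points no longer have an x value and would not be plotted.
shifted_rows = x.shift(2)

# Adding an offset (two samples times the spacing) keeps every point
# and moves it along the x-axis instead.
shifted_x = x + 2 * 0.5

print(shifted_rows.tolist())
print(shifted_x.tolist())
```

With shift(), the lost rows grow with the size of the lag, which is one reason a large lag can produce a plot that looks wrong.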

HYRY

94,853
25
187
187

Thank you. That makes a lot of sense. I guess I was unclear as to how `pd.shift()` worked based on the documentation available. – not link Oct 29 '13 at 14:53

score 5 · Answer 2 · edited May 23 '17 at 11:54

@HYRY one correction to your answer: there is an off-by-one mismatch between len(), which returns a count (one-based), and np.argmax(), which returns a zero-based index. The line should read:

shift = (np.argmax(signal.correlate(data0.y, target.y)) - (len(target.y)-1)) * dx

For example, take the case where your signals are already aligned:

len(target.y) = N (one-based)

The full cross-correlation has length 2N-1, so for aligned data the peak is at the center index:

np.argmax(signal.correlate(data0.y, target.y)) = N - 1 (zero-based)

The original line then gives:

shift = ((N-1) - N) * dx = (-1) * dx, when we really want 0 * dx
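A quick way to check this is with synthetic data. The sketch below is an illustration under stated assumptions: the helper name estimate_shift and the Gaussian test pulses are made up for this example, and the x grid is assumed to be uniformly spaced. It applies the corrected formula to a signal correlated with itself and to a copy of the pulse placed 3.0 x-units earlier:

```python
import numpy as np
from scipy import signal

def estimate_shift(ref_x, ref_y, target_y):
    """Hypothetical helper: x offset to add to the target so it overlaps the reference."""
    dx = np.mean(np.diff(ref_x))              # sample spacing, assumed uniform
    corr = signal.correlate(ref_y, target_y)  # default mode='full', length 2N-1
    lag = np.argmax(corr) - (len(target_y) - 1)  # index N-1 corresponds to zero lag
    return lag * dx

# Synthetic test data: a Gaussian pulse and the same pulse 3.0 x-units earlier.
x = np.arange(200) * 0.1
ref = np.exp(-0.5 * ((x - 10.0) / 0.5) ** 2)
earlier = np.exp(-0.5 * ((x - 7.0) / 0.5) ** 2)

same = estimate_shift(x, ref, ref)        # aligned signals should need no shift
moved = estimate_shift(x, ref, earlier)   # should recover the 3.0 offset

print(same, moved)
```

Plotting x + moved against earlier would then place the pulse on top of ref. With the uncorrected `- len(target_y)`, both results would come out one sample (one dx) too small.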

answered Mar 15 '17 at 21:26

AhabTheArab

Nice! This helped me when I tried to find an equivalent `alignsignals` method in Python after switching from Matlab. – kyrlon Dec 11 '22 at 22:08
