
I am trying to plot the autocorrelation between two time series in search of the lag I need. The statsmodels.graphics.tsaplots module in Python offers plot_acf for investigating the lagged impact of a time series on itself.

How can I plot this kind of lagged correlation for one time series impacting another, so that I can see which lag to choose?

rafaelc
monkey
  • Hi! First of all, what do you mean by `autocorrelation` between two time series? It's either autocorrelation of a single time series or correlation between two.. Second, im not sure what you are trying to accomplish in the end. Can you provide an input/expected output and the code you've tried? Would help a lot – rafaelc Apr 05 '19 at 12:26
  • how about 'np.correlate(SeriesA, SeriesB, "full") ' from numpy – Magellan88 Apr 05 '19 at 12:30
  • statsmodels has the cross-correlation function but no corresponding plot function https://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.ccf.html – Josef Apr 05 '19 at 20:50

2 Answers


To clarify, since you are attempting to investigate the correlations between two different time series, you are attempting to calculate the cross-correlation.

There is no such thing as "autocorrelation between two time series" - autocorrelation means the correlations within one time series across separate lags.
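
To make the distinction concrete, here is a minimal sketch in plain Python. The helper names `pearson` and `lagged_corr` and the toy data are made up for illustration. The same lagged-correlation calculation gives an autocorrelation when a series is compared with itself, and a cross-correlation when it is compared with a second series.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def lagged_corr(x, y, lag):
    """Correlation between x[t] and y[t + lag] for a non-negative lag."""
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


# Toy data (an assumption): y repeats x two steps later.
x = [1.0, 3.0, 2.0, 5.0, 4.0, 7.0, 6.0, 9.0, 8.0, 11.0]
y = [0.0, 0.0] + x[:-2]

autocorr_lag1 = lagged_corr(x, x, 1)   # autocorrelation: x against itself
crosscorr_lag2 = lagged_corr(x, y, 2)  # cross-correlation: x against y
```

Here `crosscorr_lag2` comes out at 1 (up to rounding), because `y` is an exact copy of `x` shifted by two steps.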

Let's take an example. Suppose one wishes to examine the cross-correlation between sunlight hours and maximum temperature in a location. This relationship is subject to seasonal lag, whereby the maximum temperature lags the period of maximum sunlight hours.

The cross-correlation is plotted for the data as follows:

# Import libraries
import os

import numpy as np
import matplotlib.pyplot as plt

# Change to the directory that contains the data file
path = "directory"
os.chdir(path)

# Variables: x and y are the first two columns of the CSV file
dataset = np.loadtxt("weather.csv", delimiter=",")
x = dataset[:, 0]
y = dataset[:, 1]

# Plot the normalised cross-correlation for lags -365..365
plt.xcorr(x, y, normed=True, usevlines=True, maxlags=365)
plt.title("Sunlight Hours versus Maximum Temperature")
plt.show()

Here is the plot of the cross-correlations, calculated for up to 365 lags in each direction:

sunlight

In this instance, the strongest correlation between sunlight hours and maximum air temperature occurs at a lag of approximately 40 days.
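
If you want to read the peak off programmatically rather than from the plot, something along these lines should work. This is only a sketch, not the code that produced the plot: the helper `xcorr_lags` and the synthetic series are made up, and it assumes both series have the same length. It follows the normalisation that `plt.xcorr` applies with `normed=True`, except that it also subtracts the means first (which `plt.xcorr` does not do by default).

```python
import numpy as np


def xcorr_lags(x, y, maxlags):
    """Normalised cross-correlation for lags -maxlags..maxlags.

    Same sign convention as np.correlate(x, y, "full"):
    the value at lag k is sum_n x[n + k] * y[n].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Subtract the means so the values behave like correlation coefficients.
    x = x - x.mean()
    y = y - y.mean()
    n = len(x)
    c = np.correlate(x, y, mode="full") / np.sqrt(np.dot(x, x) * np.dot(y, y))
    lags = np.arange(-maxlags, maxlags + 1)
    # Index n - 1 of the "full" output corresponds to lag 0.
    return lags, c[n - 1 - maxlags : n + maxlags]


# Synthetic stand-in data (an assumption): y is x delayed by 40 steps.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = np.concatenate([np.zeros(40), x[:-40]])

lags, c = xcorr_lags(x, y, maxlags=100)
best_lag = int(lags[np.argmax(np.abs(c))])
```

With this sign convention a peak at a negative lag means the second series lags behind the first, so it is worth checking the direction on data where you already know the answer.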

In your case, I would recommend plotting cross-correlation between the two time series to determine if a lag is present, and if so by how many time periods.

Michael Grogan

https://stackoverflow.com/users/7094244/michael-grogan thank you for the explanation of "autocorrelation" and "cross-correlation". I would suggest making the plot more "statistical", for example like this one I made:

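import matplotlib.pyplot as plt

# TS1 and TS2 are the two time series (equal-length 1-D arrays)
plt.xcorr(TS1, TS2, usevlines=True, maxlags=20, normed=True, lw=2)
plt.grid(True)
plt.axhline(0.2, color='blue', linestyle='dashed', lw=2)  # reference line at 0.2
plt.ylim([0, 0.3])
plt.title("Cross-correlation")
plt.show()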

Cross-correlation plot image

As you can see from the plot, I have a very special case with almost no correlation. Ideally, you should rewrite

plt.ylim([0, 0.3])

as

plt.ylim([0, 1])

to see the full range of positive correlations. As a rule of thumb, a correlation of >= 0.2 is often treated as statistically significant, although the exact threshold depends on the number of observations.
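
A note on the 0.2 line: it is a rule of thumb rather than a fixed cut-off. A common approximation, assuming the two series are independent white noise, is that the sample cross-correlation stays within about ±1.96/sqrt(N) in 95% of cases, where N is the number of observations. A small sketch (the function name `white_noise_bound` is made up for illustration):

```python
import math


def white_noise_bound(n_obs, z=1.96):
    """Approximate 95% bound for the sample cross-correlation of two
    independent white-noise series with n_obs observations each."""
    return z / math.sqrt(n_obs)


# How the bound shrinks as the series get longer
for n_obs in (50, 100, 1000):
    print(n_obs, round(white_noise_bound(n_obs), 3))
```

So 0.2 is roughly the right threshold for about 100 observations, and you could pass `white_noise_bound(len(TS1))` to `plt.axhline` instead of the hard-coded value. Autocorrelated series need wider bounds than this approximation gives.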

monkey