I have implemented a function in Python to compute the autocorrelation of a time series at a specific lag k. It is implemented under the assumption that some of the time series might not be stationary. However, for some of these series I am getting values greater than 1, especially at the last lags, so I guess I must be getting some calculation wrong.

I am implementing the following:

[Image: the autocorrelation formula being implemented]

For the terms corresponding to the lagged series, I'm computing the mean and standard deviation from lag k onwards.
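Written out from the code below (a reconstruction, since the formula image is not reproduced here), the quantity being computed appears to be:

```latex
\[
r_k = \frac{\sum_{j=1}^{n-k} (x_j - \mu)\,(x_{j+k} - \mu_k)}{(n-k)\,\sigma\,\sigma_k}
\]
```

Here n is the length of the series, μ and σ are the mean and standard deviation of the whole series, and μ_k and σ_k are those of the series from lag k onwards.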

I have implemented the following code in Python, which computes the autocorrelation for a specific lag k:

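import numpy as np

def custom_autocorrelation(x, lag=12):
    n = len(x)
    # Mean and standard deviation of the full series
    std = x.std()
    mu = x.mean()
    autocov = 0
    # Mean and standard deviation of the series from lag k onwards
    mu_lag = x[lag:].mean()
    std_lag = x[lag:].std()
    # Sum the cross products over the n - lag overlapping pairs
    for j in range(n - lag):
        autocov += (x[j] - mu) * (x[j + lag] - mu_lag)
    autocorr = autocov / (std * std_lag * (n - lag))
    return autocorr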

As an example, I'm trying the following sequence with k = 12 and getting a coefficient of 1.03:

a = np.array([20623., 11041.,  5686.,  2167.,  2375.,  2057.,  3141.,   504.,
                152.,  6562.,  8199., 15103., 16632.,  7190.,  6987.,  2652.,
               1949.,  2223.,  1703.,  2163.,  1850.,  6932.,  5932., 13124.,
              14846.,  7850.,  4526.,  1277.,  1036.,  1500.,  1648.,  1384.,
               1446.,  3477.,  6818., 12446.,  9734.])

Any help would be much appreciated!

Alexander McFarlane
yatu
  • Hi @AlexanderMcFarlane, sorry if it is not too clear in some points. Let me know what doubts you have about the problem formulation. – yatu Sep 07 '18 at 09:54
  • I think your problem is with the means and stdev. Do you have a source for the equation? I suspect your issue is a theoretical / conceptual one rather than a coding one – Alexander McFarlane Sep 07 '18 at 09:54
  • @AlexanderMcFarlane Yes, I think so too. I have compared with some Python libraries by computing the unnormalized autocorrelation (https://en.wikipedia.org/wiki/Autocorrelation#Signal_processing) and the results are the same. So it must have to do with this – yatu Sep 07 '18 at 09:56
  • Yes @AlexanderMcFarlane http://itfeature.com/time-series-analysis-and-forecasting/autocorrelation-time-series-data – yatu Sep 07 '18 at 09:57
  • that doesn't seem to be the same equation? – Alexander McFarlane Sep 07 '18 at 09:59
  • No, I just rewrote it but it is the same – yatu Sep 07 '18 at 10:12
  • ok I think I see your problem – Alexander McFarlane Sep 07 '18 at 10:38

1 Answer

I think you have simply written the equation down incorrectly. The following parts

std = x.std()
mu = x.mean() 

are not in line with the original source. It seems that you need the mean and standard deviation of the first n - lag points instead:

std = x[: n - lag].std()
mu = x[: n - lag].mean() 

Fixing this gives

In [221]: custom_autocorrelation(a, 12)
Out[221]: 0.9569497673729846
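One way to sanity-check the fix: once both means and both standard deviations are taken over the two overlapping windows, the estimate is just the Pearson correlation between `x[:n - lag]` and `x[lag:]`, so by the Cauchy-Schwarz inequality it cannot exceed 1 in absolute value. A minimal sketch, assuming NumPy; the helper name `windowed_autocorr` is hypothetical and not part of the question's code:

```python
import numpy as np

def windowed_autocorr(x, lag):
    """Lag-k autocorrelation using per-window means and standard deviations."""
    x = np.asarray(x, dtype=float)
    head, tail = x[:x.size - lag], x[lag:]
    # Population covariance of the two overlapping windows (divides by n - lag)
    cov = ((head - head.mean()) * (tail - tail.mean())).mean()
    # np.std defaults to ddof=0, matching the covariance normalisation above
    return cov / (head.std() * tail.std())

# The sequence from the question
a = np.array([20623., 11041., 5686., 2167., 2375., 2057., 3141., 504.,
              152., 6562., 8199., 15103., 16632., 7190., 6987., 2652.,
              1949., 2223., 1703., 2163., 1850., 6932., 5932., 13124.,
              14846., 7850., 4526., 1277., 1036., 1500., 1648., 1384.,
              1446., 3477., 6818., 12446., 9734.])

r_manual = windowed_autocorr(a, 12)
# Pearson correlation between the series and its lagged copy
r_pearson = np.corrcoef(a[:-12], a[12:])[0, 1]
print(r_manual, r_pearson)
```

The two printed values should agree to floating-point precision. The original version mixes the full-series mean and standard deviation with the lagged-window ones, so it is no longer a correlation between two fixed samples and is not bounded by 1.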

I have also taken some ideas from a previous answer of mine to vectorise the calculation, which greatly speeds it up:

def modified_acorr(ts, lag):
    """An autocorrelation estimation as per
    http://itfeature.com/time-series-analysis-and-forecasting/autocorrelation-time-series-data

    Args:
        ts (np.ndarray): series
        lag (int): the lag

    Returns:
        float: The autocorrelation
    """
    return (
        (ts[:ts.size - lag] - ts[:ts.size - lag].mean()) *
        (ts[lag:] - ts[lag:].mean())
    ).ravel().mean() / (ts[lag:].std() * ts[:ts.size - lag].std())

Comparing with a regular autocorrelation function, we get similar answers:

In [197]: modified_acorr(a, 12)
Out[197]: 0.9569497673729849

In [218]: acorr(a, a.mean(), 12) / acorr(a, a.mean(), 0)  # normalisation
Out[218]: 0.9201920561073853
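The `acorr` function used above is not defined in this answer. As a rough stand-in, here is a sketch of the textbook sample autocorrelation, which uses a single global mean and normalises by the lag-0 sum of squares. The name `sample_acf` is hypothetical, and this is an assumption about what a "regular" estimator looks like, so it need not reproduce the 0.92 figure exactly:

```python
import numpy as np

def sample_acf(x, lag):
    """Textbook sample autocorrelation: global mean, lag-0 normalisation."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    # Cross products over the n - lag overlapping pairs,
    # divided by the sum of squares of the whole series
    return (d[:x.size - lag] * d[lag:]).sum() / (d * d).sum()

# Synthetic check: four full periods of a sine wave with period 12
x = np.sin(2 * np.pi * np.arange(48) / 12)
print(sample_acf(x, 0))   # 1.0 by construction
print(sample_acf(x, 12))  # roughly 0.75 = (48 - 12) / 48
```

Because the numerator sums only n - lag terms while the denominator sums all n, this estimator shrinks towards zero as the lag grows, which is one reason it tends to give smaller values at long lags than the windowed version.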
Alexander McFarlane