
I am interested in generating an array (or pandas Series) of length N that exhibits a specific autocorrelation at lag 1. Ideally, I want to specify the mean and variance as well, and have the data drawn from a (multivariate) normal distribution. But most importantly, I want to specify the autocorrelation. How do I do this with numpy or scikit-learn?

Just to be explicit and precise, this is the autocorrelation I want to control:

numpy.corrcoef(x[0:len(x) - 1], x[1:])[0][1]
  • What exactly do you mean by "specific autocorrelation"? Are there particular time lags you are interested in? Obviously any signal is guaranteed to be perfectly correlated with itself at zero lag. – ali_m Nov 24 '15 at 17:11
  • I am interested in lag=1, please see the edits of the original question. – Baron Yugovich Nov 24 '15 at 17:27
  • if the coefficient is near to zero, N isn't too big and you always do corrcoef(x[:-1],x[1:]) then you can probably generate a random array by brute force. More scientifically, I think the reverse fft should be able to generate an array with specific autocorrelation, but I've never done this or looked at how! – paddyg Nov 24 '15 at 19:29
  • Do you care about correlations past lag 1, i.e., can they be non-zero? – pjs Nov 24 '15 at 19:54
  • I don't care about them. – Baron Yugovich Nov 24 '15 at 20:09

1 Answer


If you are interested only in the auto-correlation at lag one, you can generate an auto-regressive process of order one (AR(1)) with its parameter equal to the desired auto-correlation; this property is mentioned on the Wikipedia page, and it is not hard to prove.
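
In case it helps, here is a quick sketch of the argument for a stationary AR(1) process x[t] = c + corr * x[t-1] + e[t], where e[t] is white noise with standard deviation sigma_e:

    E[x[t]]   = c / (1 - corr)           (take expectations on both sides, use stationarity)
    Var(x[t]) = sigma_e² / (1 - corr²)   (take variances; e[t] is independent of x[t-1])
    Cov(x[t], x[t-1]) = corr * Var(x[t-1]),  so  Corr(x[t], x[t-1]) = corr

Solving the first two equations for a target mean mu and variance sigma² gives c = mu * (1 - corr) and sigma_e = sigma * sqrt(1 - corr²), which is exactly what the code below uses.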

Here is some sample code:

import numpy as np

def sample_signal(n_samples, corr, mu=0, sigma=1):
    assert -1 < corr < 1, "Auto-correlation must be strictly between -1 and 1"

    # Find out the offset `c` and the std of the white noise `sigma_e`
    # that produce a signal with the desired mean and variance.
    # See https://en.wikipedia.org/wiki/Autoregressive_model
    # under section "Example: An AR(1) process".
    c = mu * (1 - corr)
    sigma_e = np.sqrt((sigma ** 2) * (1 - corr ** 2))

    # Sample the auto-regressive process, starting from a draw from the
    # stationary distribution N(mu, sigma) so there is no start-up transient.
    signal = [np.random.normal(mu, sigma)]
    for _ in range(1, n_samples):
        signal.append(c + corr * signal[-1] + np.random.normal(0, sigma_e))

    return np.array(signal)

def compute_corr_lag_1(signal):
    return np.corrcoef(signal[:-1], signal[1:])[0][1]

# Examples.
print(compute_corr_lag_1(sample_signal(5000, 0.5)))
print(np.mean(sample_signal(5000, 0.5, mu=2)))
print(np.std(sample_signal(5000, 0.5, sigma=3)))

The parameter corr lets you set the desired auto-correlation at lag one, and the optional parameters mu and sigma let you control the mean and standard deviation of the generated signal.
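
If the Python loop becomes a bottleneck for long signals, the same AR(1) recursion can be run as an IIR filter. Here is a minimal sketch, assuming SciPy is available (the helper name sample_signal_lfilter is just for illustration); scipy.signal.lfilter with coefficients b = [1] and a = [1, -corr] applies the recurrence y[n] = e[n] + corr * y[n-1]:

import numpy as np
from scipy import signal

def sample_signal_lfilter(n_samples, corr, mu=0, sigma=1):
    # Same offset and noise scale as in sample_signal above.
    c = mu * (1 - corr)
    sigma_e = np.sqrt((sigma ** 2) * (1 - corr ** 2))
    e = c + np.random.normal(0, sigma_e, n_samples)
    # Seed the filter state with a draw from the stationary distribution
    # N(mu, sigma) so the output has the right mean/variance from the start.
    zi = np.array([corr * np.random.normal(mu, sigma)])
    out, _ = signal.lfilter([1.0], [1.0, -corr], e, zi=zi)
    return out

print(compute_corr_lag_1(sample_signal_lfilter(5000, 0.5)))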
