Monte Carlo simulation of correlation between two Brownian motions (continuous random walks)

Question

y <- cumsum(rnorm(100,0,1)) # cumulative sum of N(0,1) increments: a random walk with no drift
y.ts <- ts(y)
x <- cumsum(rnorm(100,0,1))
x
x.ts <- ts(x)
ts.plot(y.ts, x.ts, type = "l") # plot the two random walks


Regression.Q1 <- lm(y ~ x) # regress one random walk on the other
summary(Regression.Q1)

t.test1 <- summary(Regression.Q1)$coef[2,3] # t-statistic of the slope coefficient


y[t] = y[t-1] + epsilon[t]
epsilon[t] ~ N(0,1)
set.seed(1)
t=1000
epsilon=sample(c(-1,1), t, replace = 1) # Generate k random walks across time {0, 1, ... , T}


N=T=1e3
y=t(apply(matrix(sample(c(-1,1),N*T,rep=TRUE),ncol=T),1,cumsum))
y[1]<-0
for (i in 2:t) {
  y[i]<-y[i-1]+epsilon[i]
}

I need to:

Repeat the process 1,000 times (Monte Carlo simulations), namely build a loop around the previous program and save the t-statistic each time. You will have a sequence of 1,000 t-tests: S = (t-test1, t-test2, ... , t-test1000). Count the number of times the absolute value of the 1,000 t-tests exceeds 1.96, the critical value at a 5% significance level. If the series were I(0) you would have found roughly 5%. It won't be the case here (spurious regression).

What do I need to add to save the respective coefficients?

Zheyuan Li · Accepted Answer · 2016-10-09T16:07:57.810

Your posted code for y[t] = y[t-1] + epsilon[t] is not real working code, but I can see that you are trying to store all 1000 * 2 random walks. There is no need to do this: we only care about the t-scores, not the realizations of the random walks themselves.

For this kind of problem, where we aim to replicate a procedure many times, it is handy to first write a function that executes the procedure once. You already had good working code for this; we just need to wrap it in a function (removing the unnecessary parts, such as the plot):

sim <- function () {
  y <- cumsum(rnorm(100,0,1))
  x <- cumsum(rnorm(100,0,1))
  coef(summary(lm(y ~ x)))[2,3]
  }

This function takes no input; it only returns the t-score for one experiment.
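
If the slope estimates themselves should be saved along with the t-scores, as the question asks, one possible variant returns both values as a named vector. This is only a sketch: `sim_coef` and its argument `n` are hypothetical names introduced here, not part of the code above.

```r
# Hypothetical variant of sim(): returns the slope estimate and its t-score.
sim_coef <- function(n = 100) {
  y <- cumsum(rnorm(n))
  x <- cumsum(rnorm(n))
  cf <- coef(summary(lm(y ~ x)))          # coefficient table of the fit
  c(slope = cf[2, "Estimate"], t = cf[2, "t value"])
}

set.seed(1)                               # arbitrary seed, for reproducibility only
res <- replicate(1000, sim_coef())        # matrix with one column per simulation
slopes   <- res["slope", ]                # the 1000 slope estimates
t_scores <- res["t", ]                    # the 1000 t-scores
```

Because each call returns a vector of length 2, replicate collects the results into a 2 x 1000 matrix whose rows can be extracted by name.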

Now, we are going to repeat this 1000 times. We can write a for loop, but the function replicate is easier (read ?replicate if necessary):

S <- replicate(1000, sim())
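
For comparison, the explicit for loop mentioned above could look roughly like this. It is a sketch that repeats the definition of sim so it runs on its own; `n_rep` and `S_loop` are names chosen here for illustration.

```r
sim <- function() {
  y <- cumsum(rnorm(100, 0, 1))
  x <- cumsum(rnorm(100, 0, 1))
  coef(summary(lm(y ~ x)))[2, 3]
}

n_rep  <- 1000
S_loop <- numeric(n_rep)        # preallocate the result vector
for (i in seq_len(n_rep)) {
  S_loop[i] <- sim()            # store the t-score of the i-th experiment
}
```

Preallocating the vector avoids growing it inside the loop; replicate does the same bookkeeping in a single call.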

Note this will take some time, much slower than it should be for such a simple task, because both lm and summary.lm are slow. A much faster way will be shown later.

Now, S is a vector with 1000 values, which is the "sequence of 1,000 t-tests" you want. To get "the number of times the absolute value of the 1,000 t-tests exceeds 1.96", we can just use

sum(abs(S) > 1.96)
# [1] 756

The result 756 is just what I get; you will get something different as the simulation is random. But it will always be quite a large number as expected.
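
To see the contrast with the I(0) case described in the question, the same experiment can be run on two independent white-noise series. The sketch below assumes a small helper `t_score` (a hypothetical name) and an arbitrary seed; the exact rates depend on the random draws.

```r
# Helper (hypothetical name): t-score of the slope when regressing y on x.
t_score <- function(x, y) coef(summary(lm(y ~ x)))[2, 3]

set.seed(123)                   # arbitrary seed, for reproducibility only
n_rep <- 1000

# I(1) case: two independent random walks
t_rw <- replicate(n_rep, t_score(cumsum(rnorm(100)), cumsum(rnorm(100))))
# I(0) case: two independent white-noise series
t_wn <- replicate(n_rep, t_score(rnorm(100), rnorm(100)))

rate_rw <- mean(abs(t_rw) > 1.96)   # rejection rate for random walks
rate_wn <- mean(abs(t_wn) > 1.96)   # rejection rate for white noise
c(random_walk = rate_rw, white_noise = rate_wn)
```

The white-noise rate should land near the nominal 5%, while the random-walk rate should be far higher, which is the spurious-regression effect.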

A faster version of sim:

fast_sim <- function () {
  y <- cumsum(rnorm(100,0,1))
  x <- cumsum(rnorm(100,0,1))
  y <- y - mean(y)
  x <- x - mean(x)
  xty <- crossprod(x,y)[1]
  xtx <- crossprod(x)[1]
  b <- xty / xtx
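  sigma <- sqrt(sum((y - x * b) ^ 2) / 98)  # residual standard error; 98 = n - 2 degrees of freedom
  b * sqrt(xtx) / sigma                     # t-score = b / se(b), with se(b) = sigma / sqrt(xtx)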
  }

This function computes simple linear regression without lm, and t-score without summary.lm.
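
The manual route relies on the fact that, after centering, the slope is b = x'y / x'x and its standard error is sigma / sqrt(x'x), so the t-score is b divided by that standard error. As a sanity check, the sketch below compares this formula with lm on one simulated pair; `manual_t` is a hypothetical helper written for this check.

```r
# Hypothetical helper: slope t-score of a simple linear regression, without lm.
manual_t <- function(x, y) {
  n   <- length(x)
  xc  <- x - mean(x)                        # centering removes the intercept
  yc  <- y - mean(y)
  xtx <- sum(xc^2)
  b   <- sum(xc * yc) / xtx                 # slope estimate
  sigma <- sqrt(sum((yc - xc * b)^2) / (n - 2))  # residual standard error
  b / (sigma / sqrt(xtx))                   # slope divided by its standard error
}

set.seed(1)                                 # arbitrary seed
x <- cumsum(rnorm(100))
y <- cumsum(rnorm(100))
t_lm  <- coef(summary(lm(y ~ x)))[2, 3]     # reference value from lm
agree <- isTRUE(all.equal(manual_t(x, y), t_lm))
```

Here `agree` records whether the two computations match up to numerical tolerance.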

S <- replicate(1000, fast_sim())
sum(abs(S) > 1.96)
# [1] 778

An alternative way is to use cor.test:

fast_sim2 <- function () {
  y <- cumsum(rnorm(100,0,1))
  x <- cumsum(rnorm(100,0,1))
  unname(cor.test(x, y)[[1]])
  }

S <- replicate(1000, fast_sim2())
sum(abs(S) > 1.96)
# [1] 775
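
cor.test can stand in for the regression because, in simple linear regression, the slope t-score equals r * sqrt(n - 2) / sqrt(1 - r^2), where r is the Pearson correlation. A short sketch checking this identity on one simulated pair (the variable names and the seed are arbitrary choices):

```r
set.seed(2)                                   # arbitrary seed
n <- 100
x <- cumsum(rnorm(n))
y <- cumsum(rnorm(n))

r      <- cor(x, y)                           # Pearson correlation
t_r    <- r * sqrt(n - 2) / sqrt(1 - r^2)     # t-score from the correlation
t_cor  <- unname(cor.test(x, y)$statistic)    # statistic reported by cor.test
t_lm   <- coef(summary(lm(y ~ x)))[2, 3]      # slope t-score from lm

same_cor <- isTRUE(all.equal(t_r, t_cor))
same_lm  <- isTRUE(all.equal(t_r, t_lm))
```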

Let's have a benchmark:

system.time(replicate(1000, sim()))
#   user  system elapsed 
#  1.860   0.004   1.867 

system.time(replicate(1000, fast_sim()))
#   user  system elapsed 
#  0.088   0.000   0.090 

system.time(replicate(1000, fast_sim2()))
#   user  system elapsed 
#  0.308   0.004   0.312

cor.test is much faster than lm + summary.lm, but manual computation is even faster!
