Your posted code related to y[t] = y[t-1] + epsilon[t]
is not real working code, but I can see that you are trying to store all 1000 * 2 random walk. Actually there is no need to do this. We only care about t-score rather than what those realizations of random walk are.
For this kind of problem, where we aim to replicate a procedure a lot of times, it is handy to first write a function to execute such a procedure for a single time. You already had good working code for this; we just need to wrap it in a function (removing those unnecessary part like plot
):
sim <- function () {
y <- cumsum(rnorm(100,0,1))
x <- cumsum(rnorm(100,0,1))
coef(summary(lm(y ~ x)))[2,3]
}
This function takes no input; it only returns the t-score for one experiment.
Now, we are going to repeat this 1000 times. We can write a for
loop, but function replicate
is easier (read ?replicate
if necessary)
S <- replicate(1000, sim())
Note this will take some time, much slower than it should be for such a simple task, because both lm
and summary.lm
are slow. A much faster way will be shown later.
Now, S
is vector with 1000 values, which is the "a sequence of 1000 t-tests" you want. To get "the number of time the absolute value of the 1,000 t-tests > 1.96", we can just use
sum(abs(S) > 1.96)
# [1] 756
The result 756 is just what I get; you will get something different as the simulation is random. But it will always be quite a large number as expected.
A faster version of sim
:
fast_sim <- function () {
y <- cumsum(rnorm(100,0,1))
x <- cumsum(rnorm(100,0,1))
y <- y - mean(y)
x <- x - mean(x)
xty <- crossprod(x,y)[1]
xtx <- crossprod(x)[1]
b <- xty / xtx
sigma <- sqrt(sum((y - x * b) ^ 2) / 98)
b * sqrt(xtx) * sigma
}
This function computes simple linear regression without lm
, and t-score without summary.lm
.
S <- replicate(1000, fast_sim())
sum(abs(S) > 1.96)
# [1] 778
An alternative way is to use cor.test
:
fast_sim2 <- function () {
y <- cumsum(rnorm(100,0,1))
x <- cumsum(rnorm(100,0,1))
unname(cor.test(x, y)[[1]])
}
S <- replicate(1000, fast_sim())
sum(abs(S) > 1.96)
# [1] 775
Let's have a benchmark:
system.time(replicate(1000, sim()))
# user system elapsed
# 1.860 0.004 1.867
system.time(replicate(1000, fast_sim()))
# user system elapsed
# 0.088 0.000 0.090
system.time(replicate(1000, fast_sim2()))
# user system elapsed
# 0.308 0.004 0.312
cor.test
is much faster than lm
+ summary.lm
, but manual computation is even faster!