
I generated a series of 10,000 random numbers through:

rand_x = rf(10000, 3, 5)

Now I want to produce another series that contains the variance at each point, i.e. the column should look like this:

[variance(first two numbers)]
[variance(first three numbers)]
[variance(first four numbers)]
[variance(first five numbers)]
.
.
.
.
[variance of 10,000 numbers]

I have written the code as:

c(var(rand_x[1:1]):var(rand_x[1:10000]))

but I only get 157 elements in the column rather than 10,000. Can someone tell me what I am doing wrong here?

2 Answers


One option is to loop over the indices from 2 to 10000 with sapply: for each index, extract the elements of 'rand_x' from position 1 up to that index, apply var, and collect the results into a vector of variances.

out <- sapply(2:10000, function(i) var(rand_x[1:i]))
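A quick check of this approach on a smaller vector. The size n = 50 and the seed 42 are illustrative choices (the same ones used for the data in the other answer). Because the loop starts at 2, the result has one element fewer than 'rand_x', and its first and last elements are the variances of the first two values and of the whole vector.

```r
# Small stand-in for the question's data (n = 50 instead of 10,000)
set.seed(42)
n <- 50
rand_x <- rf(n, 3, 5)

# Variance of rand_x[1:2], rand_x[1:3], ..., rand_x[1:n]
out <- sapply(2:n, function(i) var(rand_x[1:i]))

length(out)   # 49, i.e. n - 1, because the first prefix has two elements
```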
akrun

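Your code creates a sequence that increments by one, with the variance of the first two elements as the start value and the variance of the whole vector as the upper limit. The number of elements therefore depends on how far apart those two variances are, not on the length of the vector.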

var(rand_x[1:2]):var(rand_x[1:n])
# [1] 0.9026262 1.9026262 2.9026262

## compare:
.9026262:3.33433
# [1] 0.9026262 1.9026262 2.9026262
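For from <= to, the length of from:to is floor(to - from) + 1, so only the two endpoints matter. The sketch below checks this with the two variances from the example data in this answer (0.9026262 for the first two values, 3.6580296 for all 50).

```r
# ':' counts up from 'from' in steps of 1 and stops at the last value <= 'to'
from <- 0.9026262   # variance of the first two values in the example data
to   <- 3.6580296   # variance of all 50 values in the example data
s <- from:to
s                      # 0.9026262 1.9026262 2.9026262
length(s)              # 3
floor(to - from) + 1   # 3, the same count
```

Assuming the question's sequence also started at the variance of the first two values, getting 157 elements means the variance of the full vector was at least 156 but less than 157 above that start value. The 10,000 values play no role in the count.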

What you want is to loop over the indices of the vector, using seq_along, and compute the variance of a subset that grows by one element per iteration. To show what needs to be done, here is first a (rather slow) for loop.

vars <- numeric()  ## initialize numeric vector
for (i in seq_along(rand_x)) {
  vars[i] <- var(rand_x[1:i])
}
vars
#  [1]        NA 0.9026262 1.4786540 1.2771584 1.7877717 1.6095619
#  [7] 1.4483273 1.5653797 1.8121144 1.6192175 1.4821020 3.5005254
# [13] 3.3771453 3.1723564 2.9464537 2.7620001 2.7086317 2.5757641
# [19] 2.4330738 2.4073546 2.4242747 2.3149455 2.3192964 2.2544765
# [25] 3.1333738 3.0343781 3.0354998 2.9230927 2.8226541 2.7258979
# [31] 2.6775278 2.6651541 2.5995346 3.1333880 3.0487177 3.0392603
# [37] 3.0483917 4.0446074 4.0463367 4.0465158 3.9473870 3.8537925
# [43] 3.8461463 3.7848464 3.7505158 3.7048694 3.6953796 3.6605357
# [49] 3.6720684 3.6580296

The first element has to be NA because the sample variance of a single value is undefined: it divides by n - 1, which is zero for n = 1.
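The sample variance is s² = Σ(x_i − x̄)² / (n − 1). A small illustration with a single arbitrary value (3.2 is just an example): var() returns NA, and evaluating the formula by hand gives 0 / 0, which R represents as NaN.

```r
x1 <- 3.2                                  # an arbitrary single value
var(x1)                                    # NA: var() needs at least two values
sum((x1 - mean(x1))^2) / (length(x1) - 1)  # NaN: the formula reduces to 0 / 0
```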

However, this for loop grows vars one element at a time, which is slow. A more idiomatic choice in R is a function from the *apply family, e.g. vapply, which allocates the result vector once. Its FUN.VALUE argument, numeric(1) (or just 0), specifies that each iteration returns a single numeric value.

vars <- vapply(seq_along(rand_x), function(i) var(rand_x[1:i]), numeric(1))
vars
#  [1]        NA 0.9026262 1.4786540 1.2771584 1.7877717 1.6095619
#  [7] 1.4483273 1.5653797 1.8121144 1.6192175 1.4821020 3.5005254
# [13] 3.3771453 3.1723564 2.9464537 2.7620001 2.7086317 2.5757641
# [19] 2.4330738 2.4073546 2.4242747 2.3149455 2.3192964 2.2544765
# [25] 3.1333738 3.0343781 3.0354998 2.9230927 2.8226541 2.7258979
# [31] 2.6775278 2.6651541 2.5995346 3.1333880 3.0487177 3.0392603
# [37] 3.0483917 4.0446074 4.0463367 4.0465158 3.9473870 3.8537925
# [43] 3.8461463 3.7848464 3.7505158 3.7048694 3.6953796 3.6605357
# [49] 3.6720684 3.6580296
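Both versions recompute var() on every prefix, so the total work grows roughly quadratically with the length of 'rand_x'. A different technique, sketched below under the assumption of a numeric vector without NA values, computes all running variances in one pass from cumulative sums, using s_k² = (Σx² − (Σx)²/k) / (k − 1). The helper name cum_var is hypothetical.

```r
# Hypothetical helper: running sample variance from cumulative sums.
# Assumes a numeric vector without NA values.
cum_var <- function(x) {
  k  <- seq_along(x)        # number of values in each prefix
  s1 <- cumsum(x)           # running sum of x
  s2 <- cumsum(x^2)         # running sum of x^2
  v  <- (s2 - s1^2 / k) / (k - 1)
  v[1] <- NA                # var() of a single value is NA
  v
}

set.seed(42)
rand_x <- rf(50, 3, 5)
vars_fast <- cum_var(rand_x)
vars_slow <- vapply(seq_along(rand_x), function(i) var(rand_x[1:i]), numeric(1))
all.equal(vars_fast, vars_slow)  # TRUE, up to floating-point tolerance
```

This shortcut formula can lose precision through cancellation when the mean is large relative to the spread; Welford's online algorithm is the numerically safer option in that case.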

Data:

n <- 50
set.seed(42)
rand_x <- rf(n, 3, 5)
jay.sf