
I am running a simulation in R whose outputs are stored in numeric vectors inside a list. I am wondering why, when I preallocate the list with numeric vectors, the computational time stays the same instead of decreasing. My code is similar to the following hypothetical cases, in which I have to use nested loops and store the results in the list. Here is the code for the case without preallocation:

n_times <- 5000

my_list <- list()

Sys.time()
start_time <- Sys.time()
for( i in 1:n_times){
  
  for (j in 1:10){
    df <- data.frame(y = rnorm(n = 200, mean = sample.int(10,1), sd = 4),
                     x1 = rnorm(n = 200, mean = sample.int(10,1), sd = 1),
                     x2 = rnorm(n = 200, mean = sample.int(10,1), sd = 4))
    model <- lm(y ~ x1 + x2, data = df)
    
    my_list[[as.character(j)]][i] <-  summary(model)$r.squared
  }
}

end_time <- Sys.time()
end_time - start_time

and here is the code for the case with preallocation:

# number of times the simulation to be run
n_times <- 5000
# preallocating the list of length 10 with numeric vectors of length n_times
my_list <- replicate(10, vector("numeric", n_times), simplify = FALSE)
names(my_list) <- as.character(1:10)

Sys.time()
start_time <- Sys.time()
for( i in 1:n_times){
  
  for (j in 1:10){
    df <- data.frame(y = rnorm(n = 200, mean = sample.int(10,1), sd = 4),
                     x1 = rnorm(n = 200, mean = sample.int(10,1), sd = 1),
                     x2 = rnorm(n = 200, mean = sample.int(10,1), sd = 4))
    model <- lm(y ~ x1 + x2, data = df)
    
    my_list[[as.character(j)]][i] <-  summary(model)$r.squared
  }
}

end_time <- Sys.time()
end_time - start_time
Abbas
  • My best guess is that it's because your vectors are so small. R version 3.4 has this in its NEWS file: *"Assigning to an element of a vector beyond the current length now over-allocates by a small fraction. The new vector is marked internally as growable, and the true length of the new vector is stored in the truelength field. This makes building up a vector result by assigning to the next element beyond the current length more efficient, though pre-allocating is still preferred."* – Gregor Thomas Jul 31 '22 at 16:36
  • My guess is that the automatic over-allocation is at least 10 elements, so there's not really any difference between pre-allocation and the over-allocation in this small case. If your vectors grew larger than the initial over-allocation, then I think you'd see a performance hit (though since they would be over-allocated again, not as large a hit as what you'd see if you were working with a version of R < 3.4.0) – Gregor Thomas Jul 31 '22 at 16:38
  • To pre-allocate the list improves the running time but your example is not the best. The list is created the first time through the outer loop, the next `n_times-1` times the list already exists. And since its a list with only 10 members, each of them one numeric value only, the gain is negligible, if measurable at all. – Rui Barradas Jul 31 '22 at 16:41
  • I didn't see any considerable improvement even when n_times increased to for example 10,000 times or more. I am using R 4.2.1 – Abbas Jul 31 '22 at 17:20
  • Yeah, the `n_times` isn't the big one. Make `n_times = 10` and make your inner loop `for(j in 1:5000)` and I bet you'll see a bigger difference. – Gregor Thomas Aug 01 '22 at 01:44
  • 1
    Almost all the time is taken by `lm`, `summary`, and the creation of the `data.frame`. The time spent reallocating a small list will be undetectable. – jblood94 Aug 01 '22 at 11:38
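The effect described in the comments can be checked with a stripped-down benchmark that removes the modelling cost entirely. This is a minimal sketch; `grow` and `prealloc` are hypothetical helpers, and the behaviour assumes R >= 3.4, where assigning one element past the end of a vector over-allocates:

```r
n <- 1e6

grow <- function() {
  x <- numeric(0)
  for (i in 1:n) x[i] <- i  # grows the vector; over-allocation amortises the cost
  x
}

prealloc <- function() {
  x <- numeric(n)
  for (i in 1:n) x[i] <- i  # writes into already-allocated storage
  x
}

t_grow <- system.time(grow())["elapsed"]
t_pre  <- system.time(prealloc())["elapsed"]
c(grow = t_grow, prealloc = t_pre)
```

On a recent R the two timings are typically comparable; before 3.4.0, growing element-by-element was quadratic and the gap was dramatic.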

1 Answer


Preallocating a list with just 5000 * 10 elements doesn't take much time to begin with. After profiling your code, most of the time goes to `lm` and `data.frame` creation; see below.
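A sketch of how that profile can be reproduced with base R's `Rprof`, using a reduced `n_times` so the run stays short (the loop body is your own code, unchanged):

```r
n_times <- 100  # reduced for a quick profile
my_list <- replicate(10, vector("numeric", n_times), simplify = FALSE)
names(my_list) <- as.character(1:10)

Rprof(tmp <- tempfile())  # start sampling profiler
for (i in 1:n_times) {
  for (j in 1:10) {
    df <- data.frame(y  = rnorm(n = 200, mean = sample.int(10, 1), sd = 4),
                     x1 = rnorm(n = 200, mean = sample.int(10, 1), sd = 1),
                     x2 = rnorm(n = 200, mean = sample.int(10, 1), sd = 4))
    model <- lm(y ~ x1 + x2, data = df)
    my_list[[as.character(j)]][i] <- summary(model)$r.squared
  }
}
Rprof(NULL)  # stop profiling
head(summaryRprof(tmp)$by.total)  # top entries by total time
```

The functions at the top of `$by.total` are where the runtime actually goes, which is why the list assignment never shows up.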

[Profiling output: most of the time is spent in `lm` and `data.frame` creation]

Mohamed Desouky