
I am running a simulation in R whose outputs are stored in numeric vectors inside a list. I am wondering why, when I preallocate the list with numeric vectors, the computational time stays the same instead of decreasing. My code is similar to the following hypothetical cases, in which I have to use nested loops and store the results in the list. Here is the code for the case without preallocation:

n_times <- 5000

my_list <- list()

Sys.time()
start_time <- Sys.time()
for( i in 1:n_times){
  
  for (j in 1:10){
    df <- data.frame(y = rnorm(n = 200, mean = sample.int(10,1), sd = 4),
                     x1 = rnorm(n = 200, mean = sample.int(10,1), sd = 1),
                     x2 = rnorm(n = 200, mean = sample.int(10,1), sd = 4))
    model <- lm(y ~ x1 + x2, data = df)
    
    my_list[[as.character(j)]][i] <-  summary(model)$r.squared
  }
}

end_time <- Sys.time()
end_time - start_time

and here is the code for the case with preallocation:

# number of times the simulation to be run
n_times <- 5000
# preallocating the list of length 10 with numeric vectors of length n_times
my_list <- replicate(10, vector("numeric", n_times), simplify = FALSE)
names(my_list) <- as.character(1:10)

Sys.time()
start_time <- Sys.time()
for( i in 1:n_times){
  
  for (j in 1:10){
    df <- data.frame(y = rnorm(n = 200, mean = sample.int(10,1), sd = 4),
                     x1 = rnorm(n = 200, mean = sample.int(10,1), sd = 1),
                     x2 = rnorm(n = 200, mean = sample.int(10,1), sd = 4))
    model <- lm(y ~ x1 + x2, data = df)
    
    my_list[[as.character(j)]][i] <-  summary(model)$r.squared
  }
}

end_time <- Sys.time()
end_time - start_time
Abbas
  • My best guess is that it's because your vectors are so small. R version 3.4 has this in its NEWS file: *"Assigning to an element of a vector beyond the current length now over-allocates by a small fraction. The new vector is marked internally as growable, and the true length of the new vector is stored in the truelength field. This makes building up a vector result by assigning to the next element beyond the current length more efficient, though pre-allocating is still preferred."* – Gregor Thomas Jul 31 '22 at 16:36
  • My guess is that the automatic over-allocation is at least 10 elements, so there's not really any difference between pre-allocation and the over-allocation in this small case. If your vectors grew larger than the initial over-allocation, then I think you'd see a performance hit (though since they would be over-allocated again, not as large a hit as what you'd see if you were working with a version of R < 3.4.0) – Gregor Thomas Jul 31 '22 at 16:38
  • To pre-allocate the list improves the running time but your example is not the best. The list is created the first time through the outer loop, the next `n_times-1` times the list already exists. And since its a list with only 10 members, each of them one numeric value only, the gain is negligible, if measurable at all. – Rui Barradas Jul 31 '22 at 16:41
  • I didn't see any considerable improvement even when n_times increased to for example 10,000 times or more. I am using R 4.2.1 – Abbas Jul 31 '22 at 17:20
  • Yeah, the `n_times` isn't the big one. Make `n_times = 10` and make your inner loop `for(j in 1:5000)` and I bet you'll see a bigger difference. – Gregor Thomas Aug 01 '22 at 01:44
  • 1
    Almost all the time is taken by `lm`, `summary`, and the creation of the `data.frame`. The time spent reallocating a small list will be undetectable. – jblood94 Aug 01 '22 at 11:38
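The effect described in the comments can be checked with a stripped-down benchmark that removes the modelling cost entirely. This is a minimal sketch; `grow` and `prealloc` are hypothetical helpers, and the behaviour assumes R >= 3.4, where assigning one element past the end of a vector over-allocates:

```r
n <- 1e6

grow <- function() {
  x <- numeric(0)
  for (i in 1:n) x[i] <- i  # grows the vector; over-allocation amortises the cost
  x
}

prealloc <- function() {
  x <- numeric(n)
  for (i in 1:n) x[i] <- i  # writes into already-allocated storage
  x
}

t_grow <- system.time(grow())["elapsed"]
t_pre  <- system.time(prealloc())["elapsed"]
c(grow = t_grow, prealloc = t_pre)
```

On a recent R the two timings are typically comparable; before 3.4.0, growing element-by-element was quadratic and the gap was dramatic.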

1 Answer


Preallocating a list with just 5000 * 10 elements doesn't take much time to begin with. After profiling your code, most of the time goes to `lm` and `data.frame` creation; see below.
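A sketch of how that profile can be reproduced with base R's `Rprof`, using a reduced `n_times` so the run stays short (the loop body is your own code, unchanged):

```r
n_times <- 100  # reduced for a quick profile
my_list <- replicate(10, vector("numeric", n_times), simplify = FALSE)
names(my_list) <- as.character(1:10)

Rprof(tmp <- tempfile())  # start sampling profiler
for (i in 1:n_times) {
  for (j in 1:10) {
    df <- data.frame(y  = rnorm(n = 200, mean = sample.int(10, 1), sd = 4),
                     x1 = rnorm(n = 200, mean = sample.int(10, 1), sd = 1),
                     x2 = rnorm(n = 200, mean = sample.int(10, 1), sd = 4))
    model <- lm(y ~ x1 + x2, data = df)
    my_list[[as.character(j)]][i] <- summary(model)$r.squared
  }
}
Rprof(NULL)  # stop profiling
head(summaryRprof(tmp)$by.total)  # top entries by total time
```

The functions at the top of `$by.total` are where the runtime actually goes, which is why the list assignment never shows up.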

[Profiling output: most of the time is spent in `lm` and `data.frame` creation]

Mohamed Desouky