
I am running 1000 simulations in a loop. All simulations should take similar computational time because they all run the same procedure, however my computational times are { 6.496, 7.680, 9.464, 10.976, ..., 141.460, 145.276, 143.148}. They are increasing badly over time.

My guess is that it has something to do with running out of temporary memory or something like that, but I know very little about computer science. I would think that I just need to add an extra step within the loop where I delete the garbage that is using the memory (like a reset, without deleting the previous calculations), and that should solve this unnecessary waste of time.

I would appreciate a solution to this, but also a small explanation of why this happens, in case you do not have a solution for R.

The code I am using is

 ptm <- proc.time()
 init_pars = c(0.8,0.0175,0.1)
 pars=init_pars
 n_it = 50
 M = matrix(nrow=n_it,ncol=3)
 for (i in 1:n_it){
   print(c(pars[1],pars[2],pars[3]))
   n_it = 10
   S=list()
   for (j in 1:n_it){
     rec_tree = reconst_tree(bt=s2$t,pars=pars,tt=15)
     S[[j]] = rec_tree
   }
   pars = mle_dd_setoftrees(S)
   pars = c(pars$lambda,pars$beta,pars$mu)
   M[i,]=c(pars[1],pars[2],pars[3])
   print(proc.time() - ptm)
   ptm <- proc.time()
 }

The function reconst_tree creates independent simulations and mle_dd_setoftrees calculates estimations from a set of simulations; I then store the estimations in the matrix M.

Francisco
  • Please post your current code or a minimum reproducible example. See here for more: stackoverflow.com/help/mcve. – jav Sep 05 '16 at 16:08
  • My guess is that you grow an object in the loop. – Roland Sep 05 '16 at 16:08
  • How are you storing your results? Did you allocate the whole vector/list or in every loop you are increasing the size of it, effectively copying the whole object? Also, please read http://stackoverflow.com/questions/2908822/speed-up-the-loop-operation-in-r – m-dz Sep 05 '16 at 16:08
  • Can you share the loop so we can provide an answer? Generally speaking loops in R are inefficient. You can speed this up by vectorizing using lapply, sapply etc., write the code in c++ using Rcpp, or use foreach and doMC to run your loop in parallel. – JackStat Sep 05 '16 at 16:08
  • @JackStat That's a common misconception. Loops in R are not inefficient. What you do inside the loops can be inefficient. And `lapply` just helps avoiding some of these inefficiencies, but it is not faster than a well-written `for` loop. – Roland Sep 05 '16 at 16:22
  • Inefficiency is OK for now. What I do not understand is why in the beginning the simulations take a few seconds and later it increases. If those are all the same procedure, every iteration should take similar computational time and not increase that much (by the way, I included the code). – Francisco Sep 05 '16 at 16:28
  • @Roland I microbenched it and unlist -> lapply is faster https://gist.github.com/JackStat/798fdab5b7ab186a216c358a403a8ba7 – JackStat Sep 05 '16 at 16:32
  • Error: could not find function "reconst_tree" – JackStat Sep 05 '16 at 16:34
  • Without a minimum reproducible example that shows the behavior (slow down) it is hard for us to help you. If you cannot post an runnable example the best way to diagnose your problem is profiling the code... – R Yoda Sep 05 '16 at 16:34
  • @JackStat Minimally, because setting up the results list is done in C code. Now try after doing `library(compiler); forLoop <- cmpfun(forLoop)` and compare with `sqrt(SS)`. – Roland Sep 05 '16 at 16:39
  • If you cannot provide us the `reconst_tree` function you profile your code yourself using this code: `Rprof( pfile <- "rprof.log", memory.profiling=TRUE) # your code runs here Rprof(NULL) summaryRprof(pfile,memory="both")` Then show us the result here please (output of the summaryRprof)... – R Yoda Sep 05 '16 at 16:40
  • Pre-allocate `S <- vector(mode = "list", length = n_it)` outside your nested loops. – Roland Sep 05 '16 at 16:44
  • @abhiieor No, they pre-allocate `M`. – Roland Sep 05 '16 at 16:45
  • @jackstat your example shows a difference between your lapply method and the for loop of two milliseconds. If you increase the size of the example, I wouldn't expect that to change much. In fact, with some ingenuity, it is possible to write for loops that are faster than apply. I would guess that the bulk of the speed difference between apply functions and properly constructed for loops is attributable to the interpreter and not the execution of the code. – Benjamin Sep 05 '16 at 16:51

1 Answer


The offending part of your code is this:

 S=list()
 for (j in 1:n_it){
   rec_tree = reconst_tree(bt=s2$t,pars=pars,tt=15)
   S[[j]] = rec_tree
 }

What you are doing here is termed "growing an object".

One of the trade-offs of R's flexibility is that it spends a lot of time managing how much memory to allocate to objects. Each time you assign to an element beyond the list's current length, R may have to reallocate the object and copy its existing contents, so the cost of each assignment grows with the size of the object, causing your loop to slow to a crawl over time.

A well constructed for loop can avoid this by allocating an appropriate container ahead of the loop.
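A minimal sketch of the pre-allocated version, using `vector(mode = "list", length = n_it)` as suggested in the comments. Since `reconst_tree` is not available here, `rnorm(5)` stands in as a placeholder for the simulation call:

```r
# Pre-allocate the list once, before the loop, so R never has to
# reallocate and copy it while the loop runs.
n_it <- 10
S <- vector(mode = "list", length = n_it)  # all n_it slots exist up front

for (j in seq_len(n_it)) {
  # placeholder for reconst_tree(bt = s2$t, pars = pars, tt = 15)
  S[[j]] <- rnorm(5)
}
```

The same idea applies to vectors and matrices: allocate the full container (as you already do with `M`) and fill it by index, rather than appending inside the loop.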

josliber
Benjamin