I have nested for loops as shown below. I need p:q = 1:300 and n = 20. The function `mark` (from the RMark package) fits the model I am interested in. I know `rbind` can be slow, but I have no idea what to replace it with. What else can I do to make this function faster? Thanks.

foo<-function(data, p, q, n){
results.frame <- data.frame()
for (i in 1:n){
    for (i in p:q) {
        run.model<-mark(data[sample(nrow(data), i),], model="Occupancy")       
        results<-data.frame(summary(run.model)$real$p, Occupancy=summary(run.model)$real$Psi, se.p=t(as.matrix(summary(run.model, se=T)$real$p$se)), se.Psi=summary(run.model, se=T)$real$Psi$se, stations=i)
        results.frame<-rbind(results.frame, results)
        } 
    }
write.table(results.frame, "C:\\RWorkspace\\simulation_results.txt")
return(results.frame)
}
lamushidi
  • 1) You should preallocate the dimensions of `results.frame` and then fill it in by indexing. 2) Do you really need a `data.frame()`, or will a `matrix()` suffice? All of your results look like they will be numeric, so a matrix may be enough. 3) `cmpfun()` your function via the `compiler` package to see if that gets you any free speed bumps. 4) Tell us where function `mark()` comes from, as it is not in base and your question is not [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), so people will find it difficult to offer real advice. – Chase Sep 01 '12 at 01:20
  • 1) Sounds reasonable, I'll try that. 2) Yes, all my outputs are numeric, so I'll switch to `matrix()`. 3) I'll take a look at `cmpfun()` and `compiler`. 4) Function `mark()` comes from the package "RMark"; it's for animal population dynamics. Thank you for your suggestions. – lamushidi Sep 01 '12 at 01:29
  • Also, you're using `i` in both loops; better to use different variables so you're sure you're using the `i` you want inside the loop. – Aaron left Stack Overflow Sep 01 '12 at 01:46

1 Answer

Yes, rbind can be slow; the faster thing is usually to make the matrix the right size to start with and fill it in appropriately. It's also usually faster to fill in a matrix instead of a data frame.

However, with the size you indicate, I would suspect that mark is what is slowing the function down and you won't get much noticeable speedup by doing that. It would be easy to test that by storing a single result in run.model and then commenting that line out of your loop; that will tell you how much time it's spending just storing the results. (You could also "profile" the function, but this would be simpler.)
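A minimal sketch of that test, using `system.time()`. Here `slow_fit()` is a hypothetical stand-in for the `mark()` call (it just sleeps briefly), and `dat` is made-up data; swap in the real fit to get meaningful numbers.

```r
# Hypothetical stand-in for the model fit; replace with the real mark() call.
slow_fit <- function(d) {
  Sys.sleep(0.01)   # pretend the fit takes some time
  colMeans(d)       # return a small named numeric vector
}

set.seed(1)
dat <- data.frame(a = rnorm(100), b = rnorm(100))  # made-up data

# Time the fitting step alone, keeping the last result for reuse.
t_fit <- system.time(
  for (k in 1:20) fit <- slow_fit(dat)
)["elapsed"]

# Time only the storage step, reusing the single stored result.
t_store <- system.time({
  out <- data.frame()
  for (k in 1:20) out <- rbind(out, as.data.frame(t(fit)))
})["elapsed"]

c(fit = t_fit, store = t_store)
```

Whichever of the two timings dominates is the part worth optimizing first.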

EDIT: I'm actually wrong; the size you indicate is big enough that the rbind is quite possibly causing problems. On my system, which is fairly fast and has a decent amount of memory, it takes 7.73 sec to rbind using data frames with n=20 and only 0.09 sec with n=1, so clearly some memory churning is happening. As for speedup, with n=20 it takes only 1.00 sec to rbind matrices and 0.033 sec to fill in a preallocated matrix.
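A comparison along these lines can be reproduced with something like the sketch below. The row contents, the names `grow_df()` and `fill_matrix()`, and the row count are all made up for illustration; timings will differ by machine and R version, so none are quoted here.

```r
# One made-up result row with the same five columns as in the question.
one_row <- c(p = 1, Occupancy = 2, se.p = 3, se.Psi = 4, stations = 5)
n_rows  <- 2000  # use (q - p + 1) * n for the real problem size

# Approach 1: grow a data frame one row at a time with rbind.
grow_df <- function(n_rows) {
  out <- data.frame()
  for (k in seq_len(n_rows)) out <- rbind(out, as.data.frame(t(one_row)))
  out
}

# Approach 2: allocate the full matrix once, then fill rows by index.
fill_matrix <- function(n_rows) {
  out <- matrix(NA_real_, nrow = n_rows, ncol = length(one_row),
                dimnames = list(NULL, names(one_row)))
  for (k in seq_len(n_rows)) out[k, ] <- one_row
  out
}

t_grow <- system.time(a <- grow_df(n_rows))["elapsed"]
t_fill <- system.time(b <- fill_matrix(n_rows))["elapsed"]
c(grow = t_grow, fill = t_fill)
```

Both approaches produce the same values; only the cost of building the result differs.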

foo <- function(data, p, q, n){
  # make a single results line; remove this line when you put in your code 
  results <- c(1, Occupancy=2, se.p=3, se.Psi=4, stations=5)
  # make the matrix the right size to start with
  results.frame <- matrix(ncol=5, nrow=(q-p+1)*n)
  for (i in 1:n){
    for (j in p:q) {
      # get results here; commented out to show loop speed only
      # put in your actual code here instead
      results.frame[ 1+(i-1)*(q-p+1)+(j-p), ] <- results
    } 
  }
  # get the names right by taking the names from the last time through the loop
  colnames(results.frame) <- names(results)
  results.frame
}
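The row index `1+(i-1)*(q-p+1)+(j-p)` used above is easy to get wrong, so here is a small check that it visits every row exactly once and in order. The values of `p`, `q`, and `n` are arbitrary small numbers chosen for illustration.

```r
p <- 3; q <- 7; n <- 4
width <- q - p + 1   # number of j values per pass of the outer loop

# expand.grid varies its first argument fastest, matching the loop order
# (j runs through p:q inside each value of i).
grid <- expand.grid(j = p:q, i = 1:n)
grid$row <- 1 + (grid$i - 1) * width + (grid$j - p)

# Every row of the preallocated matrix is hit once, in sequence.
all(grid$row == seq_len(n * width))
```

A simpler alternative is to keep a running counter that is incremented once per inner iteration, which avoids the arithmetic altogether.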
Aaron left Stack Overflow
  • Thank you for your advice. I first tried `p:q=1:200` without `n` and it took only 8 min to finish. Then I added another for loop for `n` and ran `p:q=1:300, n=20`, and it took nearly 5 hours to finish. So I thought it might be the nested loop rather than `mark()`. What do you mean by `commenting that line out of my loop`? I apologize in advance if it's a dumb question. I'm not a professional programmer and am still learning. – lamushidi Sep 01 '12 at 01:36
  • Can't wait to try the code you modified. But I have to wait until my current simulation ends. T_T – lamushidi Sep 01 '12 at 03:07