
I am working with R and my script is taking a very long time. I was thinking I can stop it and then start it again by changing my counters.

My code is this

  NC <- MLOA
  for (i in 1:313578) {
    len_mods <- length(MLOA[[i]])
    for (j in 1:2090) {
      for (k in 1:len_mods) {
        temp_match <- matchv[j]
        temp_rep <- replacev[j]
        temp_mod <- MLOA[[i]][k]
        is_found <- match(temp_mod, temp_match, nomatch = 0, incomparables = 0)
        if (is_found[1] == 1) NC[[i]][k] <- temp_rep
        rm(temp_match, temp_rep, temp_mod)
      }
    }
  }

I am thinking that I can stop my script, check the current values of i, j and k, and then restart it with the loops starting at those values. So instead of "for (i in 1:313578)", if i has reached 100,000 I could use "for (i in 100000:313578)".

I don't want to stop my script though before checking that my logic about restarting it is solid.

Thanks in anticipation

Kaspi83
  • See [this](http://stackoverflow.com/questions/4442518/general-suggestions-for-debugging-r) for more information. – Tim Biegeleisen May 19 '15 at 01:03
  • In a `for` loop, the indices `i`, `j`, `k` are actually stored in your environment. So if you stop the loop, you can find out what `i`, `j`, `k` are. – Alex May 19 '15 at 01:35
  • Thanks Alex, I realized that I can find out what they are. I really wanted to double-check the logic of picking up where I left off by changing the script to run from those values on. Thanks Tim, that article links to some nice debugging stuff. – Kaspi83 May 19 '15 at 02:21
  • As long as we're giving speed advice (like the answer below): two operations in the innermost loop involve `j` only and so should not be done for each `k`. Move them up one loop. I'm talking about `temp_match` and `temp_rep`. – Frank May 19 '15 at 03:21

3 Answers

I'm a bit confused about what you are doing. Generally on this forum it is a good idea to greatly simplify your code and present only the core of the problem in a very simple example. That notwithstanding, this might help. Put your for loop in a function whose parameters are the first elements of the sequences of numbers you loop over. For example:

myloop <- function(x,...){
 for (i in seq(x,313578,1)){
...

This way you can easily control where your loop starts.
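As a rough sketch of that idea, assuming small toy stand-ins for `MLOA`, `matchv` and `replacev`: the helper name `replace_from` and its `start` argument are hypothetical, not from the question. One caveat is that a function works on its own copy of `NC`, so if you interrupt it midway the partial result is lost, whereas the original top-level loop keeps `NC` in your workspace. Because every comparison is made against the untouched `MLOA`, redoing an index that was already processed gives the same result, which is what makes restarting from the last value of `i` safe.

```r
# Toy stand-ins for the question's objects (hypothetical data).
MLOA     <- list(c("a", "b"), c("c", "a", "d"), "b")
matchv   <- c("a", "b")
replacev <- c("A", "B")

# replace_from() is a hypothetical helper: it processes list elements
# start..length(MLOA) and returns the updated copy of NC.
replace_from <- function(NC, MLOA, matchv, replacev, start = 1) {
  stopifnot(start >= 1, start <= length(MLOA))
  for (i in seq(start, length(MLOA))) {
    for (j in seq_along(matchv)) {
      hit <- MLOA[[i]] == matchv[j]  # compare against the unmodified input
      NC[[i]][hit] <- replacev[j]
    }
  }
  NC
}

NC <- MLOA
NC <- replace_from(NC, MLOA, matchv, replacev, start = 1)  # full run

# Restarting at a later index repeats some work but does not change the result.
NC_again <- replace_from(NC, MLOA, matchv, replacev, start = 2)
```

To keep partial progress, one option is to call the function on a modest range of indices at a time, so that each call returns before you decide whether to stop.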

The more important question is, however, why are you using for loops in the first place? In R, for loops should be avoided at all costs. By vectorizing your code you can greatly increase its speed. I have realized speed increases of a factor of 500!

In general, the only reason you use a for loop in R is if current iterations of the for loop depend on previous iterations. If this is the case then you are likely bound to the slow for loop.
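To illustrate the distinction with made-up numbers (the vector `x` and the recurrence below are assumptions chosen purely for demonstration): the first loop has independent iterations and collapses to a single vectorized expression, while the second uses the previous result at every step and is harder to write without a loop.

```r
x <- c(3, 8, 1, 9)

# Independent iterations: each result depends only on x[i],
# so the loop can be replaced by one vectorized expression.
doubled_loop <- numeric(length(x))
for (i in seq_along(x)) doubled_loop[i] <- 2 * x[i]
doubled_vec <- 2 * x

# Dependent iterations: each value needs the one computed just before it.
y <- numeric(length(x))
y[1] <- x[1]
for (i in 2:length(x)) y[i] <- 0.5 * y[i - 1] + x[i]
```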

Depending on your computer skills, however, even for loops can be made faster in R. If you know C, or are willing to learn a bit, interfacing with C can dramatically increase the speed of your code.

An easier way to speed up your code, which unfortunately will not yield the same speed-up as interfacing with C, is to use R's byte compiler. Check out the cmpfun function in the compiler package.
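A minimal sketch of how `cmpfun` is used; the function `sum_squares` is a made-up example, not part of the question. Note that since R 3.4.0 the just-in-time compiler is enabled by default, so on recent versions functions are byte-compiled automatically and an explicit `cmpfun` call may make little difference.

```r
library(compiler)  # ships with base R

# A deliberately loop-heavy function (hypothetical example).
sum_squares <- function(x) {
  total <- 0
  for (v in x) total <- total + v^2
  total
}

# cmpfun() returns a byte-compiled version with the same behaviour.
sum_squares_c <- cmpfun(sum_squares)

result_plain    <- sum_squares(1:10)
result_compiled <- sum_squares_c(1:10)
```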

One final thing on speeding up code: the line `temp_match <- matchv[j]` looks innocuous enough, but it can really slow things down. Every time you assign matchv[j] to temp_match, R creates a new object holding a copy of that element, so your computer needs to find somewhere in RAM to store it. R is smart: as you make more and more copies, it cleans up after you and discards the ones you are no longer using via the garbage collector. However, finding places to store the copies and running the garbage collector both take time. Read this if you want to learn more: http://adv-r.had.co.nz/memory.html.

Jacob H

You could also use while loops for your 3 loops so that you maintain the counters yourself. In the following, you can stop the script at any time (and view the intermediate results), then restart it either by setting continue to TRUE or by simply re-running the loop part of the script:

n <- 6
res <- array(NaN, dim=rep(n,3))

continue = FALSE
if(!continue){
  i <- 1
  j <- 1
  k <- 1
}


while(k <= n){
  while(j <= n){
    while(i <= n){
      res[i,j,k] <- as.numeric(paste0(i,j,k))
      Sys.sleep(0.1)
      i <- i+1
    }
    j <- j+1
    i <- 1
  }
  k <- k+1
  j <- 1
}  

i;j;k
res
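
If the R session itself might be closed or crash, the counters in memory are lost. One possible extension, sketched here under assumptions, is to write the state to disk with `saveRDS` after each step and reload it with `readRDS` on restart. The file location, the `state` list and the toy computation are all hypothetical.

```r
n <- 3
checkpoint_file <- tempfile(fileext = ".rds")  # hypothetical checkpoint location

# Start fresh, or resume from the checkpoint if one exists.
state <- list(i = 1, res = numeric(n))
if (file.exists(checkpoint_file)) state <- readRDS(checkpoint_file)

while (state$i <= n) {
  state$res[state$i] <- state$i^2   # placeholder for the real work
  state$i <- state$i + 1
  saveRDS(state, checkpoint_file)   # persist counter and results together
}
```

Saving on every iteration is slow for a long loop, so in practice you would likely save only every few thousand iterations.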
Marc in the box

This is what I ended up with:

for (i in 1:313578) {
  mp <- match(MLOA[[i]], matchv, nomatch = 0, incomparables = 0)
  lgic <- which(as.logical(mp), arr.ind = FALSE, useNames = TRUE)
  NC[[i]][lgic] <- replacev[mp]
}

Thanks to those who responded. Jacob H, you are right, I am definitely a newbie with R, and your response was useful. Frank, your pointers helped.

My solution probably still isn't an optimal one. All I wanted to do was a find and replace. matchv was the vector in which I was searching for a match for each MLOA[[i]], with replacev being the vector of replacement information.
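One way the remaining loop over `i` might be removed, sketched on toy data standing in for the real objects: flatten the list once, do a single `match` over everything, then split the result back into pieces of the original lengths. This assumes every element of `MLOA` is an atomic vector of the same type as `replacev`; it has not been checked against the real data.

```r
# Toy stand-ins for the real objects (hypothetical data).
MLOA     <- list(c("a", "b"), c("c", "a", "d"), "b")
matchv   <- c("a", "b")
replacev <- c("A", "B")

flat <- unlist(MLOA, use.names = FALSE)
pos  <- match(flat, matchv)          # NA where there is no match
hit  <- !is.na(pos)
flat[hit] <- replacev[pos[hit]]      # one replacement pass over all values

# Rebuild the list: element i gets back lengths(MLOA)[i] values.
grp <- factor(rep(seq_along(MLOA), lengths(MLOA)), levels = seq_along(MLOA))
NC  <- unname(split(flat, grp))
```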

Kaspi83