4

I am having trouble optimising a piece of R code. The following example code should illustrate my optimisation problem:

Some initialisations and a function definition:

columns <- 6  # needed by the two allocations below
a <- c(10,20,30,40,50,60,70,80)
b <- c("a","b","c","d","z","g","h","r")
c <- c(1,2,3,4,5,6,7,8)
myframe <- data.frame(a,b,c)
values <- vector(length=columns)
solution <- matrix(nrow=nrow(myframe),ncol=columns+3)

myfunction <- function(frame,columns){
   # local result vector, one slot per column
   value <- numeric(columns)
   athing = 0
   if(columns == 5){
      athing = 100
   }
   else{
      athing = 1000
   }
   value[columns+1] = athing
   return(value)
}

The problematic for-loop looks like this:

columns = 6
for(i in 1:nrow(myframe)){
   values <- myfunction(as.matrix(myframe[i,]), columns)
   values[columns+2] = i
   values[columns+3] = myframe[i,3]
   #more columns added with simple operations (i.e. sum)

   solution <- rbind(solution,values)
   #solution is a large matrix from outside the for-loop
}

The problem seems to be the rbind function. I frequently get error messages about the size of solution, which seems to become too large after a while (more than 50 MB). I want to replace this loop and the rbind with a list and lapply and/or foreach. I have started by converting myframe to a list.

myframe_list <- lapply(seq_len(nrow(myframe)), function(i) myframe[i,])

I have not really come further than this, although I tried applying this very good introduction to parallel processing.

How do I have to reconstruct the for-loop without having to change myfunction? Obviously I am open to different solutions...

Edit: This problem seems to be straight from the 2nd circle of hell from the R Inferno. Any suggestions?

user3347232
  • What is `columns`? Do I understand correctly that `value` is a vector with 2 possible values: 100 and 1000? – Adii_ Nov 10 '14 at 12:44
  • Before the for-loop it is... `columns` is the changing number of columns of the `values-frame` and the `solutions-matrix`, depending on a specific input (in the actual script 10000+ columns are possible). `myfunction` is far more complex in the actual script. Still, it is just a series of if-branches. Each `values-frame` is built by the for-loop and `myfunction` and rbinded to the `solutions` matrix. – user3347232 Nov 10 '14 at 12:58
  • Did you try `solution[i,] = values` instead of `solution <- rbind(solution,values)`? As I understand it, you have already created the `solution` matrix, so there is no need to bind further rows. Overwriting an existing row of NAs with `values` is more efficient. Perhaps that will do the job? – Adii_ Nov 10 '14 at 13:03
  • `solution` is already created but "not complete", as the for-loop on display here is inside another for-loop that relies on `solution`. – user3347232 Nov 10 '14 at 14:09
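
The preallocation idea from the comments above can be sketched as follows. This is only an illustration under the assumption that the number of rows is known before the loop starts, which, as the last comment notes, may not hold in the actual script. The row count `n` and the placeholder row computation are hypothetical.

```r
columns <- 6
n <- 4  # hypothetical number of rows, assumed known in advance

# Allocate the full matrix once; every cell starts out as NA.
solution <- matrix(NA_real_, nrow = n, ncol = columns + 3)

for (i in seq_len(n)) {
  # Placeholder for the real per-row computation done by myfunction.
  values <- numeric(columns + 3)
  values[columns + 1] <- 1000
  values[columns + 2] <- i
  # Overwrite row i in place instead of growing the matrix with rbind.
  solution[i, ] <- values
}
```

Because the matrix never changes size, no copy of the whole object is made per iteration.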

2 Answers

10

Using rbind in a loop like this is bad practice because each iteration enlarges your solution matrix and copies it to a new object, which is very slow and can also lead to memory problems. One way around this is to create a list whose ith component stores the output of the ith loop iteration. The final step is to call rbind on that list (just once, at the end). This will look something like

my.list <- vector("list", nrow(myframe))
for(i in 1:nrow(myframe)){
    # Call all necessary commands to create values
    my.list[[i]] <- values
}
solution <- rbind(solution, do.call(rbind, my.list))
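
For completeness, here is a self-contained sketch of the same pattern that can be run as is. The small data frame, the simplified `myfunction` and the empty starting `solution` are stand-ins assumed for illustration, not the asker's real objects.

```r
# Toy stand-ins for the question's objects (assumed values).
columns <- 6
myframe <- data.frame(a = c(10, 20, 30), b = c("a", "b", "c"), c = c(1, 2, 3))
solution <- matrix(nrow = 0, ncol = columns + 3)  # assumed: no rows yet

# Simplified version of the question's myfunction.
myfunction <- function(frame, columns) {
  value <- numeric(columns)
  value[columns + 1] <- if (columns == 5) 100 else 1000
  value
}

# Preallocate the list, then fill one component per iteration.
my.list <- vector("list", nrow(myframe))
for (i in seq_len(nrow(myframe))) {
  values <- myfunction(as.matrix(myframe[i, ]), columns)
  values[columns + 2] <- i
  values[columns + 3] <- myframe[i, 3]
  my.list[[i]] <- values
}

# Bind all collected rows in one call, then append them to solution once.
new_rows <- do.call(rbind, my.list)
solution <- rbind(solution, new_rows)
```

The loop itself only touches the list, so the large matrix is copied once instead of once per row.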
konvas
  • This is exactly what I was looking for! Thank you very much. It reduces the execution time on my machine from 40 mins to 2 mins with the lowest possible `columns`. BTW: Last thing I tried was `solution <- do.call('rbind',my.list)`. This obviously did not work. Thanks again! – user3347232 Nov 11 '14 at 14:44
  • For the solution part, you may just use this: `do.call(Map, c(rbind, my.list))` – Azam Yahya Aug 04 '21 at 05:34
  • This is how to do it, but keep in mind that base::rbind requires "fifty times more RAM and is more than 15 times slower than dplyr::bind_rows" according to [Difference between rbind() and bind_rows() in R](https://stackoverflow.com/questions/42887217/difference-between-rbind-and-bind-rows-in-r) – Marcus Lauritsen Jun 27 '22 at 11:03
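
Since the question asked about lapply, the list-then-bind pattern can also be written without an explicit for-loop. This is a minimal sketch using toy data and a simplified `myfunction`; the helper name `build_row` is hypothetical and not part of the original code.

```r
# Toy stand-ins for the question's objects (assumed values).
columns <- 6
myframe <- data.frame(a = c(10, 20, 30), b = c("a", "b", "c"), c = c(1, 2, 3))

# Simplified version of the question's myfunction.
myfunction <- function(frame, columns) {
  value <- numeric(columns)
  value[columns + 1] <- if (columns == 5) 100 else 1000
  value
}

# Hypothetical helper: builds the complete row for index i.
build_row <- function(i) {
  values <- myfunction(as.matrix(myframe[i, ]), columns)
  values[columns + 2] <- i
  values[columns + 3] <- myframe[i, 3]
  values
}

# lapply returns one list component per row; bind them once at the end.
row_list <- lapply(seq_len(nrow(myframe)), build_row)
new_rows <- do.call(rbind, row_list)
```

`myfunction` keeps its original signature here; only the loop body moves into the helper.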
-1

A bit too long for a comment, so I put it here. If columns is known in advance:

    myfunction <- function(frame){
      # columns is read from the enclosing environment
      value <- numeric(columns)
      athing = 0
      if(columns == 5){
        athing = 100
      }
      else{
        athing = 1000
      }
      value[columns+1] = athing
      return(value)
    }

    # MARGIN = 1 calls the function once per row, as the for-loop does
    apply(myframe, 1, myfunction)

If columns is not given via the environment, you can use:

apply(myframe, 1, myfunction, columns) with your original myfunction definition.

Adii_
  • Sorry, but I do not understand how this would lead to the same result. ;) Where would my `solution-matrix` be in this case? Where are the columns from the original for-loop added? – user3347232 Nov 10 '14 at 13:41