
I have the following code; for one combination (Comb) it takes 2 minutes to run, and I need to run it on 20,000 combinations. df_ncol = 200 and nrow = 10000. Any ideas on how to improve the running time?

For each combination I am doing the following:

I copy the values of column j into a new column, which takes the same name with a value appended to it. Then I apply a transformation to that new column across all rows (sometimes the loop exits early; see the if part in the code). Once the transformation is done, I move to the next column and do the same. Once the table holds twice the original number of columns, another part (not included in the code) saves only a summation from this final table; that summation runs fairly quickly. After this, I move to the next combination, create another table, and so on, until I reach the last combination. The bottleneck is the transformation stage, where I loop over the rows. I am fairly new to R and believe I lack the knowledge to improve this stage.

system.time({
    for (f in 1:Comb) {
        for (j in names(dfnew1)[4:df_ncol]) {
            ar <- final[f, j]
            # copy column j into a new column named "<j>_a_<ar>"
            dfnew1[[paste(j, "a", ar, sep = "_")]] <- dfnew1[[j]]

            last <- ind[[j]]
            index_num <- index[j] + 1

            for (i in index_num:nrow_) {
                dfnew1[[paste(j, "a", ar, sep = "_")]][i] <-
                    dfnew1[[j]][i] + ar * dfnew1[[paste(j, "a", ar, sep = "_")]][i - 1]
                # exit early once the value drops below 5% of the reference row;
                # note that assigning to i does not stop an R for loop, break does
                if (i > last && dfnew1[[paste(j, "a", ar, sep = "_")]][i] < 0.05 * dfnew1[[j]][last]) break
            }
        }
    }
})
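The recursive update in the inner loop has the form new[i] = x[i] + ar * new[i-1], which is a first-order recursive filter; stats::filter() with method = "recursive" computes exactly this in compiled code and can replace the per-row loop. A minimal sketch, with illustrative data and a hypothetical start index standing in for index_num:

```r
# The inner row loop applies a first-order recursion:
#   new[i] = x[i] + ar * new[i-1]
# stats::filter() with method = "recursive" computes this in
# compiled code, replacing the interpreted per-row loop.
x     <- c(2, 4, 6, 8, 10)   # illustrative column values
ar    <- 0.5                 # illustrative coefficient
start <- 2                   # stands in for index_num; earlier rows stay as-is

new <- x
new[start:length(x)] <- stats::filter(
    x[start:length(x)],
    filter = ar,
    method = "recursive",
    init   = x[start - 1]    # seed with the last untransformed value
)

# Loop version from the question, for comparison:
chk <- x
for (i in start:length(x)) chk[i] <- x[i] + ar * chk[i - 1]
stopifnot(all.equal(as.numeric(new), chk))
```

The early-exit check would still need handling separately (e.g. truncating the filtered result where it falls below the threshold), but the heavy arithmetic moves out of interpreted R.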
Dave2e
user4797853
  • You didn't include any data so we can't run or test your code. Be sure to include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) when you ask a question. Better still to describe what you are trying to do rather than just show us how you decided to do it. – MrFlick Jul 28 '16 at 21:18
  • You have a fairly large data table; the constant looping and adding of additional columns onto an existing data frame is slow. You may find it faster to preallocate a secondary table of the correct size and then update its columns as you calculate them. – Dave2e Jul 29 '16 at 01:46

1 Answer


You could wrap your code in Rprof():

Rprof("myloop")

 ## YOUR CODE HERE

Rprof(NULL) # stop profiling
summaryRprof("myloop")$by.self

This will show you exactly what is taking the most time.
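One frequent hot spot in such profiles is repeated string construction: calling paste() inside the innermost loop rebuilds the same column name on every row. A hedged sketch of computing the name once per column instead (the column name and data here are illustrative):

```r
# Build the new column name once per column instead of on every row.
j  <- "col1"                              # illustrative column name
ar <- 0.5
newcol <- paste(j, "a", ar, sep = "_")    # computed once, outside the row loop

df <- data.frame(col1 = c(1, 2, 3))
df[[newcol]] <- df[[j]]
for (i in 2:nrow(df)) {
    df[[newcol]][i] <- df[[j]][i] + ar * df[[newcol]][i - 1]
}
```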

emehex
  • I have received this output, but I am not sure where the issue is; is it the use of [[?

                            self.time self.pct total.time total.pct
        "paste"                 14.44    17.28      14.44     17.28
        "[[<-.data.frame"        8.68    10.39      16.66     19.93
        "[[.data.frame"          5.10     6.10      37.42     44.77
        "[["                     4.84     5.79      42.26     50.56
        ""                       4.60     5.50      22.66     27.11
        "%in%"                   2.28     2.73      11.20     13.40
        "[[<-"                   1.36     1.63      18.02     21.56

    – user4797853 Jul 28 '16 at 21:49
  • Newer versions of RStudio have inbuilt profiling tools. https://support.rstudio.com/hc/en-us/articles/218221837-Profiling-with-RStudio – Jonathan Carroll Jul 29 '16 at 04:17