speed up a monte carlo simulation with nested loop in R

Question

I would like to speed up the below monte carlo simulation of a DEA estimate

A<-nrow(banks)
effm<-matrix(nrow=A, ncol=2)
m<-20
B<-100

pb <- txtProgressBar(min = 0,
                     max = A, style=3)
for(a in 1:A) {
  x1<-x[-a,]
  y1<-y[-a,]
  theta=matrix(nrow=B,ncol=1) 

  for(i in 1:B){

    xrefm<-x1[sample(1:nrow(x1),m,replace=TRUE),]
    yrefm<-y1[sample(1:nrow(y1),m,replace=TRUE),]
    theta[i,]<-dea(matrix(x[a,],ncol=3),
                   matrix(y[a,],ncol=3),
                   RTS='vrs',ORIENTATION='graph',
                   xrefm,yrefm,FAST=TRUE)
  }

  effm[a,1]=mean(theta)
  effm[a,2]=apply(theta,2,sd)/sqrt(B)
  setTxtProgressBar(pb, a) 
}
close(pb)
effm

Once A becomes large the simulation freezes. i am aware from online research that the apply function rapidly speeds up such code but am not sure how to use it in the above procedure.

Any help/direction would be much appreciated

Barry

There's a lot of misinformation online. The `apply` function may or may not be faster than a for loop; it depends on what you're doing. You need to profile your code for speed to see what portions are slowest (see `?Rprof`), then you will know what needs to be faster. People could help profile your code if you provide a [reproducible example](http://stackoverflow.com/q/5963269/271616). — Joshua Ulrich, Nov 29 '12 at 15:51
@JoshuaUlrich ditto! also, if you can post portions of the data you're using, we will be able to actually run your code which makes it much easier to help — Justin, Nov 29 '12 at 15:52
Can you define "freeze" ? There's a big difference between a process that takes a long time, and one which blows out system memory (or something) and hangs up the process and/or the entire OS. — Carl Witthoft, Nov 29 '12 at 16:18
Would be helpful if we could run this code locally. What is `banks`? — Roman Luštrik, Nov 30 '12 at 10:58

John · Accepted Answer · 2012-11-29T19:27:37.577

1

The following should be faster.... but if you're locking up when A is large that might be a memory issue and the following is more memory intensive. More information, like what banks is, what x is, y, where you get dea from, and what the purpose is would be helpful.

Essentially all I've done is try to move as much as I can out of the inner loop. The shorter that is, the better off you'll be.

A <- nrow(banks)
effm <- matrix(nrow = A, ncol = 2)
m <- 20
B <- 100
pb <- txtProgressBar(min = 0,
                     max = A, style=3)
for(a in 1:A) {
  x1 <- x[-a,]
  y1 <- y[-a,]
  theta <- numeric(B)
  xrefm <- x1[sample(1:nrow(x1), m * B, replace=TRUE),] # get all of your samples at once
  yrefm <- y1[sample(1:nrow(y1), m * B, replace=TRUE),]
  deaX <- matrix(x[a,], ncol=3)
  deaY <- matrix(y[a,], ncol=3)

  for(i in 1:B){
    theta[i] <- dea(deaX, deaY, RTS = 'vrs', ORIENTATION = 'graph',
                   xrefm[(1:m) + (i-1) * m,], yrefm[(1:m) + (i-1) * m,], FAST=TRUE)
  }

  effm[a,1] <- mean(theta)
  effm[a,2] <- sd(theta) / sqrt(B)
  setTxtProgressBar(pb, a) 
}
close(pb)
effm

edited Nov 29 '12 at 19:27

answered Nov 29 '12 at 15:55

John

23,360
7
57
83

It would be faster still if you create the first two arguments to `dea` before the second loop. – Joshua Ulrich Nov 29 '12 at 16:04
For that matter, `xynrow<-nrow(x1);mB<-m*B` and rewriting as `xrefm <- x1[sample(1:xynrow, mB, replace=TRUE),]` will remove a massive number of calculations from your task. – Carl Witthoft Nov 29 '12 at 16:23
@John I guess dea is from benchmarking package, but can you explain in 2 sentences your idea to perform the code? – agstudy Nov 29 '12 at 16:38
Thanks for all useful comments so far and apologies for some of the vagueness in my post.My code is running the order M hyperbolic efficiency estimation, where – barryq Nov 29 '12 at 17:25
Thanks for all useful comments/code suggestions so far and apologies for some of the vagueness in my post. Banks is dataframe of 1394 bank observations, x is a 1394 x 3 matrix of inputs and y is 1394 matrix of outputs, while DEA is from benchmarking. I am trying to run an order M hyperbolic efficiency estimation where the unit (each bank in this case) being analysed is left out of the reference set in the DEA estimation. – barryq Nov 29 '12 at 17:33
What i mean by freezing is that the code hangs in R but does effect the OS processing power. – barryq Nov 29 '12 at 17:49
I will attempt to great a reproducible copy of the process and hopefully this will make it easier to provide advice... Should i then create a new question or add to this stream as an extra comment ? – barryq Nov 29 '12 at 17:51
@john thanks alot for you code modification but unfortunately it produces an error "Error in dea(deaArg1, deaArg2, RTS = "vrs", ORIENTATION = "graph", xrefm[(1:m) + : Number of inputs must be the same in X and XREF" – barryq Nov 29 '12 at 18:11
2

hard to debug without x... You can delete comments you cannot edit. You should be putting lots of this in your question and deleting comments. – John Nov 29 '12 at 18:39
@john yes works now but R still hangs at about 31% of the process – barryq Nov 30 '12 at 06:22

speed up a monte carlo simulation with nested loop in R

1 Answers1