Suppose I have the following simple for loops for a simulation and OLS estimation:

set.seed(12345)
m <- rnorm(20, 0, 1)
n <- 10
b1 <- 0.5
b2 <- 2
model1_b <- matrix(nrow = n, ncol = 2)
model2_b <- matrix(nrow = n, ncol = 2)
error <- matrix(nrow = 20, ncol = n)

for (a in 1:2) {
  for (b in 1:4) {

    x <- (m + a) / b

    for (r in 1:10) {
      repeat {

        e <- rnorm(20, 0, 0.5) # the error term
        error[, r] <- e

        # OLS estimation of Model_1
        y <- b1 + b2 * x + e # the true model 1
        Model_1 <- lm(y ~ x)
        model1_b[r, ] <- Model_1$coef

        # OLS estimation of Model_2
        y <- b1 + b2 * (x^2) + e # the true model 2
        Model_2 <- lm(y ~ x)
        model2_b[r, ] <- Model_2$coef

        if (Model_1$coef[1] != 0 & Model_2$coef[1] != 0) {break}

      } # end of repeat{} loop
    } # end of for(r){} loop
  } # end of for(b){} loop
} # end of for(a){} loop

error
model1_b
model2_b

I want to convert these nested for loops into nested foreach loops so that I can run them in parallel. As you can see, the data generated within the loops (error, model1_b, model2_b) are saved one by one into matrices that I defined before running the loop. My question is: how can I save these results in foreach loops? A list, data frame or matrix would all be fine.

(Note: My actual model is much more complex. The first (outer) and second loops are relatively small, but the third (inner) loop is quite large. Maybe I don't need nested foreach loops, and it's OK to parallelize only the inner loop. I would really appreciate it if you could show me how to save the results in both situations: using three nested foreach loops, and using foreach only on the inner loop.)

tshepang
Chen
  • It's usually best to parallelize the *outer* loop, since you want the tasks that get sent to the workers to be work intensive in comparison to parallelization overhead. If the outer loop has more iterations than you have CPUs, you don't need nested `foreach` loops. Otherwise I can only point you to the [vignette](http://cran.r-project.org/web/packages/foreach/vignettes/nested.pdf). Btw., in your example, you shouldn't use `lm`, but the underlying function `lsfit`, which will be much faster. Also, comparing floating point numbers like in your example is stupid. – Roland Sep 22 '14 at 07:21
  • @Roland Thank you for your comment. I actually have only 4 iterations in the outer loop, 8 in the second loop and around 400 in the inner loop. In my real model, which uses maximum likelihood estimation, the 400 inner iterations take about 5 hours, so the total execution time would be around 160 hours (4*8*5=160). What would you suggest: nested foreach loops, or parallelizing only the outer loop? I plan to run the code on a server. – Chen Sep 22 '14 at 17:21
  • @Roland One more thing: I don't quite understand your last sentence, "comparing floating point numbers like in your example is stupid". What are floating point numbers? The random errors? Could you elaborate? Thanks a lot. – Chen Sep 22 '14 at 17:59
  • http://stackoverflow.com/a/9508558/1412059 – Roland Sep 23 '14 at 15:00
  • @Roland That's brilliant! This is the first time I've heard of this, since I am not a computer guy... But there is still another problem: how can I store the results (error, model1_b and model2_b) if I apply a foreach loop only to the inner loop? – Chen Sep 23 '14 at 17:40
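
Roland's two side remarks in the comments above (floating point comparison and `lsfit` versus `lm`) can be illustrated with a short base R sketch. The tolerance `1e-8`, the helper `is_nonzero` and the simulated data are assumptions chosen for illustration; they are not part of the original question.

```r
# Exact comparison of doubles is fragile: 0.1 + 0.2 is not stored as exactly 0.3.
s <- 0.1 + 0.2
exact_equal <- (s == 0.3)                 # FALSE with IEEE 754 doubles
close_equal <- isTRUE(all.equal(s, 0.3))  # TRUE: compared within a tolerance

# Hypothetical tolerance-based replacement for the question's "coef != 0" test.
is_nonzero <- function(v, tol = 1e-8) abs(v) > tol

# lsfit() skips the formula and model-frame machinery of lm(),
# but should return the same OLS coefficients.
set.seed(1)
x <- rnorm(20)
y <- 0.5 + 2 * x + rnorm(20, 0, 0.5)
coef_lm    <- unname(coef(lm(y ~ x)))
coef_lsfit <- unname(lsfit(x, y)$coefficients)
same_fit   <- isTRUE(all.equal(coef_lm, coef_lsfit))
```

In the question's loop, the check would then read something like `if (is_nonzero(Model_1$coef[1]) && is_nonzero(Model_2$coef[1])) break`.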

1 Answer

Regarding how to store the results (error, model1_b and model2_b) in a foreach loop, the answer can be found at "Saving multiple outputs of foreach dopar loop".
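
As a rough sketch of that idea (not necessarily the exact code from the linked answer), each iteration can return a list holding all of its outputs, and the matrices are rebuilt afterwards. The two-worker cluster, the single `(a, b)` combination in the first part, and the column names of the data frame in the second part are assumptions made for illustration.

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)          # assumed worker count; adjust to your machine
registerDoParallel(cl)

set.seed(12345)
m  <- rnorm(20, 0, 1)
b1 <- 0.5
b2 <- 2

## Case 1: foreach on the inner loop only (one (a, b) combination shown).
x <- (m + 1) / 1
res <- foreach(r = 1:10) %dopar% {
  e  <- rnorm(20, 0, 0.5)
  y1 <- b1 + b2 * x + e
  y2 <- b1 + b2 * x^2 + e
  # Return everything this iteration produced as one named list.
  list(error  = e,
       model1 = unname(coef(lm(y1 ~ x))),
       model2 = unname(coef(lm(y2 ~ x))))
}
# Rebuild the matrices from the question out of the list of results.
error    <- sapply(res, `[[`, "error")        # 20 x 10
model1_b <- t(sapply(res, `[[`, "model1"))    # 10 x 2
model2_b <- t(sapply(res, `[[`, "model2"))    # 10 x 2

## Case 2: three nested foreach loops, one data frame row per (a, b, r).
coefs <- foreach(a = 1:2, .combine = rbind) %:%
  foreach(b = 1:4, .combine = rbind) %:%
  foreach(r = 1:10, .combine = rbind) %dopar% {
    x <- (m + a) / b
    e <- rnorm(20, 0, 0.5)
    y <- b1 + b2 * x + e
    fit <- coef(lm(y ~ x))
    data.frame(a = a, b = b, r = r,
               intercept = fit[[1]], slope = fit[[2]])
  }

stopCluster(cl)
```

Note that `set.seed()` in the master session does not control the random numbers drawn on the workers, so the draws inside `%dopar%` are not reproducible as written.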

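Also, because the loop draws random numbers, %dorng% (from the doRNG package) can be used instead of %dopar% to make the parallel results reproducible.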
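
A minimal sketch of the `%dorng%` suggestion, assuming the doRNG and doParallel packages are installed and a two-worker cluster is available. The helper `run_once` is hypothetical and only exists to repeat the same loop twice with the same seed.

```r
library(foreach)
library(doParallel)
library(doRNG)

cl <- makeCluster(2)          # assumed worker count
registerDoParallel(cl)

# Hypothetical helper: seed the master session, then draw errors in parallel.
run_once <- function() {
  set.seed(12345)
  foreach(r = 1:4, .combine = cbind) %dorng% rnorm(20, 0, 0.5)
}

first  <- run_once()
second <- run_once()
stopCluster(cl)

# With %dorng%, both runs should yield the same draws.
reproducible <- isTRUE(all.equal(as.numeric(first), as.numeric(second)))
```

With plain `%dopar%` the same two calls would generally produce different numbers, because each worker keeps its own random number state.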

Community
Chen