I'm trying to parallelize a for loop that has another loop nested inside it. The answer is bound to be very similar to "nested foreach loops in R to update common array", but I can't seem to get it to work. I've tried all the options I can think of, including turning the inner loop into its own function and parallelizing that, but I keep getting empty lists back.

The first, non-foreach example works:

theFrame <- data.frame(col1=rnorm(100), col2=rnorm(100))

theVector <- 2:30

regFor <- function(dataFrame, aVector, iterations)
{   
    #set up a blank results matrix to save into.
    results <- matrix(nrow=iterations, ncol=length(aVector))

    for(i in 1:iterations)
    {
        #set up a blank road map to fill with 1s according to desired parameters
        roadMap <- matrix(ncol=dim(dataFrame)[1], nrow=length(aVector), 0)
        row.names(roadMap) <- aVector
        colnames(roadMap) <- 1:dim(dataFrame)[1]

        for(j in 1:length(aVector))
        {
            #sample some of the 0s and convert to 1s according to desired number of sample
            roadMap[j,][sample(colnames(roadMap),aVector[j])] <- 1
        }

        temp <- apply(roadMap, 1, sum)

        results[i,] <- temp
    }

    results <- as.data.frame(results)
    names(results) <- aVector

    results
}

test <- regFor(theFrame, theVector, 2)

But this foreach version, like my other similar attempts, does not work:

trying <- function(dataFrame, aVector, iterations, cores)
{   
    registerDoMC(cores)

    #set up a blank results list to save into. i doubt i need to do this
    results <- list()

    foreach(i = 1:iterations, .combine="rbind") %dopar%
    {
        #set up a blank road map to fill with 1s according to desired parameters
        roadMap <- matrix(ncol=dim(dataFrame)[1], nrow=length(aVector), 0)
        row.names(roadMap) <- aVector
        colnames(roadMap) <- 1:dim(dataFrame)[1]

        foreach(j = 1:length(aVector)) %do%
        {
            #sample some of the 0s and convert to 1s according to desired number of sample
            roadMap[j,][sample(colnames(roadMap),aVector[j])] <- 1
        }

        results[[i]] <- apply(roadMap, 1, sum)
    }
    results
}

test2 <- trying(theFrame, theVector, 2, 2)

I take it that I have to use foreach on the inner loop no matter what, right?

forlooper

3 Answers

When using foreach, you never "set up a blank results list to save into", as you suspected. Instead, you combine the results of evaluating the body of the foreach loop, and that combined result is returned. In this case, we want the outer foreach loop to combine vectors (computed by the inner foreach loop) row-wise into a matrix. That matrix is assigned to the variable results, which is then converted to a data frame.
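As a minimal illustration of what .combine does to the return value, here is a sketch that uses the sequential %do% operator so no parallel backend is needed. The variable names and toy values are made up for this example:

```r
library(foreach)

# Without .combine, foreach returns a list with one element per iteration.
asList <- foreach(i = 1:3) %do% {
  i * 1:2
}

# With .combine = 'rbind', the value of the body from each iteration
# (here a length-2 vector) is stacked row-wise into a matrix.
asMatrix <- foreach(i = 1:3, .combine = 'rbind') %do% {
  i * 1:2
}

str(asList)    # a list of 3 numeric vectors
dim(asMatrix)  # 3 rows (iterations) by 2 columns
```

In both cases the result only exists if the value of the foreach call is assigned to a variable.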

Here's my first attempt at converting your example:

library(doMC)

foreachVersion <- function(dataFrame, aVector, iterations, cores) {
  registerDoMC(cores) # unusual, but reasonable with doMC
  rows <- nrow(dataFrame)
  cols <- length(aVector)
  results <-
    foreach(i=1:iterations, .combine='rbind') %dopar% {
      # The value of the inner foreach loop is returned as
      # the value of the body of the outer foreach loop
      foreach(aElem=aVector, .combine='c') %do% {
        roadMapRow <- double(length=rows)
        roadMapRow[sample(rows,aElem)] <- 1
        sum(roadMapRow)
      }     
    }
  results <- as.data.frame(results)
  names(results) <- aVector
  results
}

The inner loop doesn't need to be implemented as a foreach loop. You could also use sapply, but I'd try to figure out if there's a faster method. But for this answer, I wanted to demonstrate a foreach method. The only real optimization that I used was to get rid of the call to apply by executing sum inside the inner foreach loop.
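For comparison, the sapply variant mentioned above might look roughly like this. It is only a sketch: sapplyVersion and test3 are hypothetical names, and it assumes doMC is available (so not on Windows):

```r
library(doMC)

# Hypothetical variant of foreachVersion: the outer loop stays a parallel
# foreach, while the inner loop is a plain sapply over aVector.
sapplyVersion <- function(dataFrame, aVector, iterations, cores) {
  registerDoMC(cores)
  rows <- nrow(dataFrame)
  results <-
    foreach(i = 1:iterations, .combine = 'rbind') %dopar% {
      # sapply returns one number per element of aVector,
      # i.e. one row of the final result
      sapply(aVector, function(aElem) {
        roadMapRow <- double(length = rows)
        roadMapRow[sample(rows, aElem)] <- 1
        sum(roadMapRow)
      })
    }
  results <- as.data.frame(results)
  names(results) <- aVector
  results
}

theFrame <- data.frame(col1 = rnorm(100), col2 = rnorm(100))
test3 <- sapplyVersion(theFrame, 2:30, 2, 2)
```

Only the outer loop is distributed across cores here, which matches the structure of the foreach version.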

Steve Weston
  • Thank you! This works well. In case it wasn't clear, my actual code is doing more complicated stuff inside both loops, so using sapply, etc, could be pretty tricky. There were two main problems with my original formulation. The first was setting up a blank list and trying to save results into it (incidentally I have done that successfully with foreach in other circumstances), and the second was not grasping the meaning of what .combine is doing. This is cool and produces results without having to use things like Reduce(), which I've done in the past with foreach. Thanks! – forlooper Feb 25 '15 at 21:00
  • @forlooper It is possible to have side effects in foreach loops, but it works differently depending on which backend you use, so it's strongly discouraged. I originally tried to prevent the possibility of side effects in foreach, but I eventually gave up trying. – Steve Weston Feb 25 '15 at 22:02
  • Thanks Steve. I think I should re-program some stuff then! – forlooper Feb 27 '15 at 19:05

You need to put the result of foreach in a variable:

    results<- foreach( ...
cmbarbu
  • I know this is an old question, but here is a hint for those who cannot get nested foreach to work.
  • If you parallelize the outer loop with foreach() %dopar% { foreach() %do% {} }, you need to pass .packages = c("doSNOW") as an argument to the outer foreach call; otherwise you will run into a "doSNOW not found" error.
  • Generally, people just parallelize the inner loop (foreach() %:% foreach() %dopar% {}, as also suggested on the forum), which can be much slower for a huge amount of data (it waits to combine every 100 results and again at the end of every inner loop, and this combining is not done in parallel!).
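A minimal sketch of the .packages hint, assuming a two-worker socket cluster created with makeCluster (from snow, which doSNOW loads). The loop bounds and the i * j body are placeholders, not part of the original question:

```r
library(doSNOW)

# Assumption: a local cluster with 2 workers is acceptable for this demo.
cl <- makeCluster(2)
registerDoSNOW(cl)

# The outer loop runs on the workers; .packages makes each worker load
# doSNOW (and with it foreach), so the inner foreach/%do% can be found there.
res <- foreach(i = 1:3, .combine = 'rbind', .packages = c("doSNOW")) %dopar% {
  foreach(j = 1:4, .combine = 'c') %do% {
    i * j
  }
}

stopCluster(cl)
dim(res)  # 3 rows (outer iterations) by 4 columns (inner iterations)
```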
Jin