How can I write a combine function for an R foreach() statement that uses itertools chunking such that I get the same result as using the R foreach() statement without itertools chunking?

I have an R foreach() statement that performs a calculation and returns a list of three lists. A simplified version that gives the desired output is in the first code block below. It uses a combine function that I found in the question "Saving multiple outputs of foreach dopar loop".

Now I want to run the same code using chunking from itertools. I tried this two different ways (see the second and third code blocks below), and neither produced the desired result. Instead of three_lists being composed of 3 lists of 10 elements each, both attempts give 3 lists of 2 elements each (the lengths of those 2 elements differ between the attempts). I am guessing that the lists have length 2 rather than 10 because num_cores on my computer is 2, so there are 2 chunks. This suggests that my combine function needs to be changed to combine the outputs properly when using itertools chunking, but I'm having trouble figuring out how. How should I change the combine function?
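
A minimal base-R sketch to check that guess, without foreach or a cluster. It assumes that isplitIndices(10, chunks=2) hands out the index vectors 1:5 and 6:10, and it calls comb by hand the way foreach would with .init and .multicombine=TRUE; the names chunk_results and init are only for illustration.

```r
# Combine function from the question, unchanged.
comb <- function(x, ...) {
  lapply(seq_along(x),
         function(i) c(x[[i]], lapply(list(...), function(y) y[[i]])))
}

# Assumed chunks for num_cores = 2: indices 1:5 and 6:10.
# Each chunked task returns ONE list of three vectors (as in the first attempt).
chunk_results <- lapply(list(1:5, 6:10), function(idx) {
  list(idx * 1, idx * 10, idx * 100)
})

# Emulate the combine step: comb(init, result_chunk_1, result_chunk_2)
init <- list(list(), list(), list())
three_lists <- do.call(comb, c(list(init), chunk_results))

lengths(three_lists)       # number of entries in each of the three lists
lengths(three_lists[[1]])  # size of each entry in the first list
```

If this matches what happens inside foreach, comb appends one entry per task, and with chunking a task is a whole chunk rather than a single index, which would explain the length of 2.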

Here is the foreach() statement that generates the desired result:

# set up
library(foreach)
library(doParallel)

# set parallel options
num_cores_total <- detectCores() 
num_cores <- num_cores_total - 2
cl <- makeCluster(spec= num_cores, type="PSOCK")
registerDoParallel(cl, cores = num_cores)

# create function that will separate out foreach output into list of three lists
comb <- function(x, ...) {
  lapply(seq_along(x),
         function(i) c(x[[i]], lapply(list(...), function(y) y[[i]])))
}

# foreach statement
three_lists <- foreach(i = 1:10, .inorder=TRUE, .combine='comb', .multicombine=TRUE, .init=list(list(), list(), list())) %dopar% {

  first_output <- i*1
  second_output <- i*10
  third_output <- i*100

  list(first_output, second_output, third_output)

}

first_output_list <- three_lists[[1]]
second_output_list <- three_lists[[2]]
third_output_list <- three_lists[[3]]

# stop cluster
stopCluster(cl)

Here is my first (unsuccessful) attempt at incorporating itertools chunking into the code:

# set up
library(foreach)
library(itertools)
library(doParallel)

# set parallel options
num_cores_total <- detectCores() 
num_cores <- num_cores_total - 2
cl <- makeCluster(spec= num_cores, type="PSOCK")
registerDoParallel(cl, cores = num_cores)

# create function that will separate out foreach output into list of three lists
comb <- function(x, ...) {
  lapply(seq_along(x),
         function(i) c(x[[i]], lapply(list(...), function(y) y[[i]])))
}

# foreach statement
three_lists <- foreach(thisIter=isplitIndices(10, chunks=num_cores), .inorder=TRUE, .combine='comb', .multicombine=TRUE, .init=list(list(), list(), list())) %dopar% {

    first_output <- thisIter*1
    second_output <- thisIter*10
    third_output <- thisIter*100

    list(first_output, second_output, third_output)

}

first_output_list <- three_lists[[1]]
second_output_list <- three_lists[[2]]
third_output_list <- three_lists[[3]]


# stop cluster
stopCluster(cl)

And here is my second (unsuccessful) attempt at incorporating itertools chunking into the code:

# set up
library(foreach)
library(itertools)
library(doParallel)

# set parallel options
num_cores_total <- detectCores() 
num_cores <- num_cores_total - 2
cl <- makeCluster(spec= num_cores, type="PSOCK")
registerDoParallel(cl, cores = num_cores)

# create function that will separate out foreach output into list of three lists
comb <- function(x, ...) {
  lapply(seq_along(x),
         function(i) c(x[[i]], lapply(list(...), function(y) y[[i]])))
}

# foreach statement
three_lists <- foreach(thisIter=isplitIndices(10, chunks=num_cores), .inorder=TRUE, .combine='comb', .multicombine=TRUE, .init=list(list(), list(), list())) %dopar% {

  calc_function <- function(x){
    first_output <- x*1
    second_output <- x*10
    third_output <- x*100

    return(list(first_output, second_output, third_output))
  }

  sapply(thisIter, calc_function)  
}

first_output_list <- three_lists[[1]]
second_output_list <- three_lists[[2]]
third_output_list <- three_lists[[3]]

# stop cluster
stopCluster(cl)
  • Can you try specifying `.combine = c` in the `foreach` call? That should append your returned lists together. – Alexis Jun 05 '18 at 19:32
  • Hi Alexis, thanks for the reply. I tried your suggestion but it doesn't work - it appends all the results together into one giant list. What I want is to create a list of three lists (as is done by the comb function). This way first_output, second_output, and third_output are easy to separate out after the foreach call. – user2295466 Jun 05 '18 at 20:22

1 Answer

The idea is to use .combine=c to concatenate the lists returned by each chunk into one flat list (so that you don't get nested lists), and then pull the three outputs apart afterwards, much as you did without itertools but simplified a bit:

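# assumes foreach, itertools and doParallel are loaded and a cluster is
# registered, as in the question's set-up code

# each task receives one chunk of indices; .combine=c joins the per-chunk
# lists into a single flat list with one element per index
lists <- foreach(thisIter=isplitIndices(10L, chunks=num_cores), .combine=c) %dopar% {
    lapply(thisIter, function(i) {
        # the three outputs for index i, stored as one integer vector
        c(i * 1L,
          i * 10L,
          i * 100L)
    })
}

# extract the 1st, 2nd and 3rd output from every element
first_output_list <- lapply(lists, "[", 1L)
second_output_list <- lapply(lists, "[", 2L)
third_output_list <- lapply(lists, "[", 3L)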
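
If you would rather keep the three_lists structure from the question, another option is a chunk-aware combine function. The following is only a sketch under some assumptions: comb_chunks is a hypothetical name, each task is assumed to return a list of three lists (one per output), and %do% is used so it runs without a cluster; swapping in %dopar% with a registered backend should work the same way. The chunk count is fixed at 2 here for illustration.

```r
library(foreach)
library(itertools)

# Hypothetical chunk-aware combine: for each of the three outputs,
# concatenate the accumulated list with the matching list from every chunk.
comb_chunks <- function(x, ...) {
  pieces <- list(x, ...)
  lapply(seq_along(x), function(k) do.call(c, lapply(pieces, `[[`, k)))
}

three_lists <- foreach(thisIter = isplitIndices(10, chunks = 2),
                       .inorder = TRUE,
                       .combine = comb_chunks,
                       .multicombine = TRUE,
                       .init = list(list(), list(), list())) %do% {
  # one result per index in this chunk, each a list of three outputs
  res <- lapply(thisIter, function(i) list(i * 1, i * 10, i * 100))
  # regroup by output: a list of three lists, each as long as the chunk
  lapply(1:3, function(k) lapply(res, `[[`, k))
}

first_output_list  <- three_lists[[1]]
second_output_list <- three_lists[[2]]
third_output_list  <- three_lists[[3]]
```

The difference from the original comb is that each chunk contributes a list per output, so the combine step concatenates lists instead of appending a single element per task.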
Alexis