Below is a toy example that reproduces the issue. My real data, and the functions that act on it, are much more involved and would genuinely benefit from running in parallel.
The problem is that the loop below gives the expected result under both %do% and %dopar%, but %dopar% is much slower than %do%.
I have narrowed the problem down to this: on each iteration I search through a very large list, pull out one element by indexing into it, and then do some work on that element.
Can anyone offer insight into how the %dopar% loop could be improved? In my actual data, I need to subset a data frame that is already stored in a list, and that data frame is then passed to 4 different functions.
Apologies for cross-posting: I also asked this on R-Help, but there seems to be more activity around foreach on Stack Exchange.
```r
# Build a named list of N numeric vectors (toy stand-in for the real data)
N <- 200000
myList <- vector('list', N)
names(myList) <- 1:N
for (i in 1:N) {
  myList[[i]] <- rnorm(100)
}
nms <- 1:N
```
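A small self-contained sketch of the two lookups involved, using a hypothetical smaller list (N = 1000) rather than the real data. It checks that name-based `[[` returns the same element as the `which(names(...) == ...)` scan, and reports how large the list is, since the size of `myList` matters if it has to be copied to parallel workers.

```r
# Hypothetical, smaller version of the question's list, for illustration only
N <- 1000
myList <- setNames(lapply(seq_len(N), function(i) rnorm(100)), seq_len(N))
nms <- seq_len(N)

# Lookup used in the question: builds a length-N logical vector on every call
byScan <- myList[[which(names(myList) == nms[3])]]

# Name-based lookup: `[[` still matches against the names, but does it in
# compiled code without allocating a length-N comparison vector
byName <- myList[[as.character(nms[3])]]

print(identical(byScan, byName))

# Approximate memory footprint of the whole list; a backend that copies the
# list to each worker has to serialize an object of roughly this size
print(object.size(myList), units = "Mb")
```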
```r
library(foreach)
library(doParallel)
registerDoParallel(cores = 7)

# Sequential version: fast
result <- foreach(i = 1:3) %do% {
  dat <- myList[[which(names(myList) == nms[i])]]
  mean(dat)
}

# Parallel version: same result, but much slower
result <- foreach(i = 1:3) %dopar% {
  dat <- myList[[which(names(myList) == nms[i])]]
  mean(dat)
}
```
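A sketch of one commonly suggested workaround, under the assumption that the slowdown comes from `myList` being copied to the workers. With a socket cluster (which is typically what doParallel uses on Windows), foreach exports the variables the loop body refers to, so referencing `myList` inside the body ships the whole list to every worker. Here the list is subset in the master session first and foreach iterates over the elements themselves, so each task receives only its own element. The smaller `N`, the 2-worker cluster, and the name `toDo` are illustrative assumptions, not part of the original code.

```r
library(foreach)
library(doParallel)

# Hypothetical, smaller stand-in for the question's data
N <- 1000
myList <- setNames(lapply(seq_len(N), function(i) rnorm(100)), seq_len(N))
nms <- seq_len(N)

# Explicit socket cluster; 2 workers is an arbitrary choice for this sketch
cl <- makeCluster(2)
registerDoParallel(cl)

# Subset in the master session, then iterate over the elements directly.
# The loop body no longer mentions myList, so it is not exported; each
# task only receives its own element as `dat`.
toDo <- myList[as.character(nms[1:3])]
result <- foreach(dat = toDo) %dopar% {
  mean(dat)
}

stopCluster(cl)
```

For the real use case, the same pattern would apply to the list of data frames: build the subset of data frames up front, iterate over it with `foreach(df = ...)`, and call the four functions inside the loop body.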