
Using a while loop, the code below finishes in about 20 seconds. Using foreach and %dopar%, it takes about 25 seconds without clusters and about 28 seconds with clusters.

I'm looking for clarification: I've read here on Stack Overflow that small tasks can be slower with parallel processing, but even when I increase the +/- ranges for the numbers in iproduct, the parallel version is still slower.

Is this because:

  1. I'm using iproduct and should instead use a different iterator, or
  2. the amount of data being iterated over by iproduct needs to be much, much bigger for parallelizing to make sense, or
  3. the amount of computation in the while loop is such a small task that parallelizing won't ever make it faster?

Any help on getting my code to run faster would be great.

My final data won't be huge, since I only keep what makes it through the conditional if statement, but I want to iterate over more numbers than I currently have for p1/p2 and r0/r1/r2.

Here is the while-loop code:

library(iterators)   # nextElem()
library(itertools)   # iproduct(), ihasNext(), hasNext()

start_time <- Sys.time()
p1 <- 2
p2 <- 2
r0 <- 25
r1 <- 4
r2 <- 0

TB <- c()
iter_count <- ihasNext(iproduct(ani1 = (p1-2):(p1+2), ani2 = (p2-2):(p2+2),
                                fd0 = (r0-6):(r0+6), fd1 = (r1-4):(r1+6), fd2 = r2:(r2+6)))

while (hasNext(iter_count)) {
  ne <- nextElem(iter_count)
  aniprev <- sum(ne$ani1, ne$ani2)
  SRFD <- sum(ne$fd1, 2*ne$fd2)
  totalSRFD <- sum(ne$fd0, ne$fd1, ne$fd2)
  manhattan_dist <- sum(abs(p1-ne$ani1), abs(p2-ne$ani2),
                        abs(r0-ne$fd0), abs(r1-ne$fd1), abs(r2-ne$fd2))

  # keep only the combinations that satisfy all three conditions
  if (manhattan_dist <= 5 && aniprev == SRFD && totalSRFD == 29) {
    ani <- cbind(ne$ani1, ne$ani2, ne$fd0, ne$fd1, ne$fd2, manhattan_dist)
    TB <- rbind(TB, ani)
  }
}
# drop rows that contain the same values in a different order
nodup2 <- TB[!duplicated(t(apply(TB, 1, sort))), ]

end_time <- Sys.time()

end_time - start_time

And here is the foreach/%dopar% parallel equivalent:

library(foreach)
library(doParallel)   # also loads parallel, which provides detectCores()
library(itertools)    # iproduct()

start_time <- Sys.time()
p1 <- 2
p2 <- 2
r0 <- 25
r1 <- 4
r2 <- 0

iter_count <- iproduct(ani1 = (p1-2):(p1+2), ani2 = (p2-2):(p2+2),
                       fd0 = (r0-6):(r0+6), fd1 = (r1-4):(r1+6), fd2 = r2:(r2+6))

dc <- detectCores() - 1
registerDoParallel(dc)

res <- foreach(i = iter_count, .combine = rbind) %dopar% {
  aniprev <- sum(i$ani1, i$ani2)
  SRFD <- sum(i$fd1, 2*i$fd2)
  totalSRFD <- sum(i$fd0, i$fd1, i$fd2)
  manhattan_dist <- sum(abs(p1-i$ani1), abs(p2-i$ani2),
                        abs(r0-i$fd0), abs(r1-i$fd1), abs(r2-i$fd2))

  # return a row when the conditions hold; otherwise NULL, which rbind ignores
  if (manhattan_dist <= 5 && aniprev == SRFD && totalSRFD == 29) {
    cbind(i$ani1, i$ani2, i$fd0, i$fd1, i$fd2, manhattan_dist)
  }
}
nodup2 <- res[!duplicated(t(apply(res, 1, sort))), ]

end_time <- Sys.time()

end_time - start_time

  • Option number 3 is correct. Foreach parallelization works best if you send long tasks to each separate process. You are sending many tiny calculations to every processor core, which causes a lot of overhead from the parallel backend. – JadaLovelace Jul 09 '21 at 20:12
  • JadaLovelace, thank you! Do you think that using 'expand.grid' and 'isplitRow' to send chunks of data to be parallel processed as suggested in [this answer](https://stackoverflow.com/questions/31254476/combinatorial-iterator-like-expand-grid) would speed up the processing time? As I increase the original vector (p's and r's) I know it will start taking hours to days to get results. – Mark S Jul 10 '21 at 00:23
  • Yes, I'm pretty sure it will. I have never worked with iterator objects before, but I'll try to get an example running with the code you posted. From a technical perspective, it only makes sense to parallelize your data into as many chunks as you have cores (including hyperthreaded cores!). So if you have 4 cores available, the fastest would be to split your data into 4 chunks. Splitting into more chunks than you have cores available doesn't help. – JadaLovelace Jul 10 '21 at 13:04
  • I realized that the answer I linked to in my comment above won't work. They split the data into chunks before **expand.grid**, and doing that will not give all the combinations of my example. Using **expand.grid** instead of **iproduct/ihasNext** gives me 25,025 results. Splitting up the r's (r0=19:25, r1=0:5, r2=0:3) gave 4,200 results and (r0=26:31, r1=6:10, r2=4:6) gave 2,250 results, for a total of 6,450. The issue is that **iproduct** gives results one at a time and **expand.grid** gives them all at once. If parallel processing could be useful in this situation I would need something in between. – Mark S Jul 11 '21 at 01:43
  • I also realized that I would not have to write a function for apply since the function **subset** in R would give me the data with the conditions I set. Maybe putting **expand.grid** inside a function that splits the data would work. If not, I'll post another question. Thank you again, JadaLovelace. – Mark S Jul 11 '21 at 01:48
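
Following up on the comments, here is a minimal, untested sketch of the chunking idea they describe, assuming the foreach, doParallel and itertools packages are installed: build the full grid with expand.grid, split it into one chunk per worker with isplitRows, and filter each chunk with subset inside foreach. The names grid, chunk and dc are placeholders of my own, not from the original code.

library(foreach)
library(doParallel)   # loads parallel, which provides detectCores()
library(itertools)    # isplitRows()

p1 <- 2; p2 <- 2; r0 <- 25; r1 <- 4; r2 <- 0

# Build the full grid of combinations up front (25,025 rows for these ranges).
grid <- expand.grid(ani1 = (p1-2):(p1+2), ani2 = (p2-2):(p2+2),
                    fd0 = (r0-6):(r0+6), fd1 = (r1-4):(r1+6), fd2 = r2:(r2+6))

dc <- detectCores() - 1
registerDoParallel(dc)

# One large chunk per worker instead of one tiny task per combination.
res <- foreach(chunk = isplitRows(grid, chunks = dc), .combine = rbind) %dopar% {
  chunk$manhattan_dist <- abs(p1 - chunk$ani1) + abs(p2 - chunk$ani2) +
                          abs(r0 - chunk$fd0) + abs(r1 - chunk$fd1) +
                          abs(r2 - chunk$fd2)
  # subset() replaces the per-row if(): keep rows that meet all three conditions.
  subset(chunk, manhattan_dist <= 5 &
                (ani1 + ani2) == (fd1 + 2*fd2) &
                (fd0 + fd1 + fd2) == 29)
}
# shut down any workers started by registerDoParallel()
stopImplicitCluster()

nodup2 <- res[!duplicated(t(apply(res, 1, sort))), ]

For a grid of only 25,025 rows, the vectorized subset alone is likely fast enough that the sequential version still wins; the chunked approach should only start to pay off for the much larger ranges mentioned in the question.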
