R mclapply vs foreach

Question

I use mclapply for all my "embarrassingly parallel" computations. I find it clean and easy to use, and when the arguments mc.cores = 1 and mc.preschedule = TRUE are set I can insert browser() in the function passed to mclapply and debug line by line just like in regular R. This is a huge help in getting code to production more quickly.
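To illustrate the workflow described above, here is a minimal sketch. The names `debug_mode` and `process_item` are hypothetical and only stand in for a real switch and a real computation. It assumes, as the `parallel` documentation suggests, that `mclapply` with `mc.cores = 1` does not fork and runs in the current session.

```r
library(parallel)

# Hypothetical switch: TRUE while debugging, FALSE for a real parallel run.
debug_mode <- TRUE

# Hypothetical worker function standing in for the real computation.
process_item <- function(x) {
  # browser() only makes sense in an interactive session, so guard it.
  if (debug_mode && interactive()) browser()
  x^2
}

# With mc.cores = 1 the work runs in the current process, so the browser()
# prompt appears in the console as usual. Forking is unavailable on Windows,
# where mc.cores must stay at 1.
n_cores <- if (debug_mode || .Platform$OS.type == "windows") 1L else 2L

results <- mclapply(1:4, process_item,
                    mc.cores = n_cores, mc.preschedule = TRUE)
```

Flipping `debug_mode` to `FALSE` switches back to a forked run without touching the function body.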

What does foreach offer that mclapply does not? Is there a reason I should consider writing foreach code going forward?

If I understand correctly, both can use the multicore approach to parallel computations (permitting forking) which I like to use for performance reasons.

I have seen foreach used in various packages and have read the basics of it, but frankly I don't find it as easy to use. I am also unable to figure out how to get browser() to work in foreach function calls. (Yes, I have read the thread "browser mode with foreach %dopar%", but it didn't help me get the browser to work right.)

One reason we sometimes use `foreach` rather than `parallel` is the simple fact that `mclapply` does not work by default under Windows (and many users still use Windows). Although I could perform OS detection, as you noticed it also requires functions to be implemented a bit differently. — FM Kerckhof, Jan 08 '18 at 15:12

score 2 · Answer 1 · answered Nov 01 '20 at 16:49

The problem is almost the same as the one described in "Understanding the differences between mclapply and parLapply in R".

mclapply creates its worker processes as clones (forks) of the master process at the point where mclapply is called, so the master's environment is reproduced on every worker automatically. Unfortunately, forking isn't possible on Windows, where foreach and parLapply always have to use multisession parallelism instead of multicore.
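A minimal sketch of the forking behaviour, assuming a Unix-like system with at least two cores. The variable `offset` and the helper `add_offset` are hypothetical names used only for illustration.

```r
library(parallel)

# State created in the master session before the parallel call.
offset <- 10                           # hypothetical global variable
add_offset <- function(x) x + offset   # hypothetical helper that relies on it

# Forking is not available on Windows, so fall back to a single core there.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each forked worker starts as a copy of the master, so `offset` and
# `add_offset` are visible without any export step.
res <- mclapply(1:3, add_offset, mc.cores = n_cores)
```

No cluster object, package loading, or export call is needed here, which is the convenience the paragraph above refers to.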

When using parLapply or foreach with %dopar%, you generally have to perform the following additional steps: create a PSOCK cluster; register the cluster if desired; load the necessary packages on the cluster workers; and export the necessary data and functions to the global environment of the cluster workers.
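The steps above can be sketched with the `parallel` package as follows. This is an illustration under assumptions: `offset` and `add_offset` are hypothetical, `stats` stands in for whatever packages the real function needs, and two workers are chosen arbitrarily.

```r
library(parallel)

offset <- 10                           # hypothetical data needed by the task
add_offset <- function(x) x + offset   # hypothetical task function

# 1. Create a PSOCK cluster: fresh R sessions, available on all platforms.
cl <- makeCluster(2)

# 2. Registering is only needed for foreach's %dopar%, for example with
#    doParallel::registerDoParallel(cl). parLapply() takes `cl` directly.

# 3. Load the necessary packages on every worker.
invisible(clusterEvalQ(cl, library(stats)))

# 4. Export the necessary data and functions to the workers' global
#    environments. environment() is the calling environment (the global
#    environment when this runs at top level).
clusterExport(cl, c("offset", "add_offset"), envir = environment())

res <- parLapply(cl, 1:3, add_offset)

# Always shut the workers down when finished.
stopCluster(cl)
```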

That is why foreach has parameters like .packages and .export which enable us to distribute everything needed across sessions.
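A minimal sketch of how these parameters might be used, assuming the `foreach` and `doParallel` packages are installed. `offset` and `add_offset` are hypothetical, and `stats` is only a stand-in for a package the loop body would need. foreach exports variables it finds in the loop body on its own, so only `offset`, which is referenced indirectly through `add_offset`, is listed in `.export`.

```r
library(foreach)
library(doParallel)

# PSOCK cluster registered as the %dopar% backend.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

offset <- 10                           # hypothetical data, used indirectly
add_offset <- function(x) x + offset   # hypothetical helper used in the loop

res <- foreach(i = 1:3, .combine = c,
               .packages = "stats",    # packages to load on each worker
               .export = "offset"      # extra variables to send to workers
               ) %dopar% {
  add_offset(i)
}

parallel::stopCluster(cl)
```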

The future package documentation details the differences between multicore and multisession processing: https://cran.r-project.org/web/packages/future/vignettes/future-1-overview.html

score 0 · Answer 2 · answered Jul 18 '20 at 01:03

As Steve Weston (author of foreach) says here, using foreach with doParallel as backend you can initialize workers. This can be helpful for setting up a database connection more efficiently once per worker instead of once per task.
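A minimal sketch of per-worker initialization, assuming the `foreach` and `doParallel` packages are installed. The object `worker_resource` is hypothetical and stands in for something expensive such as a database connection. This is one way to do it, not necessarily the exact code from the linked answer.

```r
library(foreach)
library(doParallel)

cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Run once per worker: create the resource in the worker's global environment.
invisible(parallel::clusterEvalQ(cl, {
  worker_resource <- list(opened_by = Sys.getpid())
  NULL  # return NULL so the resource itself is not sent back to the master
}))

# Tasks reuse the per-worker object. .noexport keeps foreach from shipping a
# master-side variable of the same name to the workers, should one exist.
pids <- foreach(i = 1:6, .combine = c,
                .noexport = "worker_resource") %dopar% {
  worker_resource$opened_by
}

parallel::stopCluster(cl)
```

Six tasks run, but `pids` contains at most two distinct process IDs, since the resource was created once per worker rather than once per task.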


lulions
