
I am using the example code posted here to show a progress_bar (from the progress package) with doParallel + foreach. The solutions there, however, rely on doSNOW (e.g. the code by Dewey Brooke that I am using for testing), which is more outdated than doParallel and triggers this NOTE when building a package with CRAN flags:

Uses the superseded package: ‘doSNOW (>= 1.0.19)’

Changing this does not seem as straightforward as expected. If registerDoSNOW is simply replaced by registerDoParallel, and .options.snow by .options.doparallel, the code will run, but it will not show any progress bar at all.

I think this might relate to the use of .options.X. This part of the code is very obscure to me: .options.snow does work when using doSNOW, but the foreach man page does not document this argument. It is therefore not surprising that .options.doparallel does not work, since it was just a wild guess of mine.

Calling pb$tick() inside the foreach loop does not work either, and actually causes the result to be wrong. So I am really out of ideas on where I should include this in the code.

Where does .options.snow come from? Where should pb$tick() go, and how can I show the progress_bar object using doParallel here?

I paste below the code (doSNOW replaced by doParallel) for convenience, but credit again the original source:

library(parallel)
library(doParallel)

numCores<-detectCores()
cl <- makeCluster(numCores)
registerDoParallel(cl)

# progress bar ------------------------------------------------------------
library(progress)

iterations <- 100                               # used for the foreach loop  

pb <- progress_bar$new(
  format = "letter = :letter [:bar] :elapsed | eta: :eta",
  total = iterations,    # 100 
  width = 60)

progress_letter <- rep(LETTERS[1:10], 10)  # token reported in progress bar

# allowing progress bar to be used in foreach -----------------------------
progress <- function(n){
  pb$tick(tokens = list(letter = progress_letter[n]))
} 

opts <- list(progress = progress)

# foreach loop ------------------------------------------------------------
library(foreach)

foreach(i = 1:iterations, .combine = rbind, .options.doparallel = opts) %dopar% {
  summary(rnorm(1e6))[3]
}

stopCluster(cl) 
elcortegano

1 Answer


doParallel still uses the .options.snow argument for whatever reason. Found this little tidbit in the doParallel documentation.

The doParallel backend supports both multicore and snow options passed through the foreach function. The supported multicore options are preschedule, set.seed, silent, and cores, which are analogous to the similarly named arguments to mclapply, and are passed using the .options.multicore argument to foreach. The supported snow options are preschedule, which like its multicore analog can be used to chunk the tasks so that each worker gets a prescheduled chunk of tasks, and attachExportEnv, which can be used to attach the export environment in certain cases where R’s lexical scoping is unable to find a needed export. The snow options are passed to foreach using the .options.snow argument.
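To make the quoted passage concrete, here is a minimal sketch of passing a snow option through foreach while doParallel is the registered backend. The two-worker cluster and the toy loop body are my own assumptions for illustration; only the `.options.snow` argument and the `preschedule` option come from the documentation quoted above.

```r
library(doParallel)  # also attaches foreach and parallel

# Assumption: a small two-worker cluster, just for illustration.
cl <- makeCluster(2)
registerDoParallel(cl)

# With a cluster backend, doParallel reads its options from .options.snow.
# preschedule = TRUE asks for the tasks to be split into chunks up front.
res <- foreach(i = 1:8, .combine = c,
               .options.snow = list(preschedule = TRUE)) %dopar% {
  i^2
}

stopCluster(cl)
```

Any name other than the ones listed in the documentation (for example `progress`) would not be picked up by this backend.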

foreach is a powerful package, but whoever is maintaining it makes odd decisions.


EDIT

doParallel does not support the progress option that doSNOW accepts through .options.snow. Therefore, a progress bar will NOT display if registerDoParallel is used instead of registerDoSNOW.

While doSNOW has been superseded, it's unclear whether one is more outdated than the other, since both have undergone very few changes other than updating the current Maintainer (doParallel | doSNOW).
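For comparison, this is roughly what the working doSNOW variant of the question's code looks like. It is a sketch rather than a definitive version: the two-worker cluster and the smaller `rnorm(1e5)` workload are my assumptions to keep it quick, while `registerDoSNOW`, `.options.snow` and the `progress` callback follow the original code.

```r
library(doSNOW)    # attaches foreach and snow
library(progress)

# Assumption: two workers, to keep the example light.
cl <- makeCluster(2)
registerDoSNOW(cl)

iterations <- 100

pb <- progress_bar$new(
  format = "letter = :letter [:bar] :elapsed | eta: :eta",
  total = iterations,
  width = 60)

progress_letter <- rep(LETTERS[1:10], 10)  # token reported in progress bar

# doSNOW calls this on the master each time a task result comes back;
# n is the number of tasks completed so far.
progress <- function(n) {
  pb$tick(tokens = list(letter = progress_letter[n]))
}

opts <- list(progress = progress)

result <- foreach(i = 1:iterations, .combine = rbind,
                  .options.snow = opts) %dopar% {
  summary(rnorm(1e5))[3]
}

stopCluster(cl)
```

Note that `pb$tick()` runs on the master, not inside the loop body on the workers, which is why putting it inside the `%dopar%` block does not work.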

doSNOW

doSNOW:::doSNOW <- function (obj, expr, envir, data) 
{
  cl <- data
  preschedule <- FALSE
  attachExportEnv <- FALSE
  progressWrapper <- function(...) NULL   # <- CRITICAL DIFFERENCE
  if (!inherits(obj, "foreach")) 
    stop("obj must be a foreach object")
  it <- iter(obj)
  accumulator <- makeAccum(it)
  options <- obj$options$snow
  if (!is.null(options)) {
    nms <- names(options)
    recog <- nms %in% c("preschedule", "attachExportEnv", 
                        "progress")      # <- CRITICAL DIFFERENCE
    if (any(!recog)) 
      warning(sprintf("ignoring unrecognized snow option(s): %s", 
                      paste(nms[!recog], collapse = ", ")), call. = FALSE)
...

doParallel


doParallel:::doParallelSNOW <- function (obj, expr, envir, data) 
{
  cl <- data
  preschedule <- FALSE
  attachExportEnv <- FALSE
  # MISSING: progressWrapper <- function(...) NULL
  if (!inherits(obj, "foreach")) 
    stop("obj must be a foreach object")
  it <- iter(obj)
  accumulator <- makeAccum(it)
  options <- obj$options$snow
  if (!is.null(options)) {
    nms <- names(options)
    recog <- nms %in% c("preschedule", "attachExportEnv")  # MISSING: "progress"
    if (any(!recog)) 
      warning(sprintf("ignoring unrecognized snow option(s): %s", 
                      paste(nms[!recog], collapse = ", ")), call. = FALSE)
...
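If you want to check this against the versions installed on your machine, a rough way is to search the deparsed source of each backend function for the word "progress". The helper name `mentions_progress` is hypothetical (mine, not part of any package), and the internal function names are the ones shown in the listings above, so they may differ in other versions.

```r
# Hypothetical helper: TRUE if the deparsed source of `fn` contains "progress".
mentions_progress <- function(fn) {
  any(grepl("progress", deparse(fn), fixed = TRUE))
}

# Only inspect packages that are actually installed.
if (requireNamespace("doSNOW", quietly = TRUE)) {
  print(mentions_progress(doSNOW:::doSNOW))
}
if (requireNamespace("doParallel", quietly = TRUE)) {
  print(mentions_progress(doParallel:::doParallelSNOW))
}
```

If the listings above match your installed versions, I would expect the first call to find a match and the second one not to.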

Dewey Brooke
  • Almost a year later... Still couldn't figure out a way to go about this. I feel it should be possible given that `parallel` is based, among other things, on `snow`. Your answer seems to point at `foreach` for lacking something. Would you mind elaborating a bit? I would greatly appreciate it. – Mihai Oct 01 '22 at 18:18
  • @Mihai I updated my answer on why it won't work. Short answer: a progress bar will not display with `doParallel` as the backend no matter how hard you try. Just use `doSNOW`. – Dewey Brooke Oct 03 '22 at 02:46
  • This makes sense, thanks for updating. I was curious about a general solution (i.e., cross-platform and without relying on the implementation details of these parallel backends), and I came up with the following: https://stackoverflow.com/a/73940644/5252007. Perhaps you may find it interesting to check out. – Mihai Oct 03 '22 at 20:46
  • @Mihai Thanks for the link!! I've seen similar solutions, yet I really wanted to exploit the flexibility of the `progress` package when I was looking for a solution. I'm irritated with the maintainers of the `foreach` package. After Microsoft acquired Revolution Analytics, development for all those packages came to a screeching halt. When I finally get free time, I might fork `doParallel` to add in the "progress" option. Would love collaboration if you're down. Won't be for a while, however. – Dewey Brooke Oct 04 '22 at 20:46
  • Happy that you've checked it out. You can, by the way, use that approach with the `progress` package as well. I would love to collaborate on what you proposed! When the time is right, just reply here or get in touch via the link in my profile. – Mihai Oct 05 '22 at 14:46
  • I eventually wrote a package to track progress for both `PSOCK` and `FORK` clusters. You can [find it here](https://parabar.mihaiconstantin.com/). I would love to hear your thoughts and/or suggestions. – Mihai Feb 20 '23 at 04:51