8

I am trying to write a CRAN package with multithreaded capabilities. I achieved a perfect solution with doSNOW, but that package has been flagged as "superseded" by the CRAN team and they asked me to switch to a doParallel solution. This is fine; however, I could not find a way to monitor the number of completed jobs using doParallel in the same way I did with doSNOW. Here is my doSNOW solution:

# Set up parameters
nthreads <- 2
nreps <- 100
funrep <- function(i) {
    Sys.sleep(0.1)
    res <- c(log2(i), log10(i))
    return(res)
}
# doSNOW solution
library(doSNOW)
cl <- makeCluster(nthreads)
registerDoSNOW(cl)
pb <- txtProgressBar(0, nreps, style = 3)
progress <- function(n) {
    setTxtProgressBar(pb, n)
}
opts <- list(progress = progress)
i <- 0
output <- foreach(i = icount(nreps), .combine = c, .options.snow = opts) %dopar% {
    s <- funrep(i)
    return(s)
}
close(pb)
stopCluster(cl)

And here is a doParallel solution as suggested in a previous Stack Overflow post. However, as you can see, it doesn't print progress as the jobs are completed; it only updates the bar when the results are combined, at the very end.

# doParallel solution
library(doParallel)
pb <- txtProgressBar(0, nreps, style = 3)  # bar updated by the combine function
progcombine <- function() {
  count <- 0
  function(...) {
    count <<- count + length(list(...))
    setTxtProgressBar(pb, count)
    utils::flush.console()
    c(...)
  }
}
cl <- makeCluster(nthreads)
registerDoParallel(cl)
output <- foreach(i = icount(nreps), .combine = progcombine()) %dopar% {
    funrep(i)
}
stopCluster(cl)
close(pb)

Can you suggest a solution to monitor job completion using doParallel, or at least one that does not rely on the superseded doSNOW? Ideally with a progress bar and with multi-OS capability. Thanks a lot!

Federico Giorgi
  • Have you considered [log4r](https://cran.r-project.org/web/packages/log4r/index.html)? It can log both to file and console at the same time. Then log each completed job so you can easily follow the progress (see the sketch after these comments). – Sinh Nguyen Oct 20 '19 at 14:06
  • Have you looked at the `pbapply` package? I don't know a lot about this but it's my go-to package for parallel + progress bar tasks. – JBGruber Oct 20 '19 at 14:10
  • Actually, when I run your code I see a progress bar incrementing at the completion of each task (more or less). Do you have the option `.multicombine = TRUE`? I do not, and it is not set to TRUE by default. – Martin Morgan Oct 20 '19 at 14:49
  • Have you found any solution to this? – Mihai Oct 02 '22 at 09:53
  • I eventually settled for a *valiant*, but cross-platform solution: https://stackoverflow.com/a/73940644/5252007. Also, in my package, I am using a permanent session instead of a background process. – Mihai Oct 03 '22 at 21:11
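
Following the log4r suggestion in the comments, here is a minimal sketch of that idea, written against the question's own nthreads, nreps and funrep. It assumes the current log4r API (logger(), file_appender(), info()); the file name progress.log and the message format are illustrative choices, not anything prescribed by log4r. Each worker appends one line per completed job, so progress can be followed from another session or a terminal by counting the lines of the file.

# Sketch: workers append one log line per completed job (assumes log4r >= 0.3)
library(doParallel)
library(log4r)
cl <- makeCluster(nthreads)
registerDoParallel(cl)
lg <- logger(appenders = file_appender("progress.log"))  # shared, append-only log file
output <- foreach(i = icount(nreps), .combine = c, .packages = "log4r") %dopar% {
    s <- funrep(i)
    info(lg, sprintf("job %d done", i))  # one line per completed job
    s
}
stopCluster(cl)
# e.g. from another session: length(readLines("progress.log")) / nreps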

3 Answers

7

(Disclaimer: I'm the author of the progressr package and the future framework.)

The progressr package can achieve this when using doFuture as a parallel backend to foreach:

library(progressr) ## use progressr for progression updates
library(doFuture)  ## attaches also foreach and future
registerDoFuture() ## tell foreach to use futures
plan(multisession) ## parallelize over a local PSOCK cluster

xs <- 1:5

with_progress({
  p <- progressor(along = xs) ## create a 5-step progressor
  y <- foreach(x = xs) %dopar% {
    p()                       ## signal a progression update
    Sys.sleep(6.0-x)
    sqrt(x)
  }
})

The default is to use utils::txtProgressBar() for progression reporting, but you can change this. For example, the following will make progression updates be reported via progress::progress_bar() and beepr::beep():

progressr::handlers("progress", "beepr")

You can also add messages for each progression update, e.g.

p(sprintf("x=%g", x))

FYI, plan(multisession, workers = 2) is short for plan(cluster, workers = cl) where cl is basically cl <- parallel::makeCluster(2L).
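
To make that equivalence concrete, here is a minimal sketch (the worker count of 2 is just an example) of the two ways of setting up the same kind of local background-worker backend:

library(future)
plan(multisession, workers = 2)   # shorthand

cl <- parallel::makeCluster(2L)   # explicit local PSOCK cluster
plan(cluster, workers = cl)       # manually set-up equivalent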

PS. The objective of the progressr package is to provide a minimal, sustainable, extendable, and unified API for progression updates, while remaining agnostic to which iterator framework is used.
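
As a small illustration of that framework independence (my own example, not from the answer), the same progressor pattern works without foreach, e.g. sequentially with plain lapply:

library(progressr)
xs <- 1:5
with_progress({
  p <- progressor(along = xs)   ## same progressor as in the foreach example
  y <- lapply(xs, function(x) {
    p()                         ## signal a progression update
    Sys.sleep(0.1)
    sqrt(x)
  })
})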

PPS. The progressr API is under development; it might take a while before it has identified its true self.

HenrikB
  • Honestly: well done! For my package, I will switch to your progressr once it goes on CRAN or Bioconductor. I will switch to it immediately for testing :-) – Federico Giorgi Oct 20 '19 at 21:40
  • Tried to fix this with an edit, but the edit queue was full: progressr::handler does not exist (anymore?). progressr::handlers("progress", "beepr") seems to do the trick. – Doctor G Jul 13 '22 at 06:53
  • Thx. I've fixed the typo; it should be `handlers()` - `handler()` never existed. – HenrikB Jul 13 '22 at 08:21
4

I could not find a solution with doParallel (I don't think it supports progress bars for job completion), but maybe you can try the new package pbapply:

# pbapply solution
library(pbapply)
cl <- parallel::makeCluster(nthreads)
invisible(parallel::clusterExport(cl = cl, varlist = c("nreps")))
invisible(parallel::clusterEvalQ(cl = cl, library(utils)))
result <- pblapply(cl = cl,
                   X = 1:nreps,
                   FUN = funrep)
parallel::stopCluster(cl)
Lupo
2

A solution using surveillance::plapply. The handling of the progress bar is implemented in the function to be parallelized.

funrep2 <- function(i){
  i <<- i + 1
  setTxtProgressBar(pb, i)
  res <- c(log2(i), log10(i))
  Sys.sleep(0.1)
  return(res)
}

After exporting the objects to the cluster, the handling is similar to your parallel solution:

library(parallel)
pb <- txtProgressBar(max=nreps, style=3)
cl <- makeCluster(nthreads)
clusterExport(cl, c("pb", "funrep2"), envir=environment())
clusterEvalQ(cl, library(surveillance))
i <- 0
output <- surveillance::plapply(1:nreps, "funrep2")
stopCluster(cl)
close(pb)

# |=============================================================================...     |  85%
jay.sf