83

I have a foreach loop that uses %dopar% with doSNOW as the back-end. How can I have the loop print something out each iteration?

My code below is what I'm currently using, but its not printing anything.

foreach(ntree=rep(25,2),.combine=combine,.packages='randomForest',
    .inorder=FALSE) %dopar% {
        print("RANDOM FOREST")
        randomForest(classForm,data=data,na.action=na.action,do.trace=do.trace,ntree=ntree,mtry=mtry)
    }   
Rich Scriven
  • 97,041
  • 11
  • 181
  • 245
n8schloss
  • 2,723
  • 2
  • 19
  • 27
  • 2
    ah, but it is printing, just not on the master node... – Joshua Ulrich Jun 05 '12 at 19:53
  • 1
    oh, okay. In that case is there a way that I can view what its printing or have it print to the master node? – n8schloss Jun 05 '12 at 19:54
  • I don't know of any and I'm not sure how it could be done. – Joshua Ulrich Jun 05 '12 at 23:04
  • Would it be possible to write an iterator function that prints to the console? The iterator should be run by the master, no? – Noam Ross May 06 '13 at 18:56
  • @NoamRoss Yes, the iterator runs only on the master, so it could write to the console. That would allow you to monitor tasks being sent to the workers, rather than when the tasks are actually executed by the workers. – Steve Weston Aug 09 '13 at 15:57

6 Answers6

66

Output produced by the snow workers gets thrown away by default, but you can use the makeCluster "outfile" option to change that. Setting outfile to the empty string ("") will prevent snow from redirecting the output, often resulting in the output from your print messages showing up on the terminal of the master process.

Just create and register your cluster with something like:

library(doSNOW)
cl <- makeCluster(4, outfile="")
registerDoSNOW(cl)

Your foreach loop doesn't need to change at all.

This works for me with both SOCK clusters and MPI clusters using Rmpi built with Open MPI. On Windows, you won't see any output if you're using Rgui. If you use Rterm.exe instead, you will.

Note that in addition to your own output, you'll see messages produced by snow which can also be useful.


To use a progress bar, doSNOW version 1.0.14 has a progress option. Here is a complete example:

library(doSNOW)
library(tcltk)
library(randomForest)
cl <- makeSOCKcluster(3)
registerDoSNOW(cl)

ntasks <- 100
pb <- tkProgressBar(max=ntasks)
progress <- function(n) setTkProgressBar(pb, n)
opts <- list(progress=progress)

x <- matrix(runif(500), 100)
y <- gl(2, 50)

rf <- foreach(ntree=rep(25, ntasks), .combine=combine,
        .multicombine=TRUE, .packages='randomForest',
        .options.snow=opts) %dopar% {
  randomForest(x, y, ntree=ntree)
}

The progress option is fairly general, so you could simply print a message using a function such as:

progress <- function(n) cat(sprintf("task %d is complete\n", n))

The function can take 0, 1, or 2 arguments. The first supplied argument is the total number of completed tasks, and the second is the task number of the task that just finished.

The simplest example simply prints a . when a task completes:

progress <- function() cat('.')

This example displays both arguments and can be used to demonstrate that tasks aren't always completed in order:

progress <- function(nfin, tag) {
  cat(sprintf('tasks completed: %d; tag: %d\n', nfin, tag))
}
Steve Weston
  • 19,197
  • 4
  • 59
  • 75
  • Is there an equivalent solutions for directing the output of print messages to the console with `doMC`? – Matt SM Dec 21 '15 at 19:39
  • @MattSM With doMC, output from the workers is displayed on the console by default when using standard R. You may run into problems when using RStudio, but I don't think that doMC is recommended when using RStudio due to problems with forking. Also, doMC doesn't support a progress bar due to restrictions in the parallel package. – Steve Weston Dec 21 '15 at 20:32
  • @SteveWeston I like the progress bar solution. But there are times when I want to see snippets of the result in each iteration. Is there any way to do the equivalent of cat("result at iter i") ? – horaceT Feb 01 '18 at 20:31
  • @horaceT If you're using a parallel backend that calls the combine function on the fly (such as doSNOW or doMPI), then you can easily display result snippets from the combine function. But keep in mind that the combine function is only called for every 100th result unless you decrease the value of `.maxcombine`. – Steve Weston Feb 02 '18 at 15:39
  • @SteveWeston First time it happens to me : when i google `.options.snow`, i see absolutely no documentation, except this SO thread of yours. Can you explain what's going on under the hood. Where is `progress` getting `n` from? – horaceT Feb 02 '18 at 17:48
  • @horaceT I thought I wrote some more, but apparently this is it. The function `doSNOW` in the file "doSNOW.R" in the doSNOW source distribution sends tasks to the workers and collects the results. If a progress function is specified, it will call that function when a result arrives with up to two arguments. See my updated answer for more information. – Steve Weston Feb 05 '18 at 19:52
  • @SteveWeston Very useful. One last Q, what is, and how to use the 3rd argument in `prorgress` ? – horaceT Feb 08 '18 at 18:50
  • @horaceT There is no third argument. The function takes a maximum of 2 arguments. – Steve Weston Feb 13 '18 at 22:41
  • The `cl <- makeCluster(3, outfile="")` tip works with `doParallel::registerDoParallel(cl)` too. – allanvc May 06 '20 at 04:28
29

There are a number of good solutions posted here, but I find it easiest to log to a socket and use a separate process to output the log calls in a console.

I use the following function:

log.socket <- make.socket(port=4000)

Log <- function(text, ...) {
  msg <- sprintf(paste0(as.character(Sys.time()), ": ", text, "\n"), ...)
  cat(msg)
  write.socket(log.socket, msg)
}

You can then place log statements in the code such as:

Log("Processing block %d of %d", i, n.blocks)

Log output can viewed in real-time using any simple socket listening tool. For example, using netcat on Linux:

nc -l 4000

The above log statement would display in the netcat terminal as:

2014-06-25 12:30:45: Processing block 2 of 13

This method has the advantage of working remotely and provides as detailed output as you care to log.

p.s. For those on Windows, see Jon Craton's netcat port.

p.p.s I'm guessing the write.socket R function probably isn't thread-safe, but unless you're logging at high frequency, you're unlikely to run into any issue. Something to be aware of though.

MichaelChirico
  • 33,841
  • 14
  • 113
  • 198
hendalst
  • 2,957
  • 1
  • 24
  • 25
  • 3
    Need to run `nc -l 4000` in linux terminal *before* executing `log.sock = make.socket(port=4000)` in `R` – horaceT Feb 09 '18 at 19:14
  • OK, But when you have like 10 different process, how can you record them properly? Establish a nc server and R socket for each R process? Or just make the R socket in different process connected to the same nc server? – cloudscomputes Nov 12 '19 at 09:43
12

A way I've kept track of progress on nodes during long operations is to create a progress bar using tkProgressBar from the tcltk package. It's not quite what you asked for, but it should let you see something from the nodes. At least it does when the cluster is a socket cluster running on the local host (which is a Windows machine). The potential problem is that the progress bar either remains and clutters your monitor or it gets closed and the printed info is gone. For me, that wasn't a problem, though, since I just wanted to know what the current status was.

library(parallel)
library(doSNOW)
cl<-makeCluster(detectCores(),type="SOCK")
registerDoSNOW(cl)

Using your code,

foreach(ntree=rep(25,2),.combine=combine,.packages=c('randomForest','tcltk'),
    .inorder=FALSE) %dopar% {
        mypb <- tkProgressBar(title = "R progress bar", label = "",
          min = 0, max = 1, initial = 0, width = 300)
        setTkProgressBar(mypb, 1, title = "RANDOM FOREST", label = NULL)
    ans <- randomForest(classForm,data=data,na.action=na.action,do.trace=do.trace,ntree=ntree,mtry=mtry)
    close(mypb)
    ans
    }

Here's a more general use example:

jSeq <- seq_len(30)

foreach(i = seq_len(2), .packages = c('tcltk', 'foreach')) %dopar% {
    mypb <- tkProgressBar(title = "R progress bar", label = "",
        min = 0, max = max(jSeq), initial = 0, width = 300)
    foreach(j = jSeq) %do% {
        Sys.sleep(.1)
        setTkProgressBar(mypb, j, title = "RANDOM FOREST", label = NULL)
    }
    NULL
}
BenBarnes
  • 19,114
  • 6
  • 56
  • 74
  • Hi, I tried using the progress bar but can't make it work using parallel for loops. Everytime I run it I get "XIO: fatal IO error". How do you go around this? @BenBarnes – user88595 Mar 17 '14 at 13:26
  • @TAllieri, please see the updated example (the original example wasn't very illustrative). If you are still having problems, please provide more information and consider asking a new question. – BenBarnes Mar 17 '14 at 18:13
11

I had the same issue too. I was tuning parameters for random forest using foreach package and wanted to print a "Results" line after each iteration, but couldn't figure out without going through displaying a progress bar and such.

Here's what I did, In my function, I added this line

write.table(result, file=paste("RF_ntree_",ntree,"_dims_",dims,".txt", sep=""),
  sep="\t", row.names=F)

So after every iteration, the results are written to a text file with names such as RF_ntree_250_dims_100.txt.

So if I want to keep track of the progress I just refresh the folder to which the text files are being written to.

PS: the results are being accumulated in a data frame too.

BenBarnes
  • 19,114
  • 6
  • 56
  • 74
3

cat("blah-blah-blah\n", file=stdout()) tends to work for me (linux/emacs/ess). I guess it also works for some other platforms.

Ott Toomet
  • 1,894
  • 15
  • 25
  • Oddly this logs if you tail the `outfile` that you specified on `makeCluster`. Which makes me wonder what print is doing differently than this. – Sid Apr 02 '18 at 04:46
  • I didn't see any print out in the rstudio console, it may print to the outfile set in the makeCluster,though – cloudscomputes Nov 12 '19 at 07:02
0

An alternative is to use file logging (for instance, log4r package) and separately print the output on the screen (for instance, by 'tail -f').

This works well if you consider creating logs anyway, and you can use the existing packages with all the related bells and whistles.

Ott Toomet
  • 1,894
  • 15
  • 25