library(parallel)
cl=makeCluster(4)
txts = c("I", "AM", "NOT", "PRINTED")
clusterApply(cl, txts, function(txt){write(txt, stderr())})
stopCluster(cl)
txts = c("WHILE", "I", "AM", "PRINTED")
lapply(txts, function(txt){write(txt,stderr())})

When the above code is run, calls to write from clusterApply seem to be ignored: nothing is printed.

The reason I want to print from clusterApply is that the code I'm going to run is expected to take many hours to complete; I want to be able to monitor progress.

I've found a surprising way to print from clusterApply: C++ code run through Rcpp from clusterApply can print to the console via std::cerr. Still, that seems like overkill.

Is there any other way to print from clusterApply?

2 Answers


You can follow your progress by using makeCluster(4, outfile = ""). This also turns on the output of write(txt, stderr()).

The outfile = "" solution seems to work only on Linux systems. For Windows, check the linked question and its comments. There seem to be workarounds, such as using Rterm instead of Rgui, but I can't provide one since I am unable to test it.

I ran the following code on Xubuntu 18.04 and got output from every call.

library(parallel)
cl=makeCluster(4, outfile ="")
txts = c("I", "AM", "NOT", "PRINTED", seq(1,1000000,1))
clusterApply(cl, txts, function(txt){write(txt,stdout())})
stopCluster(cl) 

From the documentation of makeCluster:

outfile:

Where to direct the stdout and stderr connection output from the workers. "" indicates no redirection (which may only be useful for workers on the local machine). Defaults to ‘/dev/null’ (‘nul:’ on Windows). The other possibility is a file path on the worker's host. Files will be opened in append mode, as all workers log to the same file.

So if you want to see output written to stderr, you have to set outfile explicitly.
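The quoted documentation also mentions pointing outfile at a file path, which may suit a long-running job better than the console, since the log can be inspected while the workers are still busy. What follows is only a minimal sketch under some assumptions: the workers run on the local machine, and log_file is a hypothetical temporary path chosen for illustration.

```r
library(parallel)

# Hypothetical log location; any path writable on the workers' host should do.
log_file <- tempfile(fileext = ".log")

# All workers append their stdout and stderr output to the same file.
cl <- makeCluster(2, outfile = log_file)

txts <- c("I", "AM", "LOGGED")
invisible(clusterApply(cl, txts, function(txt) {
  # Build the whole line first so that each progress message is one write.
  cat(paste0("processing ", txt, "\n"))
  nchar(txt)
}))
stopCluster(cl)

# The log may also contain start-up messages from the workers themselves.
log_lines <- readLines(log_file)
print(log_lines)
```

While the job is running, the same file can be followed from another terminal, for example with tail -f on Linux.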

mischva11

You need to capture the standard output produced in the background workers, return it as part of the results, and then re-output it in the main R process. The future framework does this automatically, and it also "relays" message and warning conditions:

> library(future.apply)

> cl <- parallel::makeCluster(4)
> plan(cluster, workers = cl)

> txts <- c("I", "AM", "ALSO", "PRINTED")
> y <- future_lapply(txts, function(txt) {
+   print(txt)
+   message("M: ", txt)
+ })
[1] "I"
M: I
[1] "AM"
M: AM
[1] "ALSO"
M: ALSO
[1] "PRINTED"
M: PRINTED

> parallel::stopCluster(cl)

FYI, in the next release of the future package, output from workers will be relayed as soon as possible, i.e. as soon as the results are collected and available. In the current version, it will only be relayed when all workers are completed.
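For illustration, the capture-and-replay idea described at the top of this answer can also be sketched by hand with the parallel package alone. This is not how the future framework is implemented; it is a simplified sketch that handles only standard output (not messages or warnings), and run_captured is a hypothetical helper name.

```r
library(parallel)

# Hypothetical helper: runs one task on a worker and returns both its value
# and everything it printed to standard output.
run_captured <- function(txt) {
  value <- NULL
  out <- utils::capture.output({
    print(txt)            # output that would otherwise be lost on the worker
    value <- nchar(txt)   # the actual result of the task
  })
  list(value = value, output = out)
}

cl <- makeCluster(2)
txts <- c("I", "AM", "ALSO", "PRINTED")
res <- clusterApply(cl, txts, run_captured)
stopCluster(cl)

# Replay the captured output in the main R process, in task order.
for (r in res) writeLines(r$output)

# Extract the actual results.
values <- vapply(res, function(r) r$value, integer(1))
```

As with the behavior described above, the output is replayed only once all tasks have finished, so this variant does not give live progress.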

Additional comments:

You don't really want to output to stderr() explicitly - see https://github.com/HenrikBengtsson/Wishlist-for-R/issues/55 for one reason.

The approach of creating a PSOCK cluster with outfile = "" should be considered a hack. That output will end up in the "background", it cannot be captured anywhere by R, and whether it is displayed depends heavily on the type of environment you run in, i.e. its behavior differs depending on whether R runs in a terminal on Linux, on Windows, in Rgui on Windows, in RStudio, etc.

HenrikB