6

In order to see the console messages output by a function running in a foreach() loop, I followed the advice of this guy and added a sink() call like so:

    library(foreach)
    library(doMC)
    cores <- detectCores()
    registerDoMC(cores)

    X <- foreach(i=1:100) %dopar% {
      sink("./out/log.branchpies.txt", append=TRUE)
      cat(paste("\n","Starting iteration",i,"\n"), append=TRUE)
      myFunction(data, argument1="foo", argument2="bar")
    }

However, at iteration 77 I got the error 'sink stack is full'. There are well-answered questions about avoiding this error when using for-loops, but not foreach. What's the best way to write the otherwise-hidden foreach output to a file?

Roger
  • Are you actually running this in parallel? Why are you using `sink` *and* `cat` with a file? – Roland Oct 10 '14 at 09:45
  • I am running the same computationally-intensive function on 100 elements of a list in parallel using `foreach` because it would take forever using a `for` loop, or even `mclapply` (I've tried and it's much slower). I'm using `sink` and `cat` because the linked page recommended I do, and because it helps keep track of which iteration the `foreach` loop is up to. – Roger Oct 10 '14 at 09:49
  • You didn't answer the question. You don't show how you set up the cluster. Also, the tutorial you link to doesn't use the `file` argument of `cat`. – Roland Oct 10 '14 at 09:52
  • `mclapply` shouldn't be slower than `foreach` if you set up the cluster correctly. – Roland Oct 10 '14 at 09:52
  • Sorry, I didn't actually use the file argument in `cat`—that was something I was experimenting with. I mistyped the code. I'll fix it now. – Roger Oct 10 '14 at 09:54
  • @Roland he may be running this on Windows, where `mclapply` doesn't do anything. – Hong Ooi Oct 10 '14 at 11:04
  • @Hong @Roland I'm using a Mac. `mclapply` resulted in a definite speed increase relative to a `for`-loop but it was meagre compared to `foreach`. – Roger Oct 10 '14 at 11:05

4 Answers

7

This runs without errors on my Mac:

library(foreach)    
library(doMC)
cores <- detectCores()
registerDoMC(cores)

X <- foreach(i=1:100) %dopar%{
  sink("log.branchpies.txt", append=TRUE)
  cat(paste("\n","Starting iteration",i,"\n"))
  sink() #end diversion of output
  rnorm(i*1e4)
}

This is better:

library(foreach)    
library(doMC)
cores <- detectCores()
registerDoMC(cores)
sink("log.branchpies.txt", append=TRUE)
X <- foreach(i=1:100) %dopar%{
  cat(paste("\n","Starting iteration",i,"\n"))
  rnorm(i*1e4)
}
sink() #end diversion of output

This works too:

library(foreach)    
library(doMC)
cores <- detectCores()
registerDoMC(cores)

X <- foreach(i=1:100) %dopar%{
  cat(paste("\n","Starting iteration",i,"\n"),
      file="log.branchpies.txt", append=TRUE)
  rnorm(i*1e4)
}
Roland
  • Thanks. The problem with my original code must have been that it didn't include sink() to end the diversion of output. – Roger Oct 12 '14 at 14:07
4

As suggested by this guy, it is quite tricky to keep track of the sink stack. It is therefore advisable to use cat's ability to write to a file, as suggested in the answer above:

cat(..., file="log.txt", append=TRUE)

To save some typing, you could create a wrapper function that diverts output to a file every time cat is called:

catf <- function(..., file="log.txt", append=TRUE){
  cat(..., file=file, append=append)
}

Then, when you call foreach, you would use something like this:

library(foreach)    
library(doMC)
cores <- detectCores()
registerDoMC(cores)

X <- foreach(i=1:100) %dopar%{
  catf(paste("\n","Starting iteration",i,"\n"))
  rnorm(i*1e4)
}

Hope it helps!

dmi3kno
1

Unfortunately, none of the above approaches worked for me: with sink() inside the foreach() loop, it kept throwing the "sink stack is full" error, and with sink() outside the loop, the file was created but never updated.

For me, the easiest way to create a log file that keeps track of a parallelised foreach() loop's progress is the good old write.table() function.

    library(foreach)
    library(doParallel)

    availableClusters <- makeCluster(detectCores() - 1) #use all cpu-threads but one (i.e. one is reserved for the OS)
    registerDoParallel(availableClusters) #register the available cores for the parallelisation

    x <- foreach (i = 1:100) %dopar% {
      log.text <- paste0(Sys.time(), " processing loop run ", i, "/100")
      write.table(log.text, "loop-log.txt", append = TRUE, row.names = FALSE, col.names = FALSE)

      #your statements here
    }

And don't forget (as I did several times...) to use append = TRUE within write.table().
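
A quick illustration of why that matters (a minimal sketch, not from the original answer, using a temporary file instead of loop-log.txt):

    # write.table() defaults to append = FALSE, so each call replaces the
    # file's contents and only the most recent line survives.
    logfile <- tempfile(fileext = ".txt")
    write.table("first line",  logfile, row.names = FALSE, col.names = FALSE)
    write.table("second line", logfile, row.names = FALSE, col.names = FALSE)
    readLines(logfile)  # only the second line is left

    # With append = TRUE every line (e.g. one per worker) is kept.
    write.table("third line", logfile, append = TRUE, row.names = FALSE, col.names = FALSE)
    readLines(logfile)  # now holds the second and third lines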

ChristianB
0

Call sink() with no arguments once inside the loop, so that the diversion is closed at the end of each iteration, and you will not get this error again.
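
A minimal sketch of the pattern this describes, assuming the doMC setup, data, and myFunction from the question (none of which are defined here):

    library(foreach)
    library(doMC)
    registerDoMC(detectCores())

    X <- foreach(i=1:100) %dopar% {
      sink("./out/log.branchpies.txt", append=TRUE)  # divert this worker's output
      cat(paste("\n","Starting iteration",i,"\n"))
      result <- myFunction(data, argument1="foo", argument2="bar")  # hypothetical function and arguments from the question
      sink()  # paired sink() keeps the sink stack at most one level deep
      result
    }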

Abhimanu Kumar
  • Doesn't work for me, I suspect because each worker is breaking out of the loop beforehand, so this maybe works contingent on the loop reaching it. – dez93_2000 Feb 07 '20 at 17:42