I'm using doParallel to do fairly long parallel processing with foreach. Unlike most examples I see, where computationally intensive but input-light code is fed into the loop, I'm using foreach to coordinate the simultaneous processing of a number of large, independent datasets. So inside the loop, I'm using metadata to read a file in from disk, operate on it, and write the result back out.

Before I turned this operation into a foreach loop, I was writing out debug messages using message(). However, since I've switched to using foreach and %dopar%, I've noticed that the loop 'goes dark': it's doing what it ought to, but I'm not receiving any output. (I should mention that this loop is written into a script that I'm calling from the shell with Rscript.)

I'm guessing that this has something to do with the fact that doParallel spins off other threads—maybe those threads no longer know where to dump standard output? Thoughts?

jimjamslam
  • I'm not a genius of parallel computing, but it's definitely true that socket-type clusters in R don't return outputs (e.g. progress bars, messages, etc.) until the job finishes and returns the output. I've never worked with fork-type clusters, so I don't know if that would circumvent this limitation or not. I've been desperate for a progress bar a few times in the past, and there is a workaround when the number of parallel processes is low: write separate, non-parallelized code for each job and run each job by hand in a separate (simultaneous) instance of R. – Jacob Socolar Jul 13 '17 at 06:45
  • @JacobSocolar Oof, that _is_ desperate ;) I ran this non-interactively via PBS and found that my logs had error and warning messages from the shell (part of this processing involves using `system()` to call other tools) but not the `message()` output from R. So it seems like that's probably it. I suppose another desperate answer is to `system("echo My update")`... – jimjamslam Jul 13 '17 at 06:57

1 Answer

If you want output from a parallel foreach loop, use the outfile option when creating the cluster: makeCluster(no_cores, outfile = "/path/to/log_file.txt").

Note that the logs of all workers are written to the same file (in the order in which they arrive).
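For illustration, here is a minimal sketch of how the outfile option might be wired into a foreach loop. The worker count, the temporary log file, and the squaring step standing in for the real per-dataset work are all assumptions made up for this example, not part of the original script.

```r
library(doParallel)  # also attaches foreach and parallel

# Hypothetical log location; a temp file keeps the sketch self-contained.
log_file <- tempfile(fileext = ".log")

# outfile redirects each worker's stdout and stderr to this file.
# Passing outfile = "" instead leaves worker output unredirected,
# which may only be useful for workers on the local machine.
cl <- makeCluster(2, outfile = log_file)
registerDoParallel(cl)

results <- foreach(i = 1:4, .combine = c) %dopar% {
  # Tag each message with the worker PID, since all workers share one file.
  message(sprintf("[pid %d] processing dataset %d", Sys.getpid(), i))
  i^2  # placeholder for the real read / process / write step
}

stopCluster(cl)

# Inspect what the workers logged.
cat(readLines(log_file), sep = "\n")
```

Because every worker appends to the same file, prefixing each message with the process ID (or a dataset name) makes interleaved lines easier to attribute.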

mayeulk
F. Privé