When mclapply(X, FUN) encounters errors for some of the values of X, the errors propagate to some (but not all) of the other values of X:

require(parallel)
test <- function(x) if(x == 3) stop() else x
mclapply(1:3, test, mc.cores = 2)

#[[1]]
#[1] "Error in FUN(c(1L, 3L)[[2L]], ...[cut]
#
#[[2]]
#[1] 2
#
#[[3]]
#[1] "Error in FUN(c(1L, 3L)[[2L]], ... [cut]

#Warning message:
#In mclapply(1:3, test, mc.cores = 2) :
#  scheduled core 1 encountered error in user code, all values of the job will be affected

How can I stop this happening?

orizon

1 Answer

The trick is to set mc.preschedule = FALSE:

mclapply(1:3, test, mc.cores = 2, mc.preschedule = FALSE)
#[[1]]
#[1] 1

#[[2]]
#[1] 2

#[[3]]
#[1] "Error in FUN(X[[nexti]], ...[cut]
#Warning message:
#In mclapply(1:3, test, mc.cores = 2, mc.preschedule = FALSE) :
#  1 function calls resulted in an error

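This works because, by default, mclapply seems to divide X into mc.cores groups and apply a vectorized version of FUN to each group. As a result, if any member of a group raises an error, every value in that group returns the same error (values in other groups are unaffected). In the question's example, values 1 and 3 share a group, as the c(1L, 3L) in the error message shows, so the failure at 3 also replaces the result for 1.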
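Because an error seems to abort the whole group, another option is to catch errors inside FUN so that no error ever reaches mclapply, which keeps the default prescheduling. The following is a minimal sketch, not something taken from the parallel documentation: safe_test is a hypothetical wrapper name, and stop() is given a message purely for readability. It relies on forking, so mc.cores = 2 will not work on Windows.

```r
library(parallel)

# Same test function as in the question, with a message added for readability
test <- function(x) if (x == 3) stop("x is 3") else x

# Hypothetical wrapper: return the error condition as a value instead of raising it
safe_test <- function(x) tryCatch(test(x), error = function(e) e)

# Default prescheduling is kept; each element carries its own result or error object
res <- mclapply(1:3, safe_test, mc.cores = 2)

# Elements that failed are condition objects inheriting from class "error"
failed <- vapply(res, inherits, logical(1), what = "error")
which(failed)
```

With this approach the successful values in the same group should survive, since each call returns normally and the error is just another return value.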

Setting mc.preschedule = FALSE can have adverse effects, however: it may make it impossible to reproduce a sequence of pseudo-random numbers in which the same job always receives the same number in the sequence. See ?mcparallel under the heading Random numbers.
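With mc.preschedule = FALSE, the failed elements come back as "try-error" objects, so they can be located programmatically rather than by reading the printed list. A minimal sketch, assuming the same test function as in the question (with a message added to stop() for readability) and a platform that supports forking:

```r
library(parallel)

test <- function(x) if (x == 3) stop("x is 3") else x

# One job per value, so an error only affects the element that raised it
res <- mclapply(1:3, test, mc.cores = 2, mc.preschedule = FALSE)

# Flag the elements that mclapply returned as try-error objects
failed <- vapply(res, inherits, logical(1), what = "try-error")
which(failed)

# Keep only the successful results
ok <- res[!failed]
```

For the example above this should flag only element 3, matching the output shown earlier. mclapply still issues its warning about failed function calls.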

orizon
    I know that this is an old thread but I wanted to comment as it provided me with an answer that I could not find elsewhere. My task was to read in over 8,000 g-zipped .csv files using `fread()` from `data.table` and `mclapply()` from `parallel`. It makes me wonder why the default is `mc.preschedule = TRUE` when propagation of errors can be so destructive. Surely setting it to `FALSE` and then calling `str()` on the result to check which files are erroneous is preferable? – Seanosapien Jan 31 '19 at 19:28