15

I'm testing out the parLapplyLB() function to understand how it balances the load across workers, but I'm not seeing any balancing happening. For example:

cl <- parallel::makeCluster(2)

system.time(
  parallel::parLapplyLB(cl, 1:4, function(y) {
    if (y == 1) {
      Sys.sleep(3)
    } else {
      Sys.sleep(0.5)
    }}))
##   user  system elapsed 
##  0.004   0.009   3.511 

parallel::stopCluster(cl)

If it were truly balancing the load, the first job (job 1), which sleeps for 3 seconds, would run on the first node, and the other three jobs (jobs 2:4), sleeping for a total of 1.5 seconds, would run on the other node. In that case the elapsed time should be only about 3 seconds.

Instead, I think that jobs 1 and 2 are given to node 1 and jobs 3 and 4 are given to node 2, which makes the total time 3 + 0.5 = 3.5 seconds. If we run the same code above with parLapply() instead of parLapplyLB(), we get the same elapsed time of about 3.5 seconds.
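
One way to check which jobs go to which worker is to return each worker's process ID along with the input. This is just a quick sanity check (not part of the timing test above), but it should show whether jobs 1 and 2 really land on the same node:

cl <- parallel::makeCluster(2)

## Each result pairs an input with the PID of the worker that processed it
parallel::parLapplyLB(cl, 1:4, function(y) c(input = y, pid = Sys.getpid()))

parallel::stopCluster(cl)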

What am I not understanding or doing wrong?

josiekre • 795 • 1 • 7 • 19
  • I think R doesn't do automatic load balancing. I think it divides the *tasks* across as many cores as are available, regardless of the time each task takes or when each task completes. It's not as if there is a queue of tasks where a worker grabs the next one as soon as it finishes. Each core was assigned two tasks, hence 3 + 0.5 on the first worker and a total of 3.5. *would be happy to be wrong* – gregmacfarlane Jul 06 '16 at 18:11
  • Yes, that's where the 3.5 is coming from. It's not balancing the load, but parLapplyLB claims to balance. – josiekre Jul 06 '16 at 18:18

2 Answers

14

NOTE: Since R-3.5.0, the behavior/bug noted by the OP and explained below has been fixed. As noted in R's NEWS file at the time:

* parLapplyLB and parSapplyLB have been fixed to do load balancing
  (dynamic scheduling).  This also means that results of
  computations depending on random number generators will now
  really be non-reproducible, as documented.
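
To see the RNG point concretely, here is a small illustration (a sketch, not from the NEWS entry): with dynamic scheduling, which worker (and therefore which RNG stream) handles a given input depends on timing, so two runs need not match even after reseeding the cluster:

cl <- parallel::makeCluster(2)

parallel::clusterSetRNGStream(cl, 123)
res1 <- parallel::parLapplyLB(cl, 1:4, function(i) runif(1))

parallel::clusterSetRNGStream(cl, 123)
res2 <- parallel::parLapplyLB(cl, 1:4, function(i) runif(1))

## May be FALSE on R >= 3.5.0: the input-to-worker (and hence input-to-stream)
## assignment can differ between runs.
identical(res1, res2)

parallel::stopCluster(cl)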

ORIGINAL ANSWER (now only relevant for R versions < 3.5.0)

For a task like yours (and, for that matter, for any task for which I've ever needed the parallel package), parLapplyLB isn't really the right tool for the job. To see why not, have a look at the way it's implemented:

parLapplyLB
# function (cl = NULL, X, fun, ...) 
# {
#     cl <- defaultCluster(cl)
#     do.call(c, clusterApplyLB(cl, x = splitList(X, length(cl)), 
#         fun = lapply, fun, ...), quote = TRUE)
# }
# <bytecode: 0x000000000f20a7e8>
# <environment: namespace:parallel>

## Have a look at what `splitList()` does:
parallel:::splitList(1:4, 2)
# [[1]]
# [1] 1 2
# 
# [[2]]
# [1] 3 4

The problem is that it first splits its list of jobs up into equal-sized sublists that it then distributes among the nodes, each of which runs lapply() on its given sublist. So here, your first node runs jobs on the first and second inputs, while the second node runs jobs using the third and fourth inputs.

Instead, use the more versatile clusterApplyLB(), which works just as you'd hope:

system.time(
  parallel::clusterApplyLB(cl, 1:4, function(y) {
    if (y == 1) {
      Sys.sleep(3)
    } else {
      Sys.sleep(0.5)
    }}))
# user  system elapsed 
# 0.00    0.00    3.09 
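
If you'd rather keep parLapplyLB's calling convention on R < 3.5.0, one workaround (a sketch along the lines of the user-defined versions mentioned in the comments below, not part of base R) is to split X into single-element chunks yourself, so that clusterApplyLB() has something left to schedule dynamically:

## parLapplyLB2 is just an ad-hoc name for this sketch; it assumes 'cl' is
## a running cluster, as above.
parLapplyLB2 <- function(cl, X, fun, ...) {
  ## one chunk per element, so workers grab new inputs as they become free
  do.call(c,
          parallel::clusterApplyLB(cl, x = parallel:::splitList(X, length(X)),
                                   fun = lapply, fun, ...),
          quote = TRUE)
}

system.time(
  parLapplyLB2(cl, 1:4, function(y) {
    if (y == 1) Sys.sleep(3) else Sys.sleep(0.5)
  }))
## elapsed time should again be about 3 seconds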
Josh O'Brien • 159,210 • 26 • 366 • 455
  • Thanks! That's what I was looking for. I can't think of a case where parLapplyLB would produce something different from parLapply, so I'm not sure what its purpose is. – josiekre Jul 06 '16 at 19:11
  • Does `clusterApplyLB(cl, X, fun)` have the same intended behavior as `parLapplyLB`? I've been trying this out on my system, and it seems to give the same output when `X` is a list, but I'm a little nervous just swapping out `parLapplyLB` with `clusterApplyLB`. – guy Jul 12 '16 at 02:30
  • Useful info here, as well as a user-defined parLapplyLB: http://detritus.fundacioace.com/pub/books/Oreilly.Parallel.R.Oct.2011.pdf – Olivia Mar 02 '17 at 11:43
  • @Olivia Very interesting, especially pages 13--22 (and extra-especially pp. 20--22). Thanks! – Josh O'Brien Mar 02 '17 at 19:19
  • One thing not mentioned: avoid mclapply if you're working on tasks with large files. sendMaster doesn't like returning anything 2 GB or larger. And turning off prescheduling just seemed to run lapply on only one core. parLapply (sock or fork) works and has similar performance times. – Olivia Mar 03 '17 at 11:15

4

parLapplyLB is not balancing the load because it has a semantic bug. We found the bug and provided a fix; see here. Now it's up to the R devs to include the fix.

Urist McDev • 498 • 3 • 14