I have some R code using foreach and doParallel to parallelize an lapply() call via parLapply(). There are 100 tasks split across my laptop's 4 cores. The function automatically partitions the tasks into fixed sets of 25, so 1:25, 26:50, 51:75, and 76:100 each go to one core. (I know this because I've been saving files named by the integer index I iterate over.) The tasks are much simpler for 76:100, so those finished very quickly, while all the other tasks are still queued on their respective cores.
Whenever I run more involved code like this, I periodically monitor progress and check Task Manager. In the past I've watched the worker processes spin up and down between executions. This time, I noticed one worker process that is now permanently idle, sitting at 0% CPU in Task Manager. I assume that's the worker that was dedicated to the quickly-completed chunk; the other three are actively processing away as usual.
So my question: is there any way to modify the parallel code so that tasks are distributed more evenly, or so that the idle worker gets re-incorporated and picks up queued tasks?
Similar to what is discussed in this post.
# not executed, so not sure it's a true reprex, but the concept is there
# (computer is still running my other code :))
library(foreach)
library(doParallel)

cl <- makeCluster(getOption("cl.cores", parallel::detectCores()))
clusterExport(cl = cl, varlist = c("reprex"))  # "reprex" stands in for my real objects
clusterEvalQ(cl, { library(dplyr) })
registerDoParallel(cl)

reprex2 <- parLapply(cl,
                     1:100,
                     function(x) {
                       if (x <= 75) {
                         Sys.sleep(10)
                       } else {
                         print("Side question: any way for this to be seen? Or maybe some kind of progress bar baked into the code? I've taken to saving files along the way to monitor.")
                       }
                     })
stopCluster(cl)
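On the side question: as I understand it, each worker is a separate R process with its own stdout, so print() on a worker isn't shown in the main console by default (I've read that makeCluster(..., outfile = "") redirects worker output, though I haven't verified it on my setup). For a progress bar, the third-party pbapply package wraps lapply()-style calls and accepts a cluster via its cl argument. An untested sketch, with the sleep shortened from 10s so it runs quickly:

```r
library(parallel)
library(pbapply)  # third-party package: install.packages("pbapply")

cl <- makeCluster(4)

# pblapply() is a drop-in replacement for lapply() that draws a progress
# bar in the main console; passing the cluster via `cl` runs it in parallel
res <- pbapply::pblapply(1:100, function(x) {
  if (x <= 75) {
    Sys.sleep(0.1)  # shortened stand-in for the slow tasks
  }
  x  # return the index so results are easy to check
}, cl = cl)

stopCluster(cl)
```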
A lo-fi but tedious solution I thought of would be to benchmark each "type" of task and then partition them manually so each core gets a similar load. But I'm really looking for something on the code side that lets a less-occupied worker pick up the remaining work. :)
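In case it helps anyone answering: the base parallel package appears to ship a load-balanced variant, parLapplyLB() (built on clusterApplyLB()), which hands tasks to workers as they free up instead of pre-splitting the input into fixed chunks. An untested sketch of what I mean, with chunk.size = 1 passed explicitly to force task-at-a-time scheduling (I've seen reports that some R versions didn't balance with the default chunking), and the sleep shortened from 10s:

```r
library(parallel)

cl <- makeCluster(4)

# parLapplyLB() dispatches work dynamically: when a worker finishes its
# current chunk it is handed the next one, so the worker that drew the
# quick tasks keeps pulling from the queue instead of going idle.
res <- parLapplyLB(cl, 1:100, function(x) {
  if (x <= 75) {
    Sys.sleep(0.1)  # shortened stand-in for the slow tasks
  }
  x  # return the index so results are easy to check
}, chunk.size = 1)

stopCluster(cl)
```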
Bonus bonus bonus question(s) that are likely googleable: is each parallel execution done in a clean environment? Application-wise, should I front-load memory-intensive objects once when each worker is created, or is it better to load a subset for each task? And why do lists of lists take up more memory than a data frame with the same content?
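For the last one, my current guess, which object.size() can illustrate: a data frame stores each column once as a single atomic vector (one object header per column), while a list of row-lists pays that per-object overhead, tens of bytes each, for every single cell. A small, untested illustration (exact sizes will vary by platform):

```r
# Same content, two layouts
df <- data.frame(a = 1:1000, b = runif(1000))

# One small list per row, one small vector per cell
ll <- lapply(seq_len(nrow(df)), function(i) as.list(df[i, ]))

object.size(df)  # header cost paid once per column
object.size(ll)  # header cost paid once per cell (plus per-row lists)
```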