Here is the story.
According to the Seurat vignette, FindMarkers() can be accelerated with the future package, e.g. future::plan("multiprocess", workers = 4).
However, I am running a simulation in which I need to call FindAllMarkers() inside a foreach::foreach() loop after doParallel::registerDoParallel(cores = 10).
- What parallelization actually happens behind the scenes?
- How can I make the most of the HPC resources with this setup?
- How many CPUs should I allocate for this job to maximize the parallelization?
Any ideas are welcome.
Below is a minimal example. pbmc.rds is here.
library(Seurat)

# Enable parallelization for `FindAllMarkers()`
library(future)
# "multiprocess" is deprecated in newer versions of future; "multisession" is used here instead
plan("multisession", workers = 4)

# Enable parallelization for the `foreach()` loop
library(doParallel)
registerDoParallel(cores = 10)

pbmc <- readRDS("pbmc.rds")

# Resolutions 0.1, 0.2, ..., 1.0; one clustering run per foreach worker
rst <- foreach(i = (1:10) / 10, .combine = "cbind") %dopar% {
  pbmc <- FindClusters(pbmc, resolution = i)
  # should put future command here instead?
  # plan("multisession", workers = 4)
  DEgenes <- FindAllMarkers(pbmc)
  write.csv(DEgenes, paste0("DEgenes_resolution_", i, ".csv"))
  pbmc$seurat_clusters
}
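
To make the CPU-allocation question concrete, the worst case can be written out as simple arithmetic. This is only a rough sketch under an assumption that nothing above confirms: that every foreach worker is free to start its own set of future workers. The future package may well restrict nested parallelism inside workers, in which case the real number would be lower. The variable names are hypothetical and only mirror the settings in the example.

```r
# Hypothetical values mirroring the example above.
outer_workers <- 10  # registerDoParallel(cores = 10)
inner_workers <- 4   # plan(..., workers = 4)

# Upper bound on concurrent R processes IF the two levels nest freely.
# Assumption only: future may limit nested parallelism inside workers.
max_processes <- outer_workers * inner_workers

# Cores visible to this R session (can be NA on some platforms).
available <- parallel::detectCores()

cat("Worst-case worker processes:", max_processes, "\n")
cat("Cores detected:", available, "\n")
if (!is.na(available) && max_processes > available) {
  cat("The nested setup could oversubscribe this machine.\n")
}
```

Comparing the two numbers on the actual compute node gives a first indication of whether the request matches what the job could use in the worst case.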