Here is the story.

According to the Seurat vignette, FindMarkers() can be accelerated with the future package, e.g. future::plan("multiprocess", workers = 4).

However, I am running a simulation in which I need to call FindAllMarkers() inside a foreach::foreach() loop after doParallel::registerDoParallel(cores = 10).

  1. What parallelization actually happens behind the scenes when the two are nested?
  2. How can I make the most of the HPC resources under this setup?
  3. How many CPUs should I allocate for this job to maximize parallelization?

Any ideas are welcome.

Below is a minimal example. pbmc.rds is here.

library(Seurat)

# Enable parallelization for `FindAllMarkers()`
library(future)
plan("multiprocess", workers = 4)


# Enable parallelization for `foreach()` loop
library(doParallel)
registerDoParallel(cores = 10)

pbmc <- readRDS("pbmc.rds")

rst <- foreach(i = 1:10/10, .combine = "cbind") %dopar% {
  
  pbmc <- FindClusters(pbmc, resolution = i)
  
  # Should the future plan be set here, inside each worker, instead?
  # plan("multiprocess", workers = 4)
  
  DEgenes <- FindAllMarkers(pbmc)
  
  write.csv(DEgenes, paste0("DEgenes_resolution_", i, ".csv"))
  
  pbmc$seurat_clusters
}
yuw444