
I am using the foreach and doParallel libraries to perform a parallel computation, but while it runs, only 1 CPU is in use at a time (checked with `top` in a Bash terminal on Linux).

The server has 48 cores, and I've tried:

  • Using 24, 12 or 5 cores
  • Example code (such as the one below)
  • Running on Windows, where the worker processes appear, but they do not use any CPU

list.of.packages <- c("foreach", "doParallel")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[, "Package"])]
if (length(new.packages)) install.packages(new.packages)

library(foreach)
library(doParallel)

no_cores <- detectCores() / 2 # 24 cores
cl <- makeCluster(no_cores)
registerDoParallel(cl)

df.a = data.frame(str = paste('name', 1:60000), int = rnorm(60000))
df.b = data.frame(str = sample(df.a$str))
df.b$int = NA

foreach(row.a = seq_along(df.a$str),
        .combine = rbind,
        .verbose = T) %dopar% {
          row.b = grep(pattern = df.a$str[row.a], x = df.b$str)
          df.b$int[row.b] = df.a$int[row.a]
          df.b
        }
stopCluster(cl)
stopCluster(cl)

I expect this code to use several CPUs (as many as defined), but it actually uses only 1.

Hart Radev
    You should try with something more taxing for a CPU than `x^n`. – Roland Aug 13 '19 at 13:59
  • @Roland My original code contains a `grep(pattern[i], x)`, where the *pattern* is 64 thousand values, and *x* a table of 15 million rows; but it leads to the same result. I've also tried this example on Windows, and although the cluster is made, it does not use any CPU. – Hart Radev Aug 13 '19 at 14:04
    Take a look at [this](https://stackoverflow.com/a/33632696/5793905) and [this](https://stackoverflow.com/a/50804071/5793905) answer. – Alexis Aug 15 '19 at 17:10

1 Answer


Run foreach(..., .verbose = TRUE) to understand what is going on. I have slightly changed the code being run so it is easier to identify when the parallel part is actually executing.

library(foreach)
library(doParallel)

no_cores <- detectCores() / 2 # 24 cores
cl <- makeCluster(no_cores)
registerDoParallel(cl)

out <- foreach(exponent = 2:400,
        .combine = sum, .verbose = TRUE)  %dopar%
  runif(1000000)

First segment:

# discovered package(s):  
# no variables are automatically exported
# explicitly exporting package(s):
# numValues: 3999, numResults: 0, stopped: TRUE

This setup phase is not parallel - this is your master setting up tasks for its children. It takes a very long time with 2:40000000, which may be where you are stopping, and during it you would only see one core in use.

# numValues: 79, numResults: 0, stopped: TRUE
# got results for task n

The computation that happens while you are waiting for these lines to print should be parallel. On Windows I see 4 cores working to calculate each `runif`.

# calling combine function
# evaluating call object to combine results:
#   fun(accum, result.n)

This runs once per task result, with a different value of n. This is your combine function, and it is not parallel either.

I think your code is getting hung up in the setup phase, so you are only observing the serial part of the operation. If not, watch what is happening with .verbose = TRUE and look for more clues.
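As a quick sanity check (this is not the original poster's code; the core count of 4 is an assumption about what is free on the machine), a purely CPU-bound loop like the one below should visibly occupy several cores in `top` while it runs. If it does, the cluster itself is fine and the problem is task setup or task size:

```r
library(foreach)
library(doParallel)

cl <- makeCluster(4)   # assumes at least 4 cores are available
registerDoParallel(cl)

# one large CPU-bound task per worker: each sorts a big random vector,
# so all 4 cores should show activity while this runs
res <- foreach(i = 1:4, .combine = c) %dopar% {
  sum(sort(runif(5e6)))
}

stopCluster(cl)
length(res)  # one numeric result per task
```

With only 4 tasks, essentially all the elapsed time is spent in the parallel section, so the serial setup and combine phases are negligible.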

I don't know how your main problem is set up, but your example is not a good model for parallelization - you are dispatching tens of thousands of tiny tasks, so the serial overhead cost per task is very high. You will see improved performance if you can send larger pieces of work to each worker.
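One way to sketch the "larger pieces per worker" idea for the matching problem in the question (the chunking scheme and the use of exact `match()` instead of a per-row `grep()` are my substitutions, not the original poster's code - note that `grep("name 1", ...)` would also match "name 10", "name 11", etc., so exact matching is likely what was intended anyway):

```r
library(foreach)
library(doParallel)

cl <- makeCluster(4)   # assumed worker count for illustration
registerDoParallel(cl)

df.a <- data.frame(str = paste('name', 1:60000), int = rnorm(60000),
                   stringsAsFactors = FALSE)
df.b <- data.frame(str = sample(df.a$str), int = NA_real_,
                   stringsAsFactors = FALSE)

# split the 60000 row indices into one chunk per worker, instead of
# creating 60000 separate tasks
n <- nrow(df.a)
chunks <- split(seq_len(n), cut(seq_len(n), 4, labels = FALSE))

# each task handles a whole chunk, so per-task overhead is paid 4 times,
# not 60000 times; workers return (position in df.b, value) pairs
matched <- foreach(idx = chunks, .combine = rbind) %dopar% {
  data.frame(row.b = match(df.a$str[idx], df.b$str),
             int   = df.a$int[idx])
}

df.b$int[matched$row.b] <- matched$int
stopCluster(cl)
```

The combine step is still serial, but with only 4 results to rbind it is cheap, and the per-chunk work is now large enough for the workers to show up in `top`.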

Chris
  • Indeed, it looks like it takes a while to set up, and then it does the parallel computation. But even so, it does not use all the cores I specified (?). I modified the example so that it shows how my main code works. – Hart Radev Aug 14 '19 at 12:52
  • It uses 4-5 cores instead of the selected 24 (it might not be able to use all 24 due to other tasks taking ~50% of the cores, but there is definitely enough capacity for more than the ones currently running). – Hart Radev Aug 15 '19 at 07:45