
First of all, thank you for taking the time to read my question. More than anything I need conceptual help, because I do not understand what is wrong with my interpretation.

Some time ago I started refactoring some of the algorithms I use so that they run in parallel and take advantage of all the CPUs I have (about 40; my processes currently use only one at a time). Looking through examples and literature, I found that the package that might serve me best is "doParallel", and I have been reading its vignette, https://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf, and the question "run a for loop in parallel in R".

However, when I use it in my code, it takes more time than before. To find the problem, I reduced my code to a simple task, and even there the doParallel version takes longer than the ordinary loop I always use. Below is the code I used for the comparison and the output it produces, which shows which version takes longer:

library(doParallel)
proteins_names <- c("TCSYLVIO_005590","TcCLB.503947.20","TcCLB.504249.111","TcCLB.511081.60","TCSYLVIO_009736","TcCLB.507071.100","TcCLB.507801.60","TcCLB.509103.10","TCSYLVIO_003504","TcCLB.503645.40","TcCLB.508221.490","TCSYLVIO_005223","TcCLB.505949.10","TcCLB.505949.120","TcCLB.506459.219","TcCLB.506763.340","TcCLB.506767.360","TcCLB.506955.250","TcCLB.506965.190","TcCLB.506965.90")
merged_total_test <- data.frame(matrix(rnorm(n = 2200, mean = 10, sd = 2), nrow = 100, ncol = 22))
merged_total_test$protein <- proteins_names[sample(1:20, 100, replace = TRUE)]
merged_total_test$signal <- rnorm(n = 100, mean = 1000, sd = 2)
cores=detectCores()
cl <- makeCluster(cores[1]-4)
registerDoParallel(cl)
init_time_parallel<-Sys.time()

dt_plot_total_parallel <- foreach (prot = 1:20, .combine=rbind) %dopar% {
  temp_protein_c <- merged_total_test[merged_total_test$protein == proteins_names[prot]&!is.na(merged_total_test$signal),]
  temp_protein_c
}

final_time_parallel<-Sys.time()
total_time_parallel<-final_time_parallel - init_time_parallel
stopCluster(cl)

init_time<-Sys.time()
dt_plot_total <- merged_total_test[0,]
for (prot in 1:20){
  print(prot)
  temp_protein_c <- merged_total_test[merged_total_test$protein == proteins_names[prot]&!is.na(merged_total_test$signal),]
  dt_plot_total<-rbind(dt_plot_total,temp_protein_c)
}
final_time<-Sys.time()
total_time<-final_time - init_time

total_time
total_time_parallel
identical(dt_plot_total,dt_plot_total_parallel)#should be true

output:

> total_time
Time difference of 0.3065186 secs
> total_time_parallel
Time difference of 1.939842 secs
> identical(dt_plot_total,dt_plot_total_parallel)#should be true
[1] TRUE
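
One way to check whether fixed overhead (starting workers, sending data back and forth) could explain the gap is to time the same loop with a trivial body and with a deliberately slow body. The following is only a sketch under assumptions: it uses the base `parallel` package rather than doParallel, a cluster of 2 workers, and a hypothetical `slow_task` helper that sleeps for 50 ms to stand in for real work. None of these appear in my actual code.

```r
library(parallel)  # ships with R; used here instead of doParallel (assumption)

# Hypothetical stand-in for an expensive per-protein computation.
slow_task <- function(i) {
  Sys.sleep(0.05)  # pretend each iteration costs about 50 ms
  i^2
}

cheap_task <- function(i) i^2  # almost no work per iteration

cl <- makeCluster(2)  # assumption: at least 2 cores are available

# Cheap body: any difference here is mostly communication overhead.
t_seq_cheap <- system.time(res_seq_cheap <- lapply(1:20, cheap_task))["elapsed"]
t_par_cheap <- system.time(res_par_cheap <- parLapply(cl, 1:20, cheap_task))["elapsed"]

# Slow body: the work per iteration is large compared with the overhead.
t_seq_slow <- system.time(res_seq_slow <- lapply(1:20, slow_task))["elapsed"]
t_par_slow <- system.time(res_par_slow <- parLapply(cl, 1:20, slow_task))["elapsed"]

stopCluster(cl)

# Compare the four timings side by side.
print(c(seq_cheap = unname(t_seq_cheap), par_cheap = unname(t_par_cheap),
        seq_slow  = unname(t_seq_slow),  par_slow  = unname(t_par_slow)))
```

If the parallel version only pays off in the slow case, that would suggest the 20 small subsetting operations in my test are too cheap to outweigh the cost of distributing them.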