
I am running a sound analysis function on around 25k short audio files. My code works but will take a very long time to run. What would be a good approach to parallelize it?

Thanks a lot,

files <- list.files(getwd(), pattern = ".mp3", all.files = FALSE, full.names = FALSE)

out <- NULL

for (i in files) {
  res <- try(soundgen::analyze(i, pitchMethods = 'dom', plot = FALSE, summary = TRUE), silent = TRUE)
  res["1", "duration"] <- i[[1]]
  out <- rbind(out, res)
  print(i)
}
user1029296
  • I (perhaps wrongly) edited your title, mainly because most base functions are vectorized already. Take a look at this: https://stackoverflow.com/questions/5571774/what-is-the-easiest-way-to-parallelize-a-vectorized-function-in-r – NelsonGon Jul 02 '19 at 20:08
  • Using `rbind` within a loop is very time consuming. It is best to preallocate the space and then assign the resulting values. Another option is the `parLapply` function from the parallel package (a sketch of both ideas follows these comments). – Dave2e Jul 02 '19 at 20:08
  • Possible duplicate of [run a for loop in parallel in R](https://stackoverflow.com/questions/38318139/run-a-for-loop-in-parallel-in-r) – divibisan Jul 02 '19 at 20:29
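
As a minimal, non-parallel sketch of Dave2e's suggestion: collect each file's result in a preallocated list and bind everything once at the end, instead of growing `out` with `rbind` on every iteration. The `analyze()` arguments are copied from the question; the variable names are just illustrative.

library(soundgen)

files <- list.files(getwd(), pattern = ".mp3", full.names = FALSE)

# preallocate one slot per file, fill it, and bind once at the end
res_list <- vector("list", length(files))

for (k in seq_along(files)) {
  res <- try(soundgen::analyze(files[k], pitchMethods = 'dom', plot = FALSE, summary = TRUE),
             silent = TRUE)
  if (!inherits(res, "try-error")) {
    res_list[[k]] <- res
  }
}

out <- do.call(rbind, res_list)   # a single rbind instead of ~25k incremental ones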

1 Answer


You can use the parallel package to achieve this easily:

library(parallel)
library(soundgen)

files <- list.files(getwd(), pattern = ".mp3", all.files = FALSE, full.names = FALSE)

soundAnalysis <- function(file) {
  res <- try(soundgen::analyze(file, pitchMethods = 'dom', plot = FALSE, summary = TRUE),
             silent = TRUE)
  if (inherits(res, "try-error")) return(NULL)  # skip files that fail to analyze
  res$file <- file                              # record which file each row came from
  res
}

# run the analysis across all available cores; each call returns its own result
output <- mclapply(X = files, FUN = soundAnalysis, mc.cores = detectCores())

# bind the per-file results into one data frame at the end
out <- do.call(rbind, output)
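
Note that `mclapply` parallelizes by forking, which is not available on Windows (there `mc.cores` is effectively limited to 1). If you are on Windows, a socket cluster with `parLapply` should work instead; a rough sketch, assuming the `soundAnalysis` function defined above:

library(parallel)

cl <- makeCluster(detectCores() - 1)   # leave one core free for the OS
clusterExport(cl, "soundAnalysis")     # make the function available on each worker

output <- parLapply(cl, files, soundAnalysis)
stopCluster(cl)

out <- do.call(rbind, output)          # combine the per-file rows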
Brigadeiro