
I am currently building an application that needs to run millions of statistical regressions in a short time. Parallelizing these calculations is one way to speed up the process.

The OpenCPU server doesn't seem to scale well with commands executed in parallel; all commands appear to run sequentially.

Is it possible to spawn multiple R sessions using OpenCPU, or do I need to run multiple instances of the server? Am I missing something about how OpenCPU can process multiple computationally expensive commands simultaneously?

Paul Klemm
  • There are tools in R that will take advantage of multiple cores. See the [high performance task view on CRAN](http://cran.r-project.org/web/views/HighPerformanceComputing.html). – Roman Luštrik Jan 15 '15 at 13:17
  • If you need to run many linear models, maybe you can improve performance by using [fastLm](http://finzi.psych.upenn.edu/R/library/RcppArmadillo/html/fastLm.html) (see the sketch after these comments), arceepeepee ;) – EDi Jan 15 '15 at 14:29
  • This question is starting to look like an advertisement for Dirk's products :) – David Arenburg Jan 15 '15 at 14:46
  • I will look into the use of multiple cores. Since I want to get the results asap to dynamically update my visualization, I would still prefer multiple `R` instances. Even when using multiple cores in one method, I still need to wait for all jobs to finish before I can pipe the results back to my homepage. – Paul Klemm Jan 15 '15 at 17:03
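A quick sketch of the `fastLm` suggestion from the comments; the toy data below is made up, and the point is only that `fastLm()` skips most of `lm()`'s formula handling and bookkeeping, which adds up across millions of fits:

```r
library(RcppArmadillo)

# Made-up toy data: a design matrix with an intercept column and a response.
set.seed(42)
X <- cbind(1, matrix(rnorm(1000 * 3), ncol = 3))
y <- rnorm(1000)

# fastLm() fits the model by direct linear algebra, avoiding lm()'s overhead.
fit <- fastLm(X, y)
coef(fit)
```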

1 Answer


The OpenCPU cloud server executes all HTTP requests in parallel, so your first observation is incorrect. Of course, the client must actually issue simultaneous requests for this to happen.
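For example, here is a minimal sketch of firing several requests at once from R with the `curl` package's multi interface; the `stats::rnorm` endpoint on the public demo server is only a stand-in for your own package function:

```r
library(curl)

# Stand-in endpoint: calls stats::rnorm on the public OpenCPU demo server.
url <- "https://cloud.opencpu.org/ocpu/library/stats/R/rnorm"

# Queue several POST requests instead of waiting for each one in turn.
for (i in 1:4) {
  h <- new_handle()
  handle_setform(h, n = "1000")
  curl_fetch_multi(url, handle = h,
    done = function(res) cat("finished with status", res$status_code, "\n"))
}

# Perform all queued requests concurrently.
multi_run()
```

The server handles each incoming request in its own R process, so the four calls above run side by side rather than one after another.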

If your code consists of a single R function or script, OpenCPU won't magically parallelize it for you, if that is what you are after. In that case you would need to use something like snow or mcparallel inside your R function. But that is unrelated to OpenCPU, which merely provides an HTTP interface to your R function or script.
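For instance, a rough sketch using the base `parallel` package; the `run_regressions()` function and its formula-list interface are made up for illustration:

```r
library(parallel)

# Hypothetical package function: fits a list of models, fanning the work
# out over the local cores. mclapply() relies on forking and is not
# available on Windows; use parLapply() with a PSOCK cluster there.
run_regressions <- function(formulas, data) {
  mclapply(formulas, function(f) lm(f, data = data),
           mc.cores = detectCores())
}
```

A single HTTP request to such a function then occupies all cores of the server for the duration of that call, which is exactly why this is orthogonal to how OpenCPU schedules requests.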

Jeroen Ooms