I have a couple of R projects in mind, and in both I'd like to make use of parallel processing and optimised BLAS libraries. However, I'm a complete novice in this area, so I was hoping to get some advice on the best approach(es) for the scenarios I'm envisaging.
I'm aware there are various ways to use optimised libraries. For example, I could use Microsoft R Open (formerly Revolution R), which comes built against Intel's MKL library - an optimised BLAS that (I think) parallelises its computations automatically. Or, on Linux, I could link R against ATLAS or OpenBLAS. Both approaches are described here: http://brettklamer.com/diversions/statistical/faster-blas-in-r/
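For what it's worth, this is how I've been checking which BLAS/LAPACK my session actually uses, plus a rough timing check (I'm assuming R >= 3.4 for the library paths in sessionInfo(); the timings are obviously machine-dependent):

```r
# Which BLAS / LAPACK is this R session linked against?
sessionInfo()      # R >= 3.4 prints the BLAS and LAPACK library paths
extSoftVersion()   # versions of external libraries R was built with
La_version()       # LAPACK version currently in use

# Rough speed check: dense matrix multiplication is BLAS-bound, so a
# tuned BLAS (MKL / OpenBLAS / ATLAS) should be noticeably faster here.
n <- 2000
A <- matrix(rnorm(n * n), n, n)
system.time(A %*% A)
```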
Alternatively, I could use various packages within R (as I mention below) which, I think, use optimised libraries automatically (RcppEigen, for example).
As you can tell, I know very little about all this and, although I've read the documentation, a lot of it assumes knowledge (and knowledge of terminology) way beyond my experience. I'm probably completely wrong already.
I'm also aware that there are various ways of undertaking parallel processing. The simplest seems to be parallelMap, which builds on the parallel package. Of course, there are several other techniques.
Maybe it's best to explain what I want to do.
- The first project involves a machine learning tool for some friends to use. Because many of them will have no experience of R (or anything beyond Excel), I will package everything into a deployable shiny app using the terrific tool: https://github.com/wleepang/DesktopDeployR.
The project will require using svm from e1071 (https://cran.r-project.org/web/packages/e1071/index.html). I was planning on implementing tuning of the hyperparameters using mlr (https://mlr-org.github.io/mlr-tutorial/release/html/index.html) - rather than the tune function within e1071.
I believe I don't need to worry about optimised BLAS libraries for svm, as e1071 uses libsvm - which I assume handles that for me. But I could easily be wrong. Maybe it handles the parallel processing too.
For tuning the hyperparameters, which is the most time-consuming part of the process, I have no idea whether optimised libraries are used. I think it depends on which tuning algorithm you ask mlr to use (it draws on algorithms from other packages).
mlr does mention that parallelMap can be wrapped around code to automatically parallelise (is that a term?) the parts that can be run in parallel.
So, for this project I think it's enough to use parallelMap to run the tuning in parallel, since much of the code will be too high level to do anything manual anyway.
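To make that concrete, this is roughly what I have in mind - a minimal sketch using iris as a stand-in dataset, with the parameter ranges and iteration counts picked arbitrarily just for illustration:

```r
library(mlr)
library(parallelMap)

# Stand-in task; the real app will build the task from the users' data
task <- makeClassifTask(data = iris, target = "Species")
lrn  <- makeLearner("classif.svm")   # wraps e1071::svm / libsvm

# Tune cost and gamma on a log scale (ranges chosen arbitrarily here)
ps <- makeParamSet(
  makeNumericParam("cost",  lower = -2, upper = 2, trafo = function(x) 10^x),
  makeNumericParam("gamma", lower = -2, upper = 2, trafo = function(x) 10^x)
)
ctrl  <- makeTuneControlRandom(maxit = 20L)
rdesc <- makeResampleDesc("CV", iters = 3L)

parallelStartSocket(2)   # socket cluster, so it also works on Windows
res <- tuneParams(lrn, task = task, resampling = rdesc,
                  par.set = ps, control = ctrl)
parallelStop()

res$x   # best hyperparameters found
```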
But is it correct that e1071 (and hence libsvm) already uses an optimised BLAS library - or should I consider doing something about that myself?
If not: all of the users will be Windows-based, so Microsoft R Open seems like a good fit - but I don't know of a simple way to make it portable in the way DesktopDeployR (linked above) uses R-Portable. Is there an easy-ish way to do that with "normal" R that fits with DesktopDeployR?
- The second project is that I'd like to try writing my first R package. I wrote some useful code (originally in Maple, later in R) that models some physical systems, and I thought it would be nice to build an R package around it and contribute back to the community a little. My code contains several for loops, several matrix multiplications/additions, and also finds eigenvectors (using a modified version of "eigen" which doesn't sort the eigenvectors). Maybe I should use RcppEigen - or something else - for that nowadays; it was originally written in 2007 (I'm tardy).
The eigen documentation says "eigen uses the LAPACK routines DSYEVR, DGEEV, ZHEEV and ZGEEV.", but I don't know whether that means it picks up optimised routines, or just the (slower) reference ones R ships with. My ignorance shining through again.
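For context, this is the sort of thing I mean by using RcppEigen - a toy sketch where the matrix multiplication happens in compiled Eigen code, independently of whichever BLAS R itself is linked against (matMult is just a made-up name for illustration):

```r
library(Rcpp)

# Eigen::Map avoids copying the R matrices when passing them to C++
Rcpp::cppFunction(depends = "RcppEigen", '
Eigen::MatrixXd matMult(const Eigen::Map<Eigen::MatrixXd> A,
                        const Eigen::Map<Eigen::MatrixXd> B) {
    return A * B;   // evaluated by Eigen in compiled code
}')

A <- matrix(rnorm(500 * 500), 500, 500)
B <- matrix(rnorm(500 * 500), 500, 500)
all.equal(matMult(A, B), A %*% B)   # should be TRUE
```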
So, my questions are similar to the above. What's the best way to write this code so that the computations are parallelised where possible, and run against optimised BLAS libraries? Ideally the user would get some input parameters to help control this - e.g. specifying the number of cores to run on. Except, obviously, it has to work on all platforms.
Or, as an R package, is it best to leave these decisions to the user - even if that means they have to do any parallelisation at a higher level themselves?
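For concreteness, this is the sort of interface I imagined exposing - a hypothetical package function with a cores argument, falling back to plain lapply when cores = 1 (simulate_systems and run_one_system are made-up names, and a PSOCK cluster is used because it works on Windows as well as Linux/macOS):

```r
# Hypothetical exported function: 'param_list' is a list of parameter sets,
# each of which run_one_system() turns into a model result.
simulate_systems <- function(param_list, cores = 1L) {
  if (cores > 1L) {
    cl <- parallel::makeCluster(cores)              # PSOCK: cross-platform
    on.exit(parallel::stopCluster(cl), add = TRUE)
    parallel::parLapply(cl, param_list, run_one_system)
  } else {
    lapply(param_list, run_one_system)
  }
}

# Placeholder for the per-parameter-set computation
# (matrix products, eigen decomposition, etc.)
run_one_system <- function(p) {
  M <- matrix(rnorm(100), 10, 10)
  eigen(M %*% t(M), symmetric = TRUE)$values
}
```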
tl;dr - what's the best way to ensure R uses optimised linear algebra libraries (which also use parallel processing), and how do I ensure for loops are run on multiple cores? I think there are packages that handle both automatically for linear algebra, but I'm not sure. Is parallelMap the best solution for for loops - or is it better to dive into parallel itself?