The Armadillo C++ linear algebra library documentation states that one of the reasons for developing the library in C++ is "ease of parallelisation via OpenMP present in modern C++ compilers", yet the Armadillo code itself does not use OpenMP. How can I gain the benefits of parallelisation with Armadillo? Is this achieved by using one of the high-speed LAPACK and BLAS replacements? My platform is Linux on an Intel processor, but I suspect there is a generic answer to this question.
1 Answer
Okay, so it appears that parallelisation is indeed achieved by using the high-speed LAPACK and BLAS replacements. On Ubuntu 12.04 I installed OpenBLAS using the package manager and built the Armadillo library from source. The examples in the examples folder built and ran, and I can control the number of cores using the OPENBLAS_NUM_THREADS environment variable.
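As a minimal sketch of that setup (the file name and program are illustrative, not from any existing project): Armadillo forwards its matrix product to whatever BLAS it was linked against, so a trivial program is enough to check that OpenBLAS threading responds to the environment variable.

```cpp
// bench_single.cpp -- check that Armadillo's A*B is evaluated by the linked
// BLAS (OpenBLAS here) and that OPENBLAS_NUM_THREADS affects the timing.
//
// Build:  g++ bench_single.cpp -o bench_single -O2 -larmadillo
// Run:    OPENBLAS_NUM_THREADS=1 ./bench_single   (then compare with =2)
#include <armadillo>
#include <iostream>

int main()
{
    const arma::uword n = 2048;
    arma::mat A = arma::randu<arma::mat>(n, n);
    arma::mat B = arma::randu<arma::mat>(n, n);

    arma::wall_clock timer;
    timer.tic();
    arma::mat C = A * B;          // dispatched to the BLAS dgemm routine
    double t = timer.toc();

    // Print one element so the product cannot be optimised away.
    std::cout << n << "x" << n << " product took " << t << " s"
              << " (C(0,0) = " << C(0, 0) << ")\n";
    return 0;
}
```

Running the same binary with different values of OPENBLAS_NUM_THREADS should show the execution time changing, which confirms the threading is coming from OpenBLAS rather than from Armadillo itself.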
I created a small project, openblas-benchmark, which measures Armadillo's performance when computing the matrix product C = A*B for matrices of various sizes, but so far I have only been able to test it on a 2-core machine.
The performance plot shows nearly a 50% reduction in execution time for matrices larger than 512x512. Note that both axes are logarithmic; each grid line on the y-axis represents a doubling of execution time.
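The measurement loop is essentially the following sketch (a reconstruction of the idea, not the actual openblas-benchmark source): time C = A*B over a range of sizes and emit size/time pairs suitable for a log-log plot.

```cpp
// bench_sweep.cpp -- time C = A*B over a range of matrix sizes.
// A reconstruction of the idea behind openblas-benchmark, not its code.
#include <armadillo>
#include <iostream>

int main()
{
    arma::wall_clock timer;

    for (arma::uword n = 64; n <= 2048; n *= 2)
    {
        arma::mat A = arma::randu<arma::mat>(n, n);
        arma::mat B = arma::randu<arma::mat>(n, n);

        timer.tic();
        arma::mat C = A * B;      // BLAS dgemm under the hood
        double t = timer.toc();

        // one "size seconds" pair per line; C(0,0) keeps the product live
        std::cout << n << " " << t << "   # C(0,0)=" << C(0, 0) << "\n";
    }
    return 0;
}
```

Running it once per thread count (e.g. with OPENBLAS_NUM_THREADS=1 and then =2) and overlaying the two curves shows where multi-threading starts to pay off.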

- My bad; I didn't see the logarithmic scale on the X axis. 2x is 2x, alright. If I were you, I'd check it as the right answer again :=} – Ira Baxter Mar 25 '14 at 23:31
- I did, thanks :-) Performance plots are often log-scaled since problems of polynomial complexity show as straight lines in the plots and the exponent may be read off the derivative of the curves. – Svaberg Mar 26 '14 at 10:17
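(As a footnote to the comment above, with notation of my own choosing: the straight-line behaviour follows from taking logs of a polynomial running time.)

```latex
t(n) = c\,n^k \;\Longrightarrow\; \log t(n) = k \log n + \log c
```

So on log-log axes a polynomial-complexity curve is a straight line whose slope is the exponent k.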