I made two installations:
brew install numpy
(and scipy)--with-openblas
- Cloned GIT repositories (for numpy and scipy) and built it myself
After I cloned two handy scripts for verification of these libraries in multi-threaded environment:
git clone https://gist.github.com/3842524.git
Then for each installation I'm executing show_config
:
python -c "import scipy as np; np.show_config()"
It's all nice for the installation 1:
lapack_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/opt/openblas/lib']
language = f77
blas_opt_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/opt/openblas/lib']
language = f77
openblas_info:
libraries = ['openblas', 'openblas']
library_dirs = ['/usr/local/opt/openblas/lib']
language = f77
blas_mkl_info:
NOT AVAILABLE
But installation 2 the things are not so bright:
lapack_opt_info:
extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
extra_compile_args = ['-msse3']
define_macros = [('NO_ATLAS_INFO', 3)]
blas_opt_info:
extra_link_args = ['-Wl,-framework', '-Wl,Accelerate']
extra_compile_args = ['-msse3', '- I/System/Library/Frameworks/vecLib.framework/Headers']
define_macros = [('NO_ATLAS_INFO', 3)]
So it seems when I failed to link OpenBLAS correctly. But it's fine for now, here the performance results. All tests are performed on iMac, Yosemite, i7-4790K, 4 cores, hyper-threaded.
First installation with OpenBLAS:
numpy:
OMP_NUM_THREADS=1 python test_numpy.py
FAST BLAS
version: 1.9.2
maxint: 9223372036854775807
dot: 0.126578998566 sec
OMP_NUM_THREADS=2 python test_numpy.py
FAST BLAS
version: 1.9.2
maxint: 9223372036854775807
dot: 0.0640147686005 sec
OMP_NUM_THREADS=4 python test_numpy.py
FAST BLAS
version: 1.9.2
maxint: 9223372036854775807
dot: 0.0360922336578 sec
OMP_NUM_THREADS=8 python test_numpy.py
FAST BLAS
version: 1.9.2
maxint: 9223372036854775807
dot: 0.0364527702332 sec
scipy:
OMP_NUM_THREADS=1 python test_scipy.py
cholesky: 0.0276656150818 sec
svd: 0.732437372208 sec
OMP_NUM_THREADS=2 python test_scipy.py
cholesky: 0.0182101726532 sec
svd: 0.441690778732 sec
OMP_NUM_THREADS=4 python test_scipy.py
cholesky: 0.0130400180817 sec
svd: 0.316107988358 sec
OMP_NUM_THREADS=8 python test_scipy.py
cholesky: 0.012854385376 sec
svd: 0.315939807892 sec
Second installation without OpenBLAS:
numpy:
OMP_NUM_THREADS=1 python test_numpy.py
slow blas
version: 1.10.0.dev0+3c5409e
maxint: 9223372036854775807
dot: 0.0371072292328 sec
OMP_NUM_THREADS=2 python test_numpy.py
slow blas
version: 1.10.0.dev0+3c5409e
maxint: 9223372036854775807
dot: 0.0215149879456 sec
OMP_NUM_THREADS=4 python test_numpy.py
slow blas
version: 1.10.0.dev0+3c5409e
maxint: 9223372036854775807
dot: 0.0146862030029 sec
OMP_NUM_THREADS=8 python test_numpy.py
slow blas
version: 1.10.0.dev0+3c5409e
maxint: 9223372036854775807
dot: 0.0141334056854 sec
scipy:
OMP_NUM_THREADS=1 python test_scipy.py
cholesky: 0.0109382152557 sec
svd: 0.32529540062 sec
OMP_NUM_THREADS=2 python test_scipy.py
cholesky: 0.00988121032715 sec
svd: 0.331357002258 sec
OMP_NUM_THREADS=4 python test_scipy.py
cholesky: 0.00916676521301 sec
svd: 0.318637990952 sec
OMP_NUM_THREADS=8 python test_scipy.py
cholesky: 0.00931282043457 sec
svd: 0.324427986145 sec
To my surprise, the second case is faster than the first. In case of scipy there is no increase in performance after adding more cores, but even one core is faster than 4 cores in OpenBLAS.
Does anyone have an idea why is that?