I'm trying to speed up matrix multiplication in Python on my own. I've looked at several approaches, and one of them is parallel computing on the CPU using BLAS underneath NumPy. I've read in the documentation that numpy.dot (used for matrix multiplication) relies on BLAS. From the numpy.dot documentation:
It uses an optimized BLAS library when possible (see numpy.linalg).
However, after installing the OpenBLAS library and trying to use it with my code, nothing changes: the speed is exactly the same as before. Running htop in the terminal shows that only one of the 8 cores of my processor is used.
My environment is a recent version of Linux Mint.
To install OpenBLAS I followed the instructions in another Stack Overflow post (here), but they stop at the installation step. I first ran the shell command:
sudo apt-get install libopenblas-dev
I had also uninstalled numpy with pip beforehand:
pip uninstall numpy
and reinstalled it only once libopenblas was installed.
I didn't clone the OpenBLAS repository as described in that post, because I wanted to keep things simple.
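If numpy was reinstalled with pip, it most likely came as a binary wheel that typically bundles its own copy of OpenBLAS (in a numpy.libs directory next to the package), so the system libopenblas-dev package is not necessarily the library numpy actually loads. A minimal, Linux-only sketch to list the BLAS shared libraries mapped into the process after importing numpy reads /proc/self/maps; the helper name loaded_blas_libraries is hypothetical, and matching on the substring "blas" is an assumption that may miss or over-report libraries.

```python
import os

import numpy as np  # importing numpy loads its BLAS shared library


def loaded_blas_libraries():
    """Return paths of BLAS-like shared libraries mapped into this process (Linux only)."""
    maps_path = "/proc/self/maps"
    if not os.path.exists(maps_path):
        return []  # not Linux, or /proc is unavailable
    paths = set()
    with open(maps_path) as maps:
        for line in maps:
            # columns: address perms offset dev inode [pathname]; the path may contain spaces
            fields = line.rstrip("\n").split(maxsplit=5)
            if len(fields) == 6 and "blas" in fields[5].lower():
                paths.add(fields[5])
    return sorted(paths)


for path in loaded_blas_libraries():
    print(path)
```

If the printed path points inside site-packages rather than to the system library directory, numpy is using its bundled OpenBLAS.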
Then I ran the following Python script:
import numpy as np
import time
import multiprocessing as mp
import os
#Environment for multi-threading
nb_processeurs = str(mp.cpu_count())
os.environ["OPENBLAS_NUM_THREADS"] = nb_processeurs
os.environ["BLAS"] = "openblas64_"
print(np.__config__.show())
#Variables
n = 5000
p = 300
# Generate a matrix of 0 and 1 with 30% of 1 and 70% of 0
A = (np.random.rand(n,p)> 0.7).astype(int)
A_t = A.T
#Numpy dot product
start_time = time.time()
C1 = np.dot(A,A_t)
end_time = time.time()
print("NumPy dot product took {} seconds".format(round(end_time - start_time,2)))
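One detail that may matter here: as far as I know, NumPy only hands float32, float64, complex64 and complex128 arrays to BLAS, and multiplies integer arrays (like A above, built with .astype(int)) with its own non-BLAS loop, which OPENBLAS_NUM_THREADS would not affect. A minimal sketch comparing both dtypes on the same kind of 0/1 data (sizes reduced from the script; timings depend on the machine, so none are shown):

```python
import time

import numpy as np

n, p = 1000, 300
rng = np.random.default_rng(0)
# Same kind of matrix as in the script: about 30% ones, 70% zeros
A_int = (rng.random((n, p)) > 0.7).astype(int)
A_float = A_int.astype(np.float64)


def timed_dot(M):
    """Return (M @ M.T, elapsed seconds) using np.dot."""
    start = time.perf_counter()
    result = np.dot(M, M.T)
    return result, time.perf_counter() - start


C_int, t_int = timed_dot(A_int)        # integer dtype: NumPy's own loop
C_float, t_float = timed_dot(A_float)  # float64 dtype: eligible for BLAS dgemm
print(f"integer dot: {t_int:.2f} s")
print(f"float64 dot: {t_float:.2f} s")
```

Since the entries are 0 or 1, the float64 result is exact and can be converted back to integers without loss.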
With the multiprocessing library I check how many cores my CPU has, then pass that number to the OPENBLAS_NUM_THREADS environment variable through os.environ.
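As far as I understand, OpenBLAS reads OPENBLAS_NUM_THREADS once, when the shared library is loaded (that is, when numpy is imported), so assigning it after import numpy probably has no effect; by default OpenBLAS usually uses all available cores anyway. A minimal sketch, assuming numpy is linked against OpenBLAS, that sets the variable before the import and uses float64 data:

```python
import multiprocessing as mp
import os

# Must happen BEFORE numpy (and therefore OpenBLAS) is imported
os.environ["OPENBLAS_NUM_THREADS"] = str(mp.cpu_count())

import numpy as np  # noqa: E402  (deliberately imported after setting the variable)

# float64 matrix so that the product can be dispatched to BLAS
A = np.random.rand(1000, 300)
C = A @ A.T
```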
I set os.environ["BLAS"] = "openblas64_" because np.__config__.show() reported in the terminal that my OpenBLAS version is openblas64_:
openblas64__info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
    runtime_library_dirs = ['/usr/local/lib']
blas_ilp64_opt_info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None)]
    runtime_library_dirs = ['/usr/local/lib']
openblas64__lapack_info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
    runtime_library_dirs = ['/usr/local/lib']
lapack_ilp64_opt_info:
    libraries = ['openblas64_', 'openblas64_']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None), ('BLAS_SYMBOL_SUFFIX', '64_'), ('HAVE_BLAS_ILP64', None), ('HAVE_LAPACKE', None)]
    runtime_library_dirs = ['/usr/local/lib']
Supported SIMD extensions in this NumPy install:
    baseline = SSE,SSE2,SSE3
    found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2
    not found = AVX512F,AVX512CD,AVX512_KNL,AVX512_KNM,AVX512_SKX,AVX512_CLX,AVX512_CNL,AVX512_ICL
None
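The trailing None comes from wrapping np.__config__.show() in print(): show() writes the report to stdout itself and returns None. Note also that, as far as I know, this report describes how numpy was built rather than which library file is loaded at runtime. A minimal sketch calling it directly:

```python
import numpy as np

# show() prints the build/BLAS configuration itself and returns None,
# so it does not need to be wrapped in print()
config_result = np.__config__.show()
```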
So by setting these os.environ variables as above, I expected the multiplication to run in parallel, but it still runs on a single core.
Is this a problem with the installation or with how I wrote the script? I'm fairly new to installing libraries/packages on Linux and getting them to work, so the problem may come from there. That is also why I didn't try cloning the OpenBLAS Git repository.
If anyone has an idea of how to solve this, I would be very grateful.