I have a python script which consist of numpy and scipy functions. I was trying to check the scaling for my code.
numpy.show_config()
The configuration for the numpy installed in my system shows the following information.
blas_mkl_info:
libraries = ['mkl_rt']
library_dirs = ['Library\\lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['Library\\include']
blas_opt_info:
libraries = ['mkl_rt']
library_dirs = ['Library\\lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['Library\\include']
lapack_mkl_info:
libraries = ['mkl_rt']
library_dirs = ['Library\\lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['Library\\include']
lapack_opt_info:
libraries = ['mkl_rt']
library_dirs = ['Library\\lib']
define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
include_dirs = ['Library\\include']
Supported SIMD extensions in this NumPy install:
baseline = SSE,SSE2,SSE3
found = SSSE3,SSE41,POPCNT,SSE42,AVX,F16C,FMA3,AVX2,AVX512F,AVX512CD,AVX512_SKX,AVX512_CLX,AVX512_CNL
not found =
So I tried setting the following environment variables before importing numpy:
import os
os.environ["OMP_NUM_THREADS"] = '16'
os.environ["OPENBLAS_NUM_THREADS"] = '16'
os.environ["MKL_NUM_THREADS"] = '16'
But still, my code is only using 1 thread and there is not difference in the time of execution of the program.
I have also tried setting mkl.set_num_threads(16)
but no difference.
I am aware that python has GIL which doesn't allow multiple threads to execute simultaneously as u expect in C. Is there any other way to set the number of threads to be used in python?