I am writing an OpenMP code that calls different BLAS kernels, mostly DGEMMs of different sizes, from different threads. To maximize performance, I want to control the number of threads used by each BLAS call. This seems like an obvious, basic need, yet it is very hard to do.
OpenBLAS has a function openblas_set_num_threads(int n), but the README file of the OpenBLAS code says:
These are only used once at library initialization, and are not available for fine-tuning thread numbers in individual BLAS calls.
So I guess I cannot use this function for per-call tuning in OpenBLAS.
MKL has a function mkl_set_num_threads_local(int nt), which seems to answer my question, but only when I am using MKL.
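To make the intended usage concrete, here is a minimal sketch of the pattern I have in mind: each OpenMP thread sets its own thread-local BLAS thread count, runs its kernel, then restores the previous setting. It relies on mkl_set_num_threads_local returning the previous thread-local value, as the MKL documentation describes. The USE_MKL macro, the stand-in function used when MKL is absent, and the names run_with_blas_threads and demo_kernel are assumptions made for illustration only; a real kernel would call cblas_dgemm.

```c
#include <stddef.h>

#ifdef USE_MKL
#include <mkl.h>  /* real MKL: mkl_set_num_threads_local(int) returns the previous local value */
#else
/* Hypothetical stand-in so the sketch compiles without MKL. It only mimics the
   contract: store a per-thread value and return the previous one
   (0 means "fall back to the global setting"). */
static _Thread_local int fake_local_threads = 0;
static int mkl_set_num_threads_local(int nt)
{
    int prev = fake_local_threads;
    fake_local_threads = nt;
    return prev;
}
#endif

/* Hypothetical helper: run one BLAS kernel with nt threads on the calling
   thread, then restore whatever setting was active before.
   Returns the setting that was active before the call. */
static int run_with_blas_threads(int nt, void (*kernel)(void *), void *arg)
{
    int prev = mkl_set_num_threads_local(nt);
    kernel(arg);                      /* e.g. a wrapper around cblas_dgemm */
    mkl_set_num_threads_local(prev);  /* restore the caller's setting */
    return prev;
}

/* Placeholder kernel: counts invocations instead of calling DGEMM. */
static void demo_kernel(void *arg)
{
    *(int *)arg += 1;
}
```

Inside an OpenMP parallel region, each thread would call run_with_blas_threads with the thread count chosen for its own DGEMM size. This only covers MKL, so it does not answer the library-independent part of the question.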
Is there a way to tune the number of threads for each BLAS call regardless of the library I am using (the ideal choice)? If not, is MKL the only library that lets me tune the number of threads per call?