I'm working on an algorithm, and I've made no attempt to parallelize it beyond using numpy/scipy. Looking at htop, sometimes the code uses all of my cores and sometimes just one. I'm considering adding parallelism to the single-threaded portions using multiprocessing or something similar.
Assuming that I have all of the parallel BLAS/MKL libraries, is there some rule of thumb that I can follow to guess whether a numpy/scipy ufunc is going to be multithreaded or not? Even better, is there some place where this is documented?
To try to figure this out, I've looked at https://scipy.github.io/old-wiki/pages/ParallelProgramming, "Python: How do you stop numpy from multithreading?", and "multithreaded blas in python/numpy".
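Short of documentation, one empirical check is to compare the CPU time a call consumes with the wall-clock time it takes. The sketch below assumes that `time.process_time()` reports CPU time summed over all threads of the process, which holds on common platforms such as Linux. It compares a BLAS-backed call (`np.matmul`) with an elementwise ufunc (`np.exp`). In stock NumPy builds, elementwise ufuncs typically run on a single thread, while BLAS-backed operations use the BLAS library's own thread pool. The helper name `cpu_to_wall_ratio` is hypothetical, and the ratio is a heuristic rather than a definitive answer.

```python
import time

import numpy as np


def cpu_to_wall_ratio(fn, *args, repeat=5):
    """Hypothetical helper: (process CPU time) / (wall-clock time) for fn(*args).

    A ratio near 1.0 suggests the call ran on one core; a ratio well above
    1.0 suggests the work was spread across threads (e.g. a multithreaded
    BLAS). Heuristic only: other load on the machine skews the result.
    """
    fn(*args)  # warm-up so one-time setup (thread pool start, caches) is not timed
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of all threads in this process
    for _ in range(repeat):
        fn(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall if wall > 0 else float("nan")


if __name__ == "__main__":
    np.show_config()  # shows which BLAS/LAPACK numpy was built against
    rng = np.random.default_rng(0)
    a = rng.standard_normal((1000, 1000))
    b = rng.standard_normal((1000, 1000))
    print("matmul (BLAS-backed):", round(cpu_to_wall_ratio(np.matmul, a, b), 2))
    print("exp (elementwise ufunc):", round(cpu_to_wall_ratio(np.exp, a), 2))
```

On a machine with a multithreaded BLAS, the matmul ratio would be expected to exceed 1 while the exp ratio stays near 1. Those elementwise, single-threaded portions are the likely candidates for multiprocessing.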