
I've been using numpy for quite some time now and am fond of just how much faster it is for simple operations on vectors and matrices, compared to, e.g., looping over the elements of the same array.
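As a rough illustration of that gap (the exact timings will of course vary by machine; the array size is arbitrary):

```python
import time
import numpy as np

N = 1_000_000
a = np.random.rand(N)

# Pure-Python loop over the elements: each iteration goes through the
# interpreter and boxes a scalar
t0 = time.perf_counter()
total_loop = 0.0
for x in a:
    total_loop += x
t_loop = time.perf_counter() - t0

# Vectorized reduction: one call into compiled C code
t0 = time.perf_counter()
total_vec = a.sum()
t_vec = time.perf_counter() - t0

print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.5f}s")
```

Both compute the same sum; the vectorized call is typically orders of magnitude faster, without any parallelism being involved.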

My understanding is that it uses SIMD CPU extensions, but according to some, at least part of its functionality makes use of multiprocessing (via OpenMP?). On the other hand, there are lots of questions here on SO (example) about speeding up operations on numpy arrays by using multiprocessing.

I have not seen numpy definitively use multiple cores at once, although it sometimes looks as if two cores (on an 8-core machine) are in use. But I may have been using the "wrong" functions for that, or using them in the wrong way, or maybe my matrices are too small to make it worthwhile?

The question therefore:

  • Are there some numpy functions which can use multiple processes on a shared-memory machine, either via OpenMP or some other means?

  • If yes, is there some place in the numpy documentation with a definite list of those functions?

  • And in that case, is there some documentation on what a user of numpy would have to do to make sure they use all available CPU cores, or some specific predetermined number of cores?
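For the last point, one thing that can be checked from Python is which BLAS/LAPACK the installed numpy is linked against, and the thread count of the underlying BLAS can usually be capped through environment variables. A sketch, with the caveat that those variable names are conventions of the respective BLAS libraries (OpenMP builds, OpenBLAS, MKL), not part of numpy's own API, and they must be set before numpy is imported:

```python
import os

# Must happen BEFORE "import numpy" -- the BLAS reads these at load time.
# Which variable applies depends on which BLAS numpy was built against.
os.environ["OMP_NUM_THREADS"] = "4"       # OpenMP-threaded BLAS builds
os.environ["OPENBLAS_NUM_THREADS"] = "4"  # OpenBLAS
os.environ["MKL_NUM_THREADS"] = "4"       # Intel MKL

import numpy as np

# Print which BLAS/LAPACK libraries this numpy build is linked against
np.show_config()

# A BLAS level-3 operation such as matrix multiplication is the kind of
# call that can run multi-threaded, if the linked BLAS supports it:
a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)
c = a @ b
print(c.shape)
```

Watching a system monitor while the matrix multiplication runs is a simple way to see whether more than one core is actually used.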

I'm aware that there are libraries which permit splitting numpy arrays and similar structures across multiple machines or compute nodes, but I suspect the use case for those is either handling more data than fits into local RAM, or speeding processing up beyond what a single multi-core machine can achieve. That, however, is not what this question is about.

Update

Given the comment by @talonmies (who states that by default there is no such functionality in numpy, and that it depends on LAPACK and BLAS): what is the easiest way to obtain a suitably compiled numpy version which makes use of multiple CPU cores (and ideally also SIMD extensions)?

Or is the reason numpy doesn't usually multiprocess that most people for whom this matters have already switched to the multiprocessing module or tools like dask, handling multiple cores explicitly rather than having only the numpy bits accelerated implicitly?
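The distinction matters because it determines which operations can benefit at all: element-wise ufuncs run as single-threaded C loops, while calls that dispatch to BLAS (such as matrix multiplication) may fan out across cores if the linked BLAS is multi-threaded. A small sketch contrasting the two (timings are only indicative):

```python
import time
import numpy as np

n = 1500
a = np.random.rand(n, n)
b = np.random.rand(n, n)

# Element-wise addition: a plain numpy ufunc, executed as a
# single-threaded compiled loop regardless of the BLAS backend
t0 = time.perf_counter()
s = a + b
t_add = time.perf_counter() - t0

# Matrix multiplication: dispatched to the linked BLAS (dgemm),
# which can use multiple threads if built with OpenBLAS/MKL threading
t0 = time.perf_counter()
p = a @ b
t_mm = time.perf_counter() - t0

print(f"element-wise add: {t_add:.4f}s  matmul: {t_mm:.4f}s")
```

So even with a multi-threaded BLAS installed, only the BLAS-backed subset of numpy would be accelerated implicitly.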

  • The answer to all your questions is no. The only parallelism it employs comes from whatever underlying libraries it uses (like BLAS and LAPACK), and often that is none. Most of the generic performance improvement in numpy comes from using precompiled C code rather than relying on the Python interpreter. – talonmies Sep 30 '19 at 15:32
  • Looping over the elements of an array operates in Python code. It's actually slower than looping on elements of a list. The 'vectorized' operations use compiled code. It still iterates, but with code written in `c`. – hpaulj Sep 30 '19 at 15:34
  • @talonmies So that means in order to get parallel processing with numpy, I'd have to compile it myself against the correct (potentially self-compiled?) versions of LAPACK and BLAS? I kind of wonder why that has not already been done to the regular versions, or at least in a numpy fork which is readily available as a drop-in for the regular version. – Zak Sep 30 '19 at 18:32
  • 1
    It has already been done. The Anaconda numpy version is compiled against the Intel MKL library which includes multi-threaded versions of Level 2 and Level 3 BLAS, as an example. But numpy itself contains no intrinsic parallelism nor exposes any user controls over parallelism, and only a tiny subset of numpy functions actual use or can use libraries with multi-threaded implementations of common operations – talonmies Sep 30 '19 at 18:51
  • check https://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy for LAPACK/BLAS numpy. The package is compiled against MKL. – RaJa Oct 01 '19 at 13:43
  • 2
    If you install the Intel Distribution for Python (https://software.intel.com/en-us/distribution-for-python ) I believe that then uses MKL and parallelises appropriate MKL operations. – Jim Cownie Oct 02 '19 at 08:21

0 Answers