Looking at the resource monitor during the execution of my script, I noticed that all the cores of my PC were working, even though I did not implement any form of multiprocessing. Trying to pinpoint the cause, I discovered that the code is parallelized when using numpy's matmul (or, as in the example below, the binary operator @).
import numpy as np
A = np.random.rand(10,500)
B = np.random.rand(500,50000)
while True:
    _ = A @ B
Looking at this question, it seems the reason is that numpy invokes BLAS/LAPACK routines, which are indeed parallelized.
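For reference, the backend that numpy is linked against (and which I assume provides the multi-threaded matmul, e.g. OpenBLAS or MKL) can be inspected like this:

import numpy as np

# Print the BLAS/LAPACK libraries numpy was built against
# (e.g. OpenBLAS or MKL); these provide the threaded matmul.
np.show_config()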
While it is nice that my code runs faster and uses all available resources, this is causing me trouble when I submit it to a shared cluster managed by the PBS queue manager. Together with the cluster's IT manager, we noticed that even when I ask for N CPUs on a cluster node, numpy still spawns a number of threads equal to the total number of CPUs on the node. This overloads the node, as I end up using more CPUs than those assigned to me.
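As a rough illustration of what we observed (not literally the commands we ran), the threadpoolctl package, assuming it is available on the node, can report the size of each detected thread pool:

import numpy as np  # importing numpy loads its BLAS/LAPACK library
from threadpoolctl import threadpool_info

# Each entry describes one detected thread pool (OpenBLAS, MKL, OpenMP, ...)
# and its current number of threads; on the node this matched the total
# CPU count, not the N CPUs requested from PBS.
for pool in threadpool_info():
    print(pool["internal_api"], pool["num_threads"])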
Is there a way to "control" this behaviour and tell numpy how many CPUs it can use?