Looking at the resource monitor during the execution of my script, I noticed that all the cores of my PC were working, even though I did not implement any form of multiprocessing. Trying to pinpoint the cause, I discovered that the code is parallelized when using numpy's matmul (or, as in the example below, the binary operator @).
import numpy as np
A = np.random.rand(10,500)
B = np.random.rand(500,50000)
while True:
    _ = A @ B
Looking at this question, it seems the reason is that numpy invokes BLAS/LAPACK routines, which are indeed parallelized.
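For reference, the backend that numpy is linked against (and which I assume provides the multi-threaded matmul, e.g. OpenBLAS or MKL) can be inspected like this:

import numpy as np

# Print the BLAS/LAPACK libraries numpy was built against
# (e.g. OpenBLAS or MKL); these provide the threaded matmul.
np.show_config()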
While it is nice that my code runs faster and uses all available resources, this is causing me trouble when I submit it to a shared cluster managed by the PBS queue manager. Together with the cluster's IT manager, we noticed that even when I ask for N CPUs on a cluster node, numpy still spawns a number of threads equal to the total number of CPUs on the node. This overloads the node, as I end up using more CPUs than those assigned to me.
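As a rough illustration of what we observed (not literally the commands we ran), the threadpoolctl package, assuming it is available on the node, can report the size of each detected thread pool:

import numpy as np  # importing numpy loads its BLAS/LAPACK library
from threadpoolctl import threadpool_info

# Each entry describes one detected thread pool (OpenBLAS, MKL, OpenMP, ...)
# and its current number of threads; on the node this matched the total
# CPU count, not the N CPUs requested from PBS.
for pool in threadpool_info():
    print(pool["internal_api"], pool["num_threads"])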
Is there a way to "control" this behaviour and tell numpy how many CPUs it can use?