If I run the following Python code:

import multiprocessing

import numpy as np


def dummy(t):
    A = np.random.rand(10000, 10000)
    inv = np.linalg.inv(A)
    return np.linalg.norm(inv)


if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        print(pool.map(dummy, range(20)))

more than the specified 2 processes appear to be spawned. More specifically, when I monitor the system with htop, it shows all hardware threads as busy, i.e. at 100% CPU usage. I would expect only 2 hardware threads to show 100% usage, but perhaps that assumption is wrong.

Curiously enough, if the matrix size is increased (by a factor of 10), only the 2 specified threads are busy.

Python versions used: 3.6.9 / 3.8.5. Machine: Skylake server with 40 cores.

martineau
marcel.koch
  • `multiprocessing` is for spawning separate **processes**, so all the discussion about threads doesn't seem relevant. – martineau Aug 15 '20 at 08:40
  • Multiprocessing with subprocessing? Pinning? Threads? "Matrix size is increased, only 2 specified threads are spawned"? (You have only specified 2 *processes* in the pool). I can't follow any of this. – Booboo Aug 15 '20 at 11:02
  • @Booboo, I've updated the question to only contain the example. I thought my motivation for using multiprocessing would be helpful, but instead it brought only confusion. I hope the problem is easier to understand in the boiled down version. – marcel.koch Aug 15 '20 at 12:06
  • @martineau I'm sorry, if the nomenclature wasn't clear, and to be honest I'm not really sure about the distinction between processes and threads. The point is that htop shows that all hardware threads are active although I would expect that only two are active. – marcel.koch Aug 15 '20 at 12:08
  • I have an 8-core processor on my desktop, and if I just call `dummy` as a function without using multiprocessing at all, my CPU utilization goes to 100%. This strongly suggests that the `numpy` library itself (which uses C language code) might be using multiple cores. See https://numpy.org/devdocs/reference/routines.linalg.html#module-numpy.linalg, which describes `numpy.linalg` being "multithreaded" in the C-language sense (which is different from the Python sense since two Python threads cannot execute Python code concurrently and thus will not run up the CPU). – Booboo Aug 15 '20 at 13:16
  • @Booboo thanks, for your answer, that seems to be the point. If I replace the `numpy` test with something else, it works as expected. – marcel.koch Aug 15 '20 at 13:33
  • [`htop`](https://en.wikipedia.org/wiki/Htop) is a process-viewer and process-manager, and has nothing to do with threads. `multiprocessing.Pool` may use threads internally to do what it does, but that shouldn't be a concern because threads in Python always run on the same CPU (subject to the limitations of the GIL). `numpy` may not be subject to the latter, because it's written in C. I've added that tag to your question. – martineau Aug 15 '20 at 15:26

1 Answer

As the comment from @Booboo suggests, the example contains additional parallelism that was not accounted for. Most likely the numpy.linalg.inv call uses some sort of multithreading under the hood. Therefore the assumption that only as many hardware threads are busy as the number of processes specified in the Pool constructor is invalid. If the source of the additional parallelism is known and can be disabled, the expected behavior can be achieved.

This answer contains instructions on how to limit the number of threads available to numpy. This might give performance benefits if you have a higher-level source of parallelism. Note that it can only be done globally through environment variables before importing numpy, not on a per-function basis.
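
As a minimal sketch of the environment-variable approach, the example from the question could be adapted as follows. Which variable actually takes effect depends on the BLAS/LAPACK backend numpy was built against, so setting all three of `OMP_NUM_THREADS`, `OPENBLAS_NUM_THREADS` and `MKL_NUM_THREADS` is an assumption meant to cover the common cases, not a guarantee. The `n` parameter is a hypothetical addition that keeps the matrix small so the sketch runs quickly.

```python
import os

# Assumption: the BLAS/LAPACK backend behind numpy honours at least one of
# these variables. They must be set before numpy is imported for the first time.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
    os.environ[var] = "1"

import multiprocessing

import numpy as np


def dummy(t, n=200):
    # n is a hypothetical size parameter; the question uses 10000.
    A = np.random.rand(n, n)
    inv = np.linalg.inv(A)
    return np.linalg.norm(inv)


if __name__ == "__main__":
    # With one BLAS thread per worker, roughly 2 hardware threads
    # should be busy instead of all of them.
    with multiprocessing.Pool(2) as pool:
        print(pool.map(dummy, range(20)))
```

The same variables can also be set in the shell before starting the interpreter, which avoids relying on import order inside the script.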

julaine
marcel.koch