
I'm trying to run a Python 3 program that uses multiple cores for computation on a Google Cloud Compute Engine instance.
The code looks like this:

import multiprocessing
from multiprocessing import Pool

import tqdm

# functions and variables (e.g. single_task, my_list) defined here

# leave one core free for the main process
MAX_PROCESS_COUNT = (multiprocessing.cpu_count() - 1) or 1

if __name__ == "__main__":
    with Pool(processes=MAX_PROCESS_COUNT) as pool:
        # imap yields results lazily, so tqdm can display progress
        result = list(tqdm.tqdm(pool.imap(single_task, range(len(my_list))), total=len(my_list)))

The Compute Engine instance has 20 cores, so I decided to use only 19 of them. my_list has about 200 values, and each single_task takes about 10 minutes to complete on my laptop.

When I actually ran the program, it took about 1.6 hours to complete only 35 tasks.
So I checked htop and found that all CPU cores are in use, while memory usage looks unusually low (I expected about 14 GB):
[Screenshot: htop CPU usage]

More importantly, the CPU usage of the individual worker processes is highly unbalanced:
[Screenshot: per-process CPU usage]

I believe the unbalanced CPU usage is the problem here.
Is there any way to constrain it? Should I configure the VM environment or change the Python code?

I've tested the same code on my laptop, and it runs as expected: out of 8 cores, only 1 is not fully utilized.

By the way, my code uses packages like NumPy, Pandas, and sklearn, and I've already set up libblas for NumPy.
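
To make this concrete, here is a small diagnostic one could run in each worker (a sketch: the os_thread_count helper below is illustrative, not part of my program) to count the OS threads a process has actually spawned, including native BLAS/OpenMP threads that Python's threading module cannot see:

import os

def os_thread_count():
    # On Linux, /proc/self/task has one directory per OS thread in the
    # current process, so its length is the true thread count.
    return len(os.listdir('/proc/self/task'))

Mapping os_thread_count over the pool would reveal whether each of the 19 workers is spawning a full set of BLAS threads on top of the worker process itself.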

teamclouday
  • Packages using C code, like NumPy, are not bound to the same GIL limitations as pure Python code. They can easily use more than one CPU at a time. You should consider that when choosing the number of processes to run in parallel. Even while your CPUs are fully utilized, performance may drop due to context switches. The RAM usage can be lower due to efficient copy-on-write forks. – Klaus D. Jun 03 '20 at 08:05
  • But why does the same code work on my laptop? What is reducing the performance on the Google VM? – teamclouday Jun 03 '20 at 08:13
  • How should we know? You did not show any relevant code. – Klaus D. Jun 03 '20 at 08:15
  • I also see that sklearn (or maybe NumPy) opens TensorFlow's cudart lib at launch on my laptop, probably for MLP. On the Google VM I didn't have libcudart. Would that make a difference? – teamclouday Jun 03 '20 at 08:32
  • It makes a large difference to run calculations on a GPU. – Klaus D. Jun 03 '20 at 10:53
  • Thanks @KlausD., you're right! The problem was that NumPy was using resources from other cores. I've configured the environment variables to limit its threads, and now it runs smoothly, as expected. – teamclouday Jun 03 '20 at 18:58

1 Answer


I found the solution here, which is what Klaus D. mentioned. NumPy's internal multithreading is not bound to a single process, and it needs to be configured before the Python program starts its computations.
So, in my situation, I added these lines to the top of my Python file:

import os

# Limit each native backend (MKL, numexpr, OpenMP) to one thread per
# process; these variables must be set before NumPy is first imported.
os.environ['MKL_NUM_THREADS'] = '1'
os.environ['NUMEXPR_NUM_THREADS'] = '1'
os.environ['OMP_NUM_THREADS'] = '1'

This way, each NumPy-related computation is restricted to a single thread inside its assigned process.
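
If the environment variables cannot be set before NumPy is first imported (for example, because another module imports it earlier), a runtime alternative is the third-party threadpoolctl package. This is a sketch assuming that package is installed; single_task is the worker function from the question:

from threadpoolctl import threadpool_limits

def single_task_limited(i):
    # Cap every detected native thread pool (MKL, OpenBLAS, OpenMP)
    # at one thread for the duration of this task.
    with threadpool_limits(limits=1):
        return single_task(i)

Passing single_task_limited to pool.imap instead of single_task gives each worker exactly one computation thread.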

Additionally, you can check your NumPy build configuration with:

import numpy as np
np.show_config()

and see which BLAS/LAPACK backend is in use, which tells you which environment variables need to be set to limit the number of threads.
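
For example, if the output mentions an openblas section, OPENBLAS_NUM_THREADS is typically the variable to set; an MKL build responds to MKL_NUM_THREADS, and most backends also honor the generic OMP_NUM_THREADS.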

teamclouday