I'm trying to run a Python 3 program that uses multiple cores for its computations on a Google Cloud Compute Engine instance.
The code looks like this:
import multiprocessing
from multiprocessing import Pool

import tqdm

# functions and variables (my_list, single_task, ...) defined here

# use all cores but one, falling back to 1 on a single-core machine
MAX_PROCESS_COUNT = (multiprocessing.cpu_count() - 1) or 1

if __name__ == "__main__":
    with Pool(processes=MAX_PROCESS_COUNT) as pool:
        result = list(tqdm.tqdm(pool.imap(single_task, range(len(my_list))),
                                total=len(my_list)))
The Compute Engine instance has 20 cores, so I decided to use only 19 of them. my_list has about 200 values, and each single_task takes about 10 minutes to complete on my laptop.
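To give a sense of the shape of the work (this is not my real code, just an illustrative stand-in with made-up data sizes), single_task does something roughly like this:

import numpy as np
from sklearn.linear_model import Ridge

def single_task(i):
    # hypothetical stand-in: fit a small model on random data and return a score
    X = np.random.rand(5000, 50)
    y = np.random.rand(5000)
    model = Ridge().fit(X, y)
    return model.score(X, y)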
When I actually ran the program, it took about 1.6 hours to complete only 35 tasks. So I checked htop and found that all CPU cores are in use, but the memory usage looks unusually low (I expected about 14 GB):
More importantly, the CPU usage of the individual worker processes is highly unbalanced:
I believe this unbalanced CPU usage is the problem here.
Is there any way to constrain that usage? Should I configure something in the VM environment, or change the Python code?
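For example, I wondered whether limiting the BLAS/OpenMP threads in each worker would help. This is only a sketch of what I had in mind (the environment variables and the initializer approach are my guess, not something I've confirmed works here; single_task, my_list, and MAX_PROCESS_COUNT are the same as above):

import os
from multiprocessing import Pool

def limit_worker_threads():
    # ask the common BLAS/OpenMP backends to use a single thread per worker;
    # these variables are normally read when NumPy loads its BLAS library
    os.environ["OMP_NUM_THREADS"] = "1"
    os.environ["OPENBLAS_NUM_THREADS"] = "1"
    os.environ["MKL_NUM_THREADS"] = "1"

if __name__ == "__main__":
    with Pool(processes=MAX_PROCESS_COUNT,
              initializer=limit_worker_threads) as pool:
        result = list(tqdm.tqdm(pool.imap(single_task, range(len(my_list))),
                                total=len(my_list)))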
I've tested the same code on my laptop, and it runs as expected: out of 8 cores, only 1 is not fully utilized. By the way, my code uses packages like NumPy, Pandas, and scikit-learn, and I've already set up libblas for NumPy.
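For reference, this is roughly how I checked the BLAS setup on the VM (the threadpoolctl part assumes that package is installed; otherwise np.show_config() alone shows the linked libraries):

import numpy as np

# prints the BLAS/LAPACK libraries NumPy was built against
np.show_config()

# optional: lists the native thread pools loaded at runtime and their sizes
from threadpoolctl import threadpool_info
print(threadpool_info())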