I wrote a Python program for a Q-learning algorithm, and since the algorithm has stochastic output I have to run it multiple times. For this I use the multiprocessing module. The structure of the code is as follows:
import numpy as np
import scipy as sp
import multiprocessing as mp
# ...import other modules...
# ...define some parameters here...
# using multiprocessing
result = []
num_threads = 3
pool = mp.Pool(num_threads)
for cnt in range(num_threads):
    args = RL_params + phys_params  # argument tuple passed to each worker
    result.append(pool.apply_async(Q_learning, args))
pool.close()
pool.join()
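As an aside, the snippet above only stores the AsyncResult objects and never retrieves the workers' return values; a minimal sketch of collecting them after pool.join(), assuming Q_learning actually returns something, could look like this:

# Hedged sketch: gather the outputs of the asynchronous calls.
# Each element of `result` is the AsyncResult returned by
# pool.apply_async(Q_learning, args) in the loop above.
outputs = [r.get() for r in result]  # .get() blocks until that run has finished
print(len(outputs), "runs finished")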
There is no I/O in my code, my workstation has 6 cores (12 threads), and there is enough memory for this job. When I run the code with num_threads=1, it takes only 13 seconds and the job occupies only one thread, with CPU usage at 100% (observed with the top command).
(see the linked screenshot of the CPU status)
However, if I run it with num_threads=3 (or more), it takes more than 40 seconds, and the job occupies 3 threads, each using 100% of a CPU core.
(see the linked screenshot of the CPU status)
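For reference, a minimal sketch of how the two timings could be compared in one script; here Q_learning, RL_params and phys_params stand for the definitions omitted above.

import time
import multiprocessing as mp

def timed_pool_run(num_procs):
    # Launch num_procs independent Q-learning runs and wait for all of them.
    start = time.perf_counter()
    with mp.Pool(num_procs) as pool:
        async_results = [pool.apply_async(Q_learning, RL_params + phys_params)
                         for _ in range(num_procs)]
        outputs = [r.get() for r in async_results]  # blocks until every run is done
    print(f"{num_procs} process(es): {time.perf_counter() - start:.1f} s")
    return outputs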
I can't understand this slowdown, because there is no parallelization inside any of my self-defined functions and no I/O. It is also interesting that when num_threads=1 the CPU usage is always below 100%, but when num_threads is larger than 1 the CPU usage sometimes reaches 101% or 102%.
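The occasional readings above 100% suggest that some helper threads are active inside the worker processes. One way to check which BLAS backend numpy is linked against (and therefore whether it can spawn its own threads) is sketched below; threadpoolctl is an optional third-party package, so that part is an assumption.

import numpy as np

# Show which BLAS/LAPACK implementation numpy was built against
# (e.g. OpenBLAS or MKL, both of which can run multithreaded).
np.show_config()

# Optional: if the third-party `threadpoolctl` package is installed,
# it reports the thread pools that numpy's BLAS currently exposes.
try:
    from threadpoolctl import threadpool_info
    for pool_info in threadpool_info():
        print(pool_info.get("internal_api"), pool_info.get("num_threads"))
except ImportError:
    pass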
On the other hand, I wrote another simple test script which does not import numpy and scipy, and this problem never shows up there. I have noticed the question why isn't numpy.mean multithreaded?, and it seems my problem is due to the automatic parallelization of some numpy methods (such as dot). But as shown in the screenshots, I can't see any parallelization when I run a single job.
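One way to test whether hidden BLAS threads are involved, as a minimal sketch: cap the BLAS thread pools to a single thread via the usual environment variables before numpy is imported, then rerun the pool. Which variable actually matters depends on the BLAS backend numpy is linked against, so setting all three here is an assumption.

import os

# Must be set before numpy is imported anywhere in the program.
os.environ["OMP_NUM_THREADS"] = "1"        # OpenMP-based BLAS backends
os.environ["OPENBLAS_NUM_THREADS"] = "1"   # OpenBLAS
os.environ["MKL_NUM_THREADS"] = "1"        # Intel MKL

import numpy as np  # imported only after the limits are in place

If the num_threads=3 run then drops back close to the single-process time, that would point to contention between the pool's worker processes and the BLAS threads.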