I am processing a matrix in a iterative manner. The serial version of my code is as follows. Here for each Iteration
, two inner for loops basically access each matrix element and update it. (I keep only the core part of my code by skipping few details so that I keep the discussion concise)
N,M=np.shape(Matrix2D)
for Iteration in range(20000)
for n in range(N):
for m in range(M):
Do cool stuff on Matrix2D[n][m]
I tried to parallelize the two inner loops. I created all possible combination of the iterators using itertools.product
. Then I distribute the job among a pool of workers. Some more detail: Multiprocess doesn't support 2D array. Therefore I declared a 1D list and mapped the matrix elements in it. (All process must be done on a shared memory in a synchronous manner for some reasons I didn't mention.)
The code looks as follows.
def conv2Dto1DIndex(n,m,N,M): #Maps 2D matrix with 1D list
return n*N+m
def UpdateMatrix(Matrix_1D,N,M,lock,IndexTouple): #This function does the cool stuff on the Matrix element
n=IndexTouple[0]
m = IndexTouple[1]
lock.acquire()
Ind1D=conv2Dto1DIndex(n,m,N,M)
Do cool stuff on Matrix_1D[Ind1D]
lock.release()
def MatrixEvolve_MultiProcess(N,M,MaxItr): #This function parallelize the process
with multiprocessing.Manager() as manager:
tmp = np.random.choice([-1, 1], size=(1, N * M))[0]
Matrix_1D=manager.list(tmp.tolist()) #1D shared list (to store same amount of entries as 2D matrix)
lock = manager.Lock()
x = range(N) #Does the job for loop over N
y = range(M) #Does the job for loop over M
paramlist = list(itertools.product(x,y)) #Create all possible combination of two inner loop iterators
###########################################
#Here the parallalization is implemented...
###########################################
for Itr in range(MaxItr): #Outer loop remains as it was in serial code
func = partial(UpdateMatrix,Matrix_1D,N,M,lock)
pool = multiprocessing.Pool() #Create Pool of workers
result = pool.map(func, paramlist) #Do the job of nested loop in parallel
pool.close()
pool.join()
Arr = np.array(Matrix_1D[:])
Arr = Arr.reshape((N, M)) #Convert 1D list into 2D matrix
###########################################
return Arr
if __name__=='__main__':
MaxItr=20000
t1=time.time()
(field,mean)=MatrixEvolve_MultiProcess(50,50,MaxItr)
print(time.time()-t1)
I understand in multiprocessing there are quite some extra instructions present. However it is supposed to speed up the most expensive part of the serial version of the code. That part is the nested loop.
In reality the serial code runs 5 iterations in 0.07 seconds while the parallel version takes 17.95 seconds. That said parallel code is 250 times slower!!! What went wrong here and how to fix it?