Why doesn't multiprocessing speedup my program, instead does the opposite

Question

I am processing a matrix in a iterative manner. The serial version of my code is as follows. Here for each Iteration, two inner for loops basically access each matrix element and update it. (I keep only the core part of my code by skipping few details so that I keep the discussion concise)

N,M=np.shape(Matrix2D)
for Iteration in range(20000)
   for n in range(N):
      for m in range(M):
         Do cool stuff on Matrix2D[n][m]

I tried to parallelize the two inner loops. I created all possible combination of the iterators using itertools.product. Then I distribute the job among a pool of workers. Some more detail: Multiprocess doesn't support 2D array. Therefore I declared a 1D list and mapped the matrix elements in it. (All process must be done on a shared memory in a synchronous manner for some reasons I didn't mention.)

The code looks as follows.

def conv2Dto1DIndex(n,m,N,M):   #Maps 2D matrix with 1D list
    return n*N+m

def UpdateMatrix(Matrix_1D,N,M,lock,IndexTouple):  #This function does the cool stuff on the Matrix element
    n=IndexTouple[0]
    m = IndexTouple[1]

    lock.acquire()
    Ind1D=conv2Dto1DIndex(n,m,N,M)
    Do cool stuff on Matrix_1D[Ind1D]
    lock.release()


def MatrixEvolve_MultiProcess(N,M,MaxItr):      #This function parallelize the process
    with multiprocessing.Manager() as manager:
        tmp = np.random.choice([-1, 1], size=(1, N * M))[0]  
        Matrix_1D=manager.list(tmp.tolist())                #1D shared list (to store same amount of entries as 2D matrix)
        lock = manager.Lock()

        x = range(N)    #Does the job for loop over N
        y = range(M)    #Does the job for loop over M
        paramlist = list(itertools.product(x,y))  #Create all possible combination of two inner loop iterators
        ###########################################
        #Here the parallalization is implemented...
        ###########################################
        for Itr in range(MaxItr):                       #Outer loop remains as it was in serial code

            func = partial(UpdateMatrix,Matrix_1D,N,M,lock)
            pool = multiprocessing.Pool()      #Create Pool of workers
            result = pool.map(func, paramlist) #Do the job of nested loop in parallel
            pool.close()
            pool.join()

            Arr = np.array(Matrix_1D[:])
            Arr = Arr.reshape((N, M))    #Convert 1D list into 2D matrix
        ###########################################
    return Arr

if __name__=='__main__':
    MaxItr=20000
    t1=time.time()
    (field,mean)=MatrixEvolve_MultiProcess(50,50,MaxItr)
    print(time.time()-t1)

I understand in multiprocessing there are quite some extra instructions present. However it is supposed to speed up the most expensive part of the serial version of the code. That part is the nested loop.

In reality the serial code runs 5 iterations in 0.07 seconds while the parallel version takes 17.95 seconds. That said parallel code is 250 times slower!!! What went wrong here and how to fix it?

Well, for one thing, you should create the process pool outside of the loop. That may not be causing all the slowdown, but it's probably a lot of it. — bnaecker, May 20 '20 at 01:59
Also, you can definitely pass the ndarray around between processes directly. See [here](https://stackoverflow.com/questions/7894791/use-numpy-array-in-shared-memory-for-multiprocessing), or use a queue to pass the data between processes. — bnaecker, May 20 '20 at 02:00
Have you tried to write "Do cool stuff on Matrix2D[n][m]" in a way that uses `numpy's` whole-array methods, instead of having to work on each element one at a time? Using `Matrix2D[n][m]` instead of `Matrix2d[n, m]` suggests that you have skipped most of the `numpy` basics. — hpaulj, May 20 '20 at 02:19

Why doesn't multiprocessing speedup my program, instead does the opposite

0 Answers0