I process two large 1D arrays (let's say A and B): I operate over pairs of A and B elements and write the results to a shared array C (think of C as a histogram). I want to use multiprocessing to parallelize the work. I thought the optimal approach could be to slice array A into a number of unique chunks equal to the number of parallel processes I choose to run, and then loop over all elements of B inside each process.
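To make the operation concrete, the serial version of the computation looks roughly like this (a simplified sketch; the sizes and names are just placeholders):

import numpy as np

A = np.random.randint(0, 10, size=1000)   # first input array
B = np.random.randint(0, 10, size=1000)   # second input array
C = np.zeros(100, dtype=int)              # result array, used as a histogram

for a in A:
    for b in B:
        d = int(abs(a - b))               # distance between this pair of elements
        if d < len(C):
            C[d] += 1                     # accumulate the pair into bin d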
I have read many related questions/answers. I looked at "Multiprocessing a loop of a function that writes to an array in python" as an example that uses Process, and I tried to adapt it to my problem, but I'm getting the performance of a serial execution. This is the code I am testing:
import multiprocessing as mp
from multiprocessing import Process, Lock
from multiprocessing.sharedctypes import Array
import numpy as np
import time
def ProcessData(sub_data1, data2, freq):
    # For every pair of elements, bin the absolute difference into the shared array
    for dat1 in sub_data1:
        for dat2 in data2:
            d = int(np.sqrt((dat1 - dat2)**2))
            #d = int(dat1 - dat2)
            if d < len(freq):
                freq[d] += 1
def SplitList(data, n):
    # Split data into n equal-length chunks (any remainder is dropped)
    sub_len = divmod(len(data), n)[0]
    print(sub_len)
    slices = []
    for i in range(n):
        slices.append(data[i*sub_len:i*sub_len + sub_len])
    return slices
def main(nproc):
    print("Number of cpu : ", mp.cpu_count())
    lock = Lock()
    N = 30
    chip = [1,1,1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,4,4,5,5,5,5,5,6,6,6,6,6,7,7,7,7,7,8,8,8,8,8,9,9,9,9]
    data1 = np.array(chip * N)
    data2 = np.array(chip * N)
    freq = Array('i', 100, lock=lock)
    dat1_subs = SplitList(data1, nproc)
    print('Number of data1 slices {:d}'.format(len(dat1_subs)))
    t_start = time.time()
    if __name__ == '__main__':
        for i in range(0, nproc):
            print('LEN {:d}: {:d}'.format(i, len(dat1_subs[i])))
            p = Process(target=ProcessData, args=(dat1_subs[i], data2, freq))
            p.start()
            p.join()
    t_end = time.time()
    print('Total time (s)= ' + str(t_end - t_start))
    print(str(list(freq)))
    #new_array = np.frombuffer(freq.get_obj())
    Sum = sum(list(freq))
    print('Total {:d}'.format(Sum))
NProc = 4
main(NProc)
I would appreciate any input or hints on what I'm doing wrong, or perhaps there are simpler approaches I just don't know about. Thanks.