I am trying to write a Python function for fast calculation of the sum of a list, using parallel computing. Initially I tried to use the Python multithreading library, but then I noticed that all threads run on the same CPU, so there is no speed gain, so I switched to using multiprocessing. In the first version I made the list a global variable:
from multiprocessing import Pool
array = 100000000*[1]
def sumPart(fromTo:tuple):
return sum(array[fromTo[0]:fromTo[1]])
with Pool(2) as pool:
print(sum(pool.map(sumPart, [(0,len(array)//2), (len(array)//2,len(array))])))
This worked well and returned the correct sum after about half the time of a serial computation.
But then I wanted to make it a function that accepts the array as an argument:
def parallelSum(theArray):
def sumPartLocal(fromTo: tuple):
return sum(theArray[fromTo[0]:fromTo[1]])
with Pool(2) as pool:
return (sum(pool.map(sumPartLocal, [(0, len(theArray) // 2), (len(theArray) // 2, len(theArray))])))
Here I got an error:
AttributeError: Can't pickle local object 'parallelSum.<locals>.sumPartLocal'
What is the correct way to write this function?