Here is some code I wrote for parallelizing the creation of a dictionary:
parallelize.py
import time
from multiprocessing import Pool


def assign_dict(alist):
    # Worker: build an identity dict {x: x} for one chunk of the list.
    return {x: x for x in alist}


if __name__ == "__main__":
    my_dict = {}
    size = 10000000
    threshold = 10000000
    my_list = list(range(size))

    # Sequential baseline: build the whole dict in the main process.
    start = time.time()
    my_dict = assign_dict(my_list)
    end = time.time()
    print("check seq", end - start, " sec")

    # Parallel version: split the list into chunks, build one dict per chunk
    # in the pool, then merge the per-chunk dicts back in the parent.
    my_dict = {}
    chunks = [my_list[i * threshold:(i + 1) * threshold] for i in range(int(size / threshold))]
    process_count = 7
    pool = Pool(processes=process_count)

    start = time.time()
    inter_list = pool.map_async(assign_dict, chunks)
    inter_list.wait()
    inter_list = inter_list.get()

    for some_dict in inter_list:
        print("Combining...", time.time() - start, " sec elapsed")
        my_dict = {**my_dict, **some_dict}

    print("check 152002 as key ", my_dict[152002])
    end = time.time()
    print("check parallel", end - start, " sec")
Here is the output for size 1 mil with threshold 1 mil
check seq 0.6559352874755859 sec
Combining... 4.751460790634155 sec elapsed
check 152002 as key 152002
check parallel 5.064720869064331 sec
Here is the output for size = 10 mil with threshold 1 mil
check seq 0.668889045715332 sec
Combining... 1.6871337890625
Combining... 1.7269806861877441
Combining... 1.860083818435669
Combining... 2.0794677734375
Combining... 2.266465663909912
Combining... 2.47836971282959
Combining... 2.8915648460388184
Combining... 3.2443037033081055
Combining... 3.6063129901885986
Combining... 3.9933629035949707
check 115202 as key 1152002
check parallel 4.406447887420654 sec
Here is the output for size 100 mil with threshold 10 mil. The worst part here is that even before the combining starts, the map_async call alone takes about 55 secs, compared to the 19 secs of the sequential run (a small sketch that times the pool phase and the merge phase separately follows this output).
check seq 19.182615041732788 sec
Combining... 55.18172788619995
Combining... 56.38586497306824
Combining... 58.534785747528076
Combining... 61.805513858795166
Combining... 64.75091290473938
Combining... 71.34392070770264
Combining... 76.02847385406494
Combining... 81.2545096874237
Combining... 87.75674867630005
Combining... 109.01232576370239
check 115202 as key 1152002
check parallel 126.1939218044281 sec
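In case it helps, here is a minimal standalone sketch (my own addition, not the script above; the size and threshold values are just assumed to mirror the 10 mil / 1 mil run) that times the pool phase and the merge phase separately, so the two costs can be compared directly:

import time
from multiprocessing import Pool


def assign_dict(alist):
    # Same worker as above: build an identity dict for one chunk.
    return {x: x for x in alist}


if __name__ == "__main__":
    size = 10000000          # assumed values mirroring the 10 mil / 1 mil run
    threshold = 1000000
    my_list = list(range(size))
    chunks = [my_list[i * threshold:(i + 1) * threshold]
              for i in range(size // threshold)]

    with Pool(processes=7) as pool:
        start = time.time()
        results = pool.map(assign_dict, chunks)   # blocks until every result is back in the parent
        pool_done = time.time()

    my_dict = {}
    for some_dict in results:
        my_dict = {**my_dict, **some_dict}        # same merge as in the original script
    merge_done = time.time()

    print("pool phase ", pool_done - start, " sec")
    print("merge phase", merge_done - pool_done, " sec")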
Even though I tried various combinations of size and threshold, the code with the pool is always slower, so it is not that the threshold is too big, because the sequential version handles the whole list very quickly. Even when the size is the same as the threshold, i.e. a single chunk, the code with the pool is many seconds slower (a minimal single-chunk sketch follows below).
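To show what I mean by that single-chunk case, here is a minimal standalone sketch (my own, with assumed values): one chunk and one task, so any extra time over the plain call is pure pool overhead (starting the worker plus pickling the chunk out and the result dict back):

import time
from multiprocessing import Pool


def assign_dict(alist):
    return {x: x for x in alist}


if __name__ == "__main__":
    my_list = list(range(1000000))              # assumed size = threshold = 1 mil

    start = time.time()
    assign_dict(my_list)                        # plain sequential call
    print("sequential      ", time.time() - start, " sec")

    with Pool(processes=1) as pool:             # one worker is enough for a single chunk
        start = time.time()
        result = pool.map(assign_dict, [my_list])[0]
        print("single-task pool", time.time() - start, " sec")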
And even for long-running jobs like size = 1000 million, the parallelized version is way slower than sequential execution, meaning there is no benefit from the parallelization at all. I have 8 cores and 16 GB of RAM and I am running macOS; I even verified in Activity Monitor that my cores were working in parallel on the task, yet it is slower. The combine phase does not take much time, as shown: by the time the inter_list.get() call returns, the parallel part is already done, so it cannot interfere with the combining of the dictionaries.
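As far as I understand, the pool has to pickle each returned dict in the worker and unpickle it in the parent before inter_list.get() can hand it back, so the result transfer is counted inside that "parallel part". Here is a rough standalone check (my own, sizes assumed) of what that serialization costs for a single chunk's dict:

import pickle
import time

chunk_dict = {x: x for x in range(1000000)}     # one chunk's result, assuming threshold = 1 mil

start = time.time()
blob = pickle.dumps(chunk_dict)
print("pickle  ", time.time() - start, " sec,", len(blob) / 1e6, "MB")

start = time.time()
pickle.loads(blob)
print("unpickle", time.time() - start, " sec")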
Can anyone please parallelize this code so that it is faster than the sequential version in Python 3, or at least help me understand why this is happening?