
I have written a program that sums a list by splitting it into sublists and using multiprocessing in Python. My code is the following:

from concurrent.futures import ProcessPoolExecutor, as_completed
import random
import time
      
def dummyFun(l):
    s=0
    for i in range(0,len(l)):
        s=s+l[i]
    return s

 
def sumaSec(v):
    start=time.time()
    sT=0
    for k in range(0,len(v),10):
        vc=v[k:k+10]
        print ("vector ",vc)
        for item in vc:
            sT=sT+item
        print ("sequential sum result ",sT)
        sT=0
    start1=time.time()
    print ("sequential version time ",start1-start)
    
        
def main():
    workers=5
    vector=random.sample(range(1,101),100)
    print (vector)
    sumaSec(vector)
    dim=10
    sT=0
    for k in range(0,len(vector),dim):
        vc=vector[k:k+dim]
        print (vc)
        for item in vc:
            sT=sT+item
        print ("sub list result ",sT)
        sT=0
        
    chunks=(vector[k:k+dim] for k in range(0,len(vector),10))
    start=time.time()
    with ProcessPoolExecutor(max_workers=workers) as executor:
        futures=[executor.submit(dummyFun,chunk) for chunk in chunks]
    for future in as_completed(futures):
        print (future.result())
    start1=time.time()
    print (start1-start)

if __name__=="__main__":
    main()

The problem is that for the sequential version I got a time of:

0.0009753704071044922

while for the concurrent version my time is:

0.10629010200500488

And when I reduce the number of workers to 2 my time is:

0.08622884750366211

Why is this happening?


2 Answers


The length of your vector is only 100. That is a very small amount of work, so the fixed cost of starting the process pool is the most significant part of the runtime. Parallelism is most beneficial when there is a lot of work to do. Try a larger vector, such as one with a million elements.
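
For example, a rough sketch of that baseline (note that the question's random.sample(range(1,101), ...) cannot produce a million values, because the sample would be larger than the population, so randint is used instead):

import random
import time

# Build a large vector; randint is used because random.sample(range(1, 101), 1_000_000)
# would raise ValueError (sample larger than population).
vector = [random.randint(1, 100) for _ in range(1_000_000)]

start = time.time()
s = 0
for item in vector:        # same manual summing loop as dummyFun in the question
    s = s + item
print("sequential sum", s, "took", time.time() - start, "seconds")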

The second problem is that you have each worker do a tiny amount of work: a chunk of size 10. Again, that means the cost of starting a task cannot be amortized over so little work. Use larger chunks. For example, instead of 10 use int(len(vector)/(workers*10)).
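
To give a concrete sketch of that suggestion: with a million-element vector and the question's five workers, the formula gives chunks of 20,000 elements, i.e. 50 tasks instead of 100,000 (the zero-filled vector below is just a stand-in for the real data):

workers = 5                                        # value from the question
vector = [0] * 1_000_000                           # stand-in for the real data
dim = int(len(vector) / (workers * 10))            # 20_000 elements per chunk
chunks = [vector[k:k + dim] for k in range(0, len(vector), dim)]
print(len(chunks), "chunks of", dim, "elements")   # 50 chunks rather than 100000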

Also note that you're creating 5 processes. For a CPU-bound task like this one you ideally want the same number of processes as you have physical CPU cores. Either pass the number of cores your system has, or leave max_workers=None (the default value) and ProcessPoolExecutor will pick that number for your system. With too few processes you leave performance on the table; with too many, the CPU has to switch between them and performance may suffer.
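
A minimal sketch of letting the pool size itself; the tiny lists passed to sum are only there to give the pool something to do, and keep in mind that os.cpu_count() usually reports logical cores, which may be more than the physical count:

import os
from concurrent.futures import ProcessPoolExecutor

if __name__ == "__main__":
    print("os.cpu_count() =", os.cpu_count())

    # max_workers=None (the default) sizes the pool from os.cpu_count().
    with ProcessPoolExecutor() as executor:
        print(list(executor.map(sum, [[1, 2, 3], [4, 5, 6]])))   # [6, 15]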

Adam
  • thank you for the reply. I have tried with a vector of a million elements, but I still get a time of 0.79 for the sequential part and 83.2 for the concurrent part. Any advice? – Little Nov 16 '19 at 23:25
  • @Little I just played a bit with your code, and your timing reports the sequential version as 0.06 s and the parallel one as 0.02 s. The reason yours takes a thousand times longer is the print statements. Only print what you actually need to see (the timings) and leave off the rest; a trimmed-down sketch follows these comments. Also see my edit about chunk size. – Adam Nov 16 '19 at 23:45
  • I doubt you wanted to use `random.sample`, so I instead created a vector like this: `vector = [random.randint(1,101) for _ in range(1000000)]`. – Adam Nov 16 '19 at 23:46
  • could you post your actual code? I have deleted the print statements and the gap is still huge – Little Nov 17 '19 at 05:29
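
For reference, a sketch that puts the suggestions from this answer and its comments together (this is not Adam's actual code): a million-element vector built with randint, no printing inside the loops, and only a few large chunks. The worker count of 4 is an arbitrary example, and timings will vary by machine:

from concurrent.futures import ProcessPoolExecutor, as_completed
import random
import time

def dummyFun(l):                       # same worker function as in the question
    s = 0
    for item in l:
        s = s + item
    return s

def main():
    vector = [random.randint(1, 101) for _ in range(1_000_000)]

    # Sequential baseline, with no printing inside the loop.
    start = time.time()
    total = dummyFun(vector)
    print("sequential:", total, time.time() - start)

    # Parallel version: a handful of large chunks instead of thousands of tiny ones.
    workers = 4                        # illustrative; match your core count
    dim = len(vector) // workers
    chunks = [vector[k:k + dim] for k in range(0, len(vector), dim)]

    start = time.time()
    with ProcessPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(dummyFun, chunk) for chunk in chunks]
        total = sum(f.result() for f in as_completed(futures))
    print("parallel:  ", total, time.time() - start)

if __name__ == "__main__":
    main()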

Your chunking creates far too many tiny tasks. Every task you submit carries overhead even when the worker processes have already been created, so too many tasks still costs you time.
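
As a sketch of one way to keep the task count low, split the vector into exactly one chunk per worker (ceiling division so no elements are dropped) and sum the partial results with executor.map; dummyFun mirrors the worker function from the question and workers=4 is just an example:

from concurrent.futures import ProcessPoolExecutor
import random

def dummyFun(l):                                     # worker function from the question
    s = 0
    for item in l:
        s = s + item
    return s

def parallel_sum(vector, workers=4):                 # workers=4 is illustrative
    dim = -(-len(vector) // workers)                 # ceiling division
    chunks = [vector[k:k + dim] for k in range(0, len(vector), dim)]
    with ProcessPoolExecutor(max_workers=workers) as executor:
        return sum(executor.map(dummyFun, chunks))

if __name__ == "__main__":
    vector = [random.randint(1, 100) for _ in range(1_000_000)]
    print(parallel_sum(vector))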

Maybe this post can help you in your search: How to parallel sum a loop using multiprocessing in Python

Sintifo