
Why is the overhead so much higher the first time a Python multiprocessing pool is used? What is different compared to the following runs?

import pandas as pd
import time 
import multiprocessing

def foo(n):
    # CPU-bound work: a triple nested loop used purely to burn cycles
    for i in range(n):
        for j in range(n):
            for k in range(n):
                accum = i + j + k
    return accum

def test1(pool, n):
    # run foo twice in parallel on the pool's worker processes
    pool.map(foo, [n, n])

def test2(n):
    # run foo twice sequentially in the main process
    foo(n)
    foo(n)

if __name__ == "__main__":
    rtn = []
    pool = multiprocessing.Pool(processes=2)

    for n in range(100, 1100, 100):
        startTime = time.time()
        test1(pool, n)
        t1 = time.time() - startTime
        print('t1: {0} seconds'.format(t1))

        startTime = time.time()
        test2(n)
        t2 = time.time() - startTime
        print('t2: {0} seconds'.format(t2))

        rtn.append([n, t1, t2])

    xx = pd.DataFrame(rtn, columns=['n', 't1', 't2'])
    print(xx)

      n          t1          t2
0   100    3.843944    0.106006    <-------- t1 is much longer than t2
1   200    0.640689    1.000097
2   300    2.526334    4.140915
3   400    6.880183   11.183931
4   500   14.937281   25.981793
5   600   27.315186   39.802715
6   700   41.263902   60.289115
7   800   64.577426   95.624465
8   900   90.760957  132.725434
9  1000  120.575304  177.576586
  • Found this SO post that answers the question: https://stackoverflow.com/questions/1289813/python-multiprocessing-vs-threading-for-cpu-bound-work-on-windows-and-linux?rq=1 – 2607 Aug 09 '18 at 00:49

1 Answer


This is because the pool has to be created first: Python has to ask the operating system to spawn the child processes (2 in your example). Only once those processes are alive can Python hand them the tasks you submit to the pool. By the way, you should close the pool after completion.
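A minimal sketch of both points, reusing the foo from the question: a throwaway map call warms the pool so the one-time process spawn is paid before any timing, and the pool is used as a context manager so it is cleaned up automatically when the block exits.

import time
import multiprocessing

def foo(n):
    # same CPU-bound triple loop as in the question
    for i in range(n):
        for j in range(n):
            for k in range(n):
                accum = i + j + k
    return accum

if __name__ == "__main__":
    # The context manager terminates the pool on exit, so no explicit
    # close()/join() call is needed afterwards.
    with multiprocessing.Pool(processes=2) as pool:
        # Warm-up: forces the child processes to be spawned before any
        # measurement, so the first timed run is not penalized by
        # process creation.
        pool.map(foo, [1, 1])

        startTime = time.time()
        pool.map(foo, [100, 100])
        print('warm pool: {0:.3f} seconds'.format(time.time() - startTime))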

I like the three-loop CPU stresser! Hope I could solve your problem!

juliusmh