
I have started learning multiprocessing in Python. I created a pool for the function `res` and timed the program both with the pool and the normal way. I expected that using a pool would reduce the processing time, but the pool took 10.0413179397583 seconds while the normal way took 0.005002737045288086 seconds. What did I miss?

import multiprocessing as mp
import time

def res(a):
    squ = 0
    for i in range(a):
        squ += i**2
    return squ

if __name__ == "__main__":

    t1 = time.time()
    p = mp.Pool()
    result = p.map(res, range(10000))
    p.close()
    p.join()
    print(time.time()-t1)

    t2 = time.time()
    result = []
    sum = 0
    for i in range(10000):
        sum += i**2
        result.append(sum)
    print(time.time()-t2)
– mostafa8026
    Um, your `pool` example is calculating `10000` different loops, with anywhere from `0` to `9999` iterations each. Your second example is a *single* loop, with `9999` iterations... regardless, the assumption that "if I use pool, processing time would be reduced" is not a safe bet. There are many ways a multiprocessing approach can take more time. – juanpa.arrivillaga Feb 15 '18 at 19:33

3 Answers


The algorithm you use with multiprocessing is O(n^2) (a loop of 1 iteration, a loop of 2, ..., a loop of 9999), while the "normal approach" is O(n): the pool version performs 0 + 1 + ... + 9999 = 9999·10000/2 ≈ 5×10^7 inner iterations in total, whereas the single loop performs only 10^4. Even without multiprocessing, the first way took about 3 times longer in my tests.
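
To make the comparison concrete, here is a minimal sketch (no multiprocessing involved) that times both algorithms in a single process; `n = 10000` matches the question:

import time

def res(a):
    squ = 0
    for i in range(a):
        squ += i**2
    return squ

n = 10000

# O(n^2): one call of res(a) per a, each looping a times
t = time.time()
quadratic = [res(a) for a in range(n)]
print("O(n^2) version:", time.time() - t)

# O(n): a single loop that keeps a running sum
t = time.time()
linear, s = [], 0
for i in range(n):
    s += i**2
    linear.append(s)
print("O(n) version:", time.time() - t)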

Related: What is a plain English explanation of “Big O” notation?

– internet_user

Your pool example is calculating 10000 different loops, with anywhere from 0 to 9999 iterations each. Your second example is a single loop, with 9999 iterations...

Here's an apples-to-apples approach:

import multiprocessing as mp
import time
import sys

NUM_ITER = int(sys.argv[1])  # number of tasks, taken from the command line

def res(a):
    squ = 0
    for i in range(a):
        squ += i**2
    return squ

if __name__ == "__main__":

    t1 = time.time()
    p = mp.Pool(None)  # None -> one worker process per CPU core
    result = p.map(res, range(NUM_ITER))
    p.close()
    p.join()
    print(f"With multiprocessing: {time.time()-t1}")

    t2 = time.time()
    result = [res(i) for i in range(NUM_ITER)]  # same work, single process
    print(f"Without multiprocessing: {time.time()-t2}")

Note that multiprocessing will take longer because of its overhead unless you are doing a lot of iterations, so compare:

Juans-MacBook-Pro:temp juan$ python -B timing_mp.py 100
With multiprocessing: 0.18288207054138184
Without multiprocessing: 0.002610921859741211
Juans-MacBook-Pro:temp juan$ python -B timing_mp.py 1000
With multiprocessing: 0.1448049545288086
Without multiprocessing: 0.16153407096862793
Juans-MacBook-Pro:temp juan$ python -B timing_mp.py 5000
With multiprocessing: 2.273800849914551
Without multiprocessing: 3.9749832153320312
Juans-MacBook-Pro:temp juan$ python -B timing_mp.py 10000
With multiprocessing: 8.837619066238403
Without multiprocessing: 15.725339889526367
– juanpa.arrivillaga

There's an excellent discussion of this by Dan Foreman-Mackey in the context of the emcee package.

If the computation time of the function call isn't large relative to the overhead of multiprocessing, you will see no advantage. You can demonstrate this relatively easily with a function like the following:

import time
def func():
    """ arbitrarily time-intensive function """
    time.sleep(1)   # return after 1 s "computation time"
    return
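
For completeness, here is one way you might wire that up (a minimal sketch; the task count of 8 and pool size of 4 are arbitrary choices, and `func` is given a dummy argument so it can be used with `Pool.map`):

import multiprocessing as mp
import time

def func(_):
    """ arbitrarily time-intensive function """
    time.sleep(1)   # stand-in for 1 s of real computation

if __name__ == "__main__":
    t = time.time()
    for i in range(8):
        func(i)
    print("Serial:", time.time() - t)       # ~8 s

    t = time.time()
    with mp.Pool(4) as p:
        p.map(func, range(8))
    print("Pool of 4:", time.time() - t)    # ~2 s plus pool overhead

Because each call now costs a full second, the pool's startup and communication overhead is dwarfed by the computation, and the parallel version wins.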
– PetMetz