
Today I ran some code and I wanted to run it on my multi-core CPU, so wherever I wrote map I changed it to pool.map. Surprisingly, my code ran slower, even though it was using much more processing power and memory (as far as I can tell). So I wrote the test below; it uses pathos and multiprocessing.

from pathos.pools import ProcessPool
from pathos.pools import ThreadPool
#from pathos.pools import ParallelPool
from pathos.pools import SerialPool
from multiprocessing import Pool

import time

def timeit(method):
    def timed(*args, **kw):
        ts = time.time()
        result = method(*args, **kw)
        te = time.time()
        print ('%r (%r, %r) %2.2f sec' % \
              (method.__name__, args, kw, te-ts))
        return result

    return timed

def times2(x):
    return 2*x

@timeit
def test(max,p):
    (p.map(times2, range(max)))

def main():
    ppool = ProcessPool(4)
    tpool = ThreadPool(4)
    #parapool = ParallelPool(4)
    spool = SerialPool(4)
    pool = Pool(4)
    for i in range(8):
        max = 10**i
        print(max)
        print('ThreadPool')
        test(max,tpool)
        #print('ParallelPool')
        #test(max,parapool)
        print('SerialPool')
        test(max,spool)
        print('Pool')
        test(max,pool)
        print('ProcessPool')
        test(max,ppool)
        print('===============')


if __name__ == '__main__':
    main()

These are the results:

1
ThreadPool
'test' ((1, <pool ThreadPool(nthreads=4)>), {}) 0.00 sec
SerialPool
'test' ((1, <pool SerialPool()>), {}) 0.00 sec
Pool
'test' ((1, <multiprocessing.pool.Pool object at 0x0000011E63D276A0>), {}) 0.17 sec
ProcessPool
'test' ((1, <pool ProcessPool(ncpus=4)>), {}) 0.00 sec
===============
10
ThreadPool
'test' ((10, <pool ThreadPool(nthreads=4)>), {}) 0.00 sec
SerialPool
'test' ((10, <pool SerialPool()>), {}) 0.00 sec
Pool
'test' ((10, <multiprocessing.pool.Pool object at 0x0000011E63D276A0>), {}) 0.00 sec
ProcessPool
'test' ((10, <pool ProcessPool(ncpus=4)>), {}) 0.01 sec
===============
100
ThreadPool
'test' ((100, <pool ThreadPool(nthreads=4)>), {}) 0.00 sec
SerialPool
'test' ((100, <pool SerialPool()>), {}) 0.00 sec
Pool
'test' ((100, <multiprocessing.pool.Pool object at 0x0000011E63D276A0>), {}) 0.00 sec
ProcessPool
'test' ((100, <pool ProcessPool(ncpus=4)>), {}) 0.01 sec
===============
1000
ThreadPool
'test' ((1000, <pool ThreadPool(nthreads=4)>), {}) 0.00 sec
SerialPool
'test' ((1000, <pool SerialPool()>), {}) 0.00 sec
Pool
'test' ((1000, <multiprocessing.pool.Pool object at 0x0000011E63D276A0>), {}) 0.00 sec
ProcessPool
'test' ((1000, <pool ProcessPool(ncpus=4)>), {}) 0.02 sec
===============
10000
ThreadPool
'test' ((10000, <pool ThreadPool(nthreads=4)>), {}) 0.00 sec
SerialPool
'test' ((10000, <pool SerialPool()>), {}) 0.00 sec
Pool
'test' ((10000, <multiprocessing.pool.Pool object at 0x0000011E63D276A0>), {}) 0.00 sec
ProcessPool
'test' ((10000, <pool ProcessPool(ncpus=4)>), {}) 0.09 sec
===============
100000
ThreadPool
'test' ((100000, <pool ThreadPool(nthreads=4)>), {}) 0.04 sec
SerialPool
'test' ((100000, <pool SerialPool()>), {}) 0.00 sec
Pool
'test' ((100000, <multiprocessing.pool.Pool object at 0x0000011E63D276A0>), {}) 0.01 sec
ProcessPool
'test' ((100000, <pool ProcessPool(ncpus=4)>), {}) 0.74 sec
===============
1000000
ThreadPool
'test' ((1000000, <pool ThreadPool(nthreads=4)>), {}) 0.42 sec
SerialPool
'test' ((1000000, <pool SerialPool()>), {}) 0.00 sec
Pool
'test' ((1000000, <multiprocessing.pool.Pool object at 0x0000011E63D276A0>), {}) 0.17 sec
ProcessPool
'test' ((1000000, <pool ProcessPool(ncpus=4)>), {}) 7.54 sec
===============
10000000
ThreadPool
'test' ((10000000, <pool ThreadPool(nthreads=4)>), {}) 4.57 sec
SerialPool
'test' ((10000000, <pool SerialPool()>), {}) 0.00 sec
Pool
'test' ((10000000, <multiprocessing.pool.Pool object at 0x0000011E63D276A0>), {}) 2.25 sec
ProcessPool
'test' ((10000000, <pool ProcessPool(ncpus=4)>), {}) 81.51 sec
===============

As you can see, multiprocessing's Pool often beats pathos's ProcessPool, yet even Pool is slower than SerialPool. I am running an i5-2500, and I installed pathos today via pip:

>pip freeze
colorama==0.3.9
decorator==4.1.2
dill==0.2.7.1
helper-htmlparse==0.1
htmldom==2.0
lxml==4.0.0
multiprocess==0.70.5
pathos==0.2.1
pox==0.2.3
ppft==1.6.4.7.1
py==1.4.34
pyfs==0.0.8
pyreadline==2.1
pytest==3.2.2
six==1.11.0

Why does this happen?

  • One thing i am sure of is, the more threading you use, the more time python code takes. latest python have a better version of GIL, so..there might be some performance gain in latest python 3.x versions compared to older ones – Max Sep 23 '17 at 17:35
  • Also, Python does not really run multiple threads at once: it executes a single thread at a time and the lock is handed back and forth between the threads – Max Sep 23 '17 at 17:37
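
To see the effect those comments describe, here is a minimal sketch (assuming a toy CPU-bound function, burn, that just loops and adds): with pure-Python, CPU-bound work, a pool of threads gives essentially no speedup over a plain loop, because the GIL lets only one thread execute Python bytecode at a time.

import time
from multiprocessing.dummy import Pool as DummyThreadPool  # thread-backed Pool

def burn(n):
    # pure-Python busy work; it holds the GIL the whole time
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    work = [200000] * 8

    ts = time.time()
    serial = [burn(n) for n in work]
    print('serial loop   : %.2f sec' % (time.time() - ts))

    tp = DummyThreadPool(4)
    ts = time.time()
    threaded = tp.map(burn, work)
    print('4 threads .map: %.2f sec' % (time.time() - ts))
    tp.close()
    tp.join()

Expect the two timings to be roughly equal; the threaded version may even come out a little slower because of lock contention.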

2 Answers


You will only benefit from parallelization with demanding tasks. Your task is almost instantaneous compared to the communication overhead of the multiprocessing/multithreading machinery. Try a function that lasts about 1 s and you will see the effect. Also, remember that in Python, because of the GIL, you will only benefit from multithreading if you are I/O-bound; for CPU-bound tasks, go with multiprocessing.

See this talk from Raymond.
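
The CPU-bound case can be sketched the same way (assuming a toy function, heavy, that just loops and adds, much heavier per call than the times2 in the question): once each call does enough real work, multiprocessing.Pool does beat the serial loop, because the computation dwarfs the pickling and inter-process communication.

import time
from multiprocessing import Pool

def heavy(n):
    # deliberately expensive pure-Python loop, roughly tens of ms per call
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    work = [300000] * 40

    ts = time.time()
    serial = [heavy(n) for n in work]
    print('serial loop : %.2f sec' % (time.time() - ts))

    pool = Pool(4)
    ts = time.time()
    parallel = pool.map(heavy, work)
    print('Pool(4).map : %.2f sec' % (time.time() - ts))
    pool.close()
    pool.join()

On a four-core machine the pooled version should come in at roughly a quarter of the serial time, give or take the pool start-up cost.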

tupui

I wanted to examine this myself to see how it behaves with an actually slow function (sleeping for 1 second).

from pathos.pools import ProcessPool
from pathos.pools import ThreadPool
from pathos.pools import ParallelPool
from pathos.pools import SerialPool
from multiprocessing import Pool
import time

def timeit(method):
    def timed(*args, **kw):
        ts = time.time()
        result = method(*args, **kw)
        te = time.time()
        print ('%r (%r, %r) %2.2f sec' % \
              (method.__name__, args, kw, te-ts))
        return result
    return timed

def slowfcn(n):
    from time import sleep
    sleep(1.0)

@timeit
def test(n,p):
    (p.map(slowfcn, range(n)))

def main():
    npool = 4
    ppool = ProcessPool(npool)
    tpool = ThreadPool(npool)
    parapool = ParallelPool(npool)
    spool = SerialPool()
    pool = Pool(npool)

    nloops = 8
    print('For Loop')
    ts = time.time()
    for i in range(nloops):
        slowfcn(i)
    te = time.time()
    print ('%r () %2.2f sec' % ('test', te-ts))
    print('ThreadPool')
    test(nloops,tpool)
    print('ParallelPool')
    test(nloops,parapool)
    print('SerialPool')
    test(nloops,spool)
    print('Pool')
    test(nloops,pool)
    print('ProcessPool')
    test(nloops,ppool)


if __name__ == '__main__':
    main()

Here are the results:

For Loop
'test' () 8.00 sec
ThreadPool
'test' ((8, <pool ThreadPool(nthreads=4)>), {}) 2.00 sec
ParallelPool
'test' ((8, <pool ParallelPool(ncpus=4, servers=None)>), {}) 8.01 sec
SerialPool
'test' ((8, <pool SerialPool()>), {}) 0.00 sec
Pool
'test' ((8, <multiprocessing.pool.Pool state=RUN pool_size=4>), {}) 2.00 sec
ProcessPool
'test' ((8, <pool ProcessPool(ncpus=4)>), {}) 2.01 sec

So ThreadPool uses threads while Pool and ProcessPool use separate worker processes, yet all three give roughly the four-fold speedup you would expect from four workers; even the thread-based pool benefits here because sleeping releases the GIL. Additionally, ParallelPool needs to have a server configured, but it's not clear to me from the documentation or examples how to do that. It's also not clear to me that SerialPool is doing anything here, and I'm not sure how to fix that either.
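
One guess about the suspicious 0.00 sec SerialPool timings (not verified against the pathos source): its map may return a lazy iterator under Python 3, so nothing is actually computed until the result is consumed. A small check, reusing the times2 function from the question, is to time the map call and the consumption separately; if the first timing stays near zero while the second shows the real cost, the pool is lazy, and wrapping every p.map(...) in list(...) would make all the pools comparable.

from pathos.pools import SerialPool
import time

def times2(x):
    return 2 * x

if __name__ == '__main__':
    spool = SerialPool()
    n = 10**7

    ts = time.time()
    result = spool.map(times2, range(n))   # may return a lazy iterator immediately
    print('map call       : %.2f sec' % (time.time() - ts))

    ts = time.time()
    values = list(result)                  # forcing evaluation does the actual work
    print('list(map(...)) : %.2f sec' % (time.time() - ts))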

Scott