
I am playing around with multiprocessing in Python 3 to try to understand how it works and when it's good to use it.

I am basing my examples on this question, which is really old (2012).

My computer is a Windows machine with 4 physical cores (8 logical cores).

First: non-segmented data

First I try to brute-force compute numpy.sin for a million values. The million values are passed as a single chunk, not segmented.

import time
import numpy
from multiprocessing import Pool

# so that multiprocessing works inside iPython
__spec__ = "ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>)"

def numpy_sin(value):
    return numpy.sin(value)

a = numpy.arange(1000000)

if __name__ == '__main__':

    pool = Pool(processes=8)

    # single process: one vectorized numpy call over the whole array
    start = time.time()
    result = numpy.sin(a)
    end = time.time()
    print('Single threaded {}'.format(end - start))

    # pool of workers: one task per array element
    start = time.time()
    result = pool.map(numpy_sin, a)
    pool.close()
    pool.join()
    end = time.time()
    print('Multiprocessing {}'.format(end - start))

And I find that, no matter the number of processes, the multiprocessing version always takes 10 times or so as long as the single-process version. In the Task Manager I see that not all the CPUs are maxed out; total CPU usage goes between 18% and 31%.

So I try something else.

Second: segmented data

I try to split up the original 1 million computations in 10 batches of 100,000 each.
Then I try again for 10 million computations in 10 batches of 1 million each.

import time
import numpy
from multiprocessing import Pool

# so that multiprocessing works inside iPython
__spec__ = "ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>)"

def numpy_sin(value):
    return numpy.sin(value)

p = 3        # number of worker processes
s = 1000000  # size of each batch

a = [numpy.arange(s) for _ in range(10)]

if __name__ == '__main__':

    print('processes = {}'.format(p))
    print('size = {}'.format(s))

    # single process: numpy vectorizes over the whole list of arrays
    start = time.time()
    result = numpy.sin(a)
    end = time.time()

    print('Single threaded {}'.format(end - start))

    # pool of workers: one task per batch of s values
    pool = Pool(processes=p)
    start = time.time()
    result = pool.map(numpy_sin, a)
    pool.close()
    pool.join()
    end = time.time()

    print('Multiprocessing {}'.format(end - start))

I ran this last piece of code for different numbers of processes p and different batch sizes s, 100,000 and 1,000,000.

At least now the Task Manager shows the CPUs maxed out at 100% usage.

I get the following results for the elapsed times (ORANGE: multiprocess, BLUE: single):

[plots of the elapsed times for varying p, with s = 100,000 and s = 1,000,000]

So multiprocessing never wins over the single process.

Why??

SuperCiocia
  • Multiprocessing has overhead per work item scheduled and returned. The work done must be larger than the work-to-get-there in order for MP to be faster (see the sketch below). – tdelaney Apr 29 '20 at 05:19
  • As explained above, you really need the work to be significant (more than the overhead) to see the improvement. I can see from your graphs that the program runs well below a second. I had a worker working for about 30 sec. on a task and wanted to allow multiple jobs. Instead of waiting 1 min. for 2 jobs to finish, I used MP and 2 jobs take 30 sec.... – Tomerikoo Apr 29 '20 at 07:00
  • Another related point that comes to mind is that (from my experience) the overhead is more or less constant, not a percentage of the program's runtime. So let's say the overhead is 1 sec. For a program of 1 sec., you double the runtime. But for a program of 1 min. it is barely noticeable. – Tomerikoo Apr 29 '20 at 10:20
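
A minimal sketch of the overhead point from these comments (not from the original post): Pool.map accepts a chunksize argument that batches many elements into each task, so the scheduling and pickling overhead is paid per chunk rather than per element. Each element is still a separate Python-level call inside the worker, so even this won't beat the single vectorized numpy.sin call here.

import time
import numpy
from multiprocessing import Pool

def numpy_sin(value):
    return numpy.sin(value)

a = numpy.arange(1000000)

if __name__ == '__main__':
    with Pool(processes=8) as pool:
        start = time.time()
        # chunksize=100000 -> 10 tasks of 100,000 elements each,
        # instead of 1,000,000 single-element tasks
        result = pool.map(numpy_sin, a, chunksize=100000)
        print('Pool.map with chunksize {:.2f}s'.format(time.time() - start))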

2 Answers


Importing numpy can change how the parent process runs so that it only runs on one core (the BLAS library numpy links against can reset the CPU affinity on import). You can call os.system("taskset -p 0xff %d" % os.getpid()) after you import numpy to reset the CPU affinity so that all cores are used.

See this question for more details
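
A minimal sketch of that workaround (note that taskset is a Linux utility, so this does not apply on Windows):

import os
import numpy  # importing numpy (via its BLAS library) may pin the process to one core

# Linux-only: reset the CPU affinity of this process to cores 0-7
# (0xff is a bitmask; adjust it to your core count)
os.system("taskset -p 0xff %d" % os.getpid())

# alternative using only the standard library (also Linux-only):
# os.sched_setaffinity(0, range(os.cpu_count()))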

RedKnite
  • Why does numpy run on one core? Does it do that all the time, for any function? – SuperCiocia Apr 29 '20 at 06:22
  • I don't know the details, but from what I've read it has something to do with interactions with OpenBLAS libraries. And it changes how your entire program runs, so even if you don't use numpy at all, just import it, it will still have this effect on your entire program. See: https://stackoverflow.com/questions/15639779/why-does-multiprocessing-use-only-a-single-core-after-i-import-numpy – RedKnite Apr 29 '20 at 06:34

A single CPU core can really only do one thing at a time. When multi-threading or multi-processing, the computer is really just switching back and forth between tasks quickly. With the provided problem, the computer could either perform the calculation 1,000,000 times in one process, or split up the work among 10 "workers", each performing 100,000 of them.

Multi-processing shines not when computing something straight out, since the computer has to take time to create the extra processes, but while waiting for something. The main example I've heard of is web scraping. If a program requested data from a list of websites and waited for each server to respond before requesting from the next, the program would sit idle for a couple of seconds per request. If instead it used multiprocessing/threading to issue all the requests first and wait on them concurrently, the total running time would be much shorter, as shown in the sketch below.
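
A minimal sketch of that I/O-bound case (the URL is a placeholder; threads suffice here because the time is spent waiting on the network, not computing):

import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# placeholder URLs; substitute real sites to try it
urls = ['https://example.com'] * 10

def fetch(url):
    # almost all the time here is spent waiting on the network
    with urlopen(url) as response:
        return len(response.read())

start = time.time()
sizes = [fetch(u) for u in urls]  # sequential: the waits add up
print('Sequential {:.2f}s'.format(time.time() - start))

start = time.time()
with ThreadPoolExecutor(max_workers=10) as executor:
    sizes = list(executor.map(fetch, urls))  # all requests wait at once
print('Concurrent {:.2f}s'.format(time.time() - start))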

SuperDyl