
Based on this question I assumed that creating a new process should be almost as fast as creating a new thread in Linux. However, a little test showed a very different result. Here's my code:

from multiprocessing import Process, Pool
from threading import Thread

times = 1000

def inc(a):
    b = 1
    return a + b

def processes():
    for i in xrange(times):
        p = Process(target=inc, args=(i, ))
        p.start()
        p.join()

def threads():
    for i in xrange(times):
        t = Thread(target=inc, args=(i, ))
        t.start()
        t.join()

Tests:

>>> timeit processes() 
1 loops, best of 3: 3.8 s per loop

>>> timeit threads() 
10 loops, best of 3: 98.6 ms per loop

So, processes are almost 40 times slower to create! Why does this happen? Is it specific to Python or to these libraries? Or did I just misinterpret the answer above?
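
(For reference, here is roughly how the same measurements can be reproduced without IPython's `timeit` magic. This is just a sketch; the module name `proc_vs_thread` is a placeholder for whatever file the functions above are saved in.)

# Rough stdlib equivalent of the IPython timings above.
# "proc_vs_thread" is a placeholder module name.
import timeit

print timeit.timeit('processes()',
                    setup='from proc_vs_thread import processes',
                    number=1)
print timeit.timeit('threads()',
                    setup='from proc_vs_thread import threads',
                    number=10)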


UPD 1. To make it clearer: I understand that this piece of code doesn't actually introduce any concurrency. The goal here is to test the time needed to create a process and a thread. To use real concurrency with Python, one can use something like this:

def pools():
    pool = Pool(10)
    pool.map(inc, xrange(times))

which really runs much faster than the threaded version.


UPD 2. I have added a version using os.fork():

import os

for i in xrange(times):
    child_pid = os.fork()
    if child_pid:
        # parent: wait for the child to terminate
        os.waitpid(child_pid, 0)
    else:
        # child: exit immediately (os._exit() is the conventional way to
        # leave a forked child, but exit() is enough for this timing test)
        exit(-1)

Results are:

$ time python test_fork.py 

real    0m3.919s
user    0m0.040s
sys     0m0.208s

$ time python test_multiprocessing.py 

real    0m1.088s
user    0m0.128s
sys     0m0.292s

$ time python test_threadings.py

real    0m0.134s
user    0m0.112s
sys     0m0.048s
ffriend
  • Well, the question you linked to is comparing the cost of just calling `fork(2)` vs. `pthread_create(3)`, whereas your code does quite a bit more. How about comparing `os.fork()` with `thread.start_new_thread()`? – Aya Jul 02 '13 at 13:06
  • @Aya: I couldn't find any kind of `join` in the `thread` module to create a similar test, but even compared to the high-level `threading` version, `os.fork()` is still much slower. In fact, it is the slowest one (though additional conditions may affect performance). See my update. – ffriend Jul 02 '13 at 14:11
  • You have to use a mutex to wait for the thread if you're using the low-level `thread` module, which is how the higher-level `threading` module implements `join()`. But, if you're just trying to measure the time it takes to create the new process/thread, then you shouldn't be calling `join()`. See also my answer below. – Aya Jul 02 '13 at 14:19
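
A minimal sketch of the mutex-based wait described in the comment above, using the low-level `thread` module (Python 2); the helper name `run_and_wait` is made up for illustration:

# Emulating join() with the low-level thread module: the caller blocks
# on a lock that the worker thread releases when it is finished.
import thread

def run_and_wait(func, args=()):
    done = thread.allocate_lock()
    done.acquire()                        # the lock starts out held

    def wrapper():
        try:
            func(*args)
        finally:
            done.release()                # signal completion

    thread.start_new_thread(wrapper, ())
    done.acquire()                        # blocks until the worker releases the lock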

3 Answers


The question you linked to is comparing the cost of just calling fork(2) vs. pthread_create(3), whereas your code does quite a bit more, e.g. using join() to wait for the processes/threads to terminate.

If, as you say...

The goal here is to test the time needed to create a process and a thread.

...then you shouldn't be waiting for them to complete. You should be using test programs more like these...

fork.py

import os
import time

def main():
    for i in range(100):
        pid = os.fork()
        if pid:
            #print 'created new process %d' % pid
            continue
        else:
            time.sleep(1)
            return

if __name__ == '__main__':
    main()

thread.py

import thread
import time

def dummy():
    time.sleep(1)

def main():
    for i in range(100):
        tid = thread.start_new_thread(dummy, ())
        #print 'created new thread %d' % tid

if __name__ == '__main__':
    main()

...which give the following results...

$ time python fork.py
real    0m0.035s
user    0m0.008s
sys     0m0.024s

$ time python thread.py
real    0m0.032s
user    0m0.012s
sys     0m0.024s

...so there's not much difference in the creation time of threads and processes.

Aya
  • But won't your `fork.py` just create new processes and exit, without waiting for the child processes to complete? – ffriend Jul 02 '13 at 14:14
  • Also, you launch the next thread/process without waiting for the previous one to finish, so they run concurrently, while it seems more correct to start them sequentially to avoid the GIL and all such things. – ffriend Jul 02 '13 at 14:20
  • @ffriend Well, your question said (emphasis mine) "I assumed that **creating** new process should be almost as fast as creating new thread in Linux" , and it is. The whole point of using threads is for concurrency, so what would be the point of running threads sequentially? What exactly are you trying to achieve here? – Aya Jul 02 '13 at 14:23
  • I'm trying to compare the overhead of starting a new thread versus a new process. I emphasized creation to separate the thread/process itself from other details like the GIL, function calls, etc. But of course, joining it back also matters. Running many threads/processes sequentially is just another way to find the mean time. See my first update for details. – ffriend Jul 02 '13 at 15:47
  • @ffriend Well, if you include the tear down time, then processes take quite a bit longer than threads, but the overhead is still in the millisecond range, either way. However, in practice, if the amount of time it takes to set up and tear down a process/thread is greater than the amount of time the process/thread is working for, then there's not much point in using them. Otherwise, the overhead is irrelevant, and choosing between the two should be based on which is more appropriate for the actual goal you're trying to accomplish. – Aya Jul 02 '13 at 16:24
  • @ffriend Also, given that the linked question is only measuring the set-up time and ignoring the tear-down time, that would explain the discrepancy between your results and the results from the examples in this answer. I thought your question was to explain this discrepancy, which I thought I had. So if that isn't your question, then what is? – Aya Jul 02 '13 at 16:29
  • Just as an example, consider an application or framework that creates lots of threads/processes all the time, something similar to what Erlang does. If processes were really lightweight, you might want to use them instead of threads. But if you get (relatively) large overhead, it would be better to put more effort into threads instead. Of course, there are ways to overcome any problems, but it's worth knowing such details beforehand. Also, don't forget about the GIL, which is all about threads vs. processes. Anyway, the answers and comments clarified it, so I accept your answer as the most detailed. Thanks. – ffriend Jul 02 '13 at 21:38
  • @ffriend I see. Well, the creation of both threads and subprocesses has a fairly significant overhead, so in practice I'd probably try to avoid creating an "application or framework that creates lots of threads/processes all the time", and use a pool-based model instead. The GIL is kind of a pain with Python, but you can avoid it by using subprocesses with IPC, or using threads which call into a C library (e.g. with `ctypes`) to do most of the work. – Aya Jul 03 '13 at 13:28
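
A minimal sketch of the pool-based model mentioned in the comment above: the worker processes are created once and reused, so the per-task cost no longer includes process creation and teardown (the worker count of 4 is an arbitrary choice):

# Pool-based model: pay the process-creation cost once, then reuse the workers.
from multiprocessing import Pool

def inc(a):
    return a + 1

if __name__ == '__main__':
    pool = Pool(4)                         # worker count is an arbitrary choice
    results = pool.map(inc, xrange(1000))  # tasks are distributed to the workers
    pool.close()
    pool.join()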

Yes, it is true. Starting a new process (called a heavyweight process) is costly.

As an overview ...

The OS has to (in the Linux case) fork the calling process, set up accounting for the new process, set up a new stack, do the context switch, copy any memory that gets modified (fork on Linux is copy-on-write), and tear all of that down when the new process exits.

The thread just allocates a new stack and thread structure, does the context switch, and returns when the work is done.

... that's why we use threads.

andy256
  • You got it backwards: a process is just a process; a thread is a lightweight process. :) I guess you can call a process a heavyweight thread, but I don't think anyone does that. What is a heavyweight process? – thang Mar 27 '17 at 19:37
  • @thang Sigh. If you don't know something, then at least you could Google it. Try Googling "heavyweight process" and see if *anyone does that*. – andy256 Mar 27 '17 at 22:29

In my experience there is a significant difference between creating a thread (with pthread_create) and forking a process.

For example, I created a C test similar to your Python test, with thread code like this:

pthread_t thread;
pthread_create(&thread, NULL, &test, NULL);   /* start the thread running test() */
void *res;
pthread_join(thread, &res);                   /* wait for it to finish */

and process forking code like this:

pid_t pid = fork();
if (!pid) {
  /* child: run the worker and exit */
  test(NULL);
  exit(0);
}
int res;
waitpid(pid, &res, 0);   /* parent: wait for the child to terminate */

On my system the forking code took about 8 times as long to execute.

However, it's worth noting that the Python implementation is even slower; for me it was about 16 times as slow. I suspect that is because, in addition to the regular overhead of creating a new process, there is also extra Python-level overhead associated with the new process.

James Holderness