
Hello, I'm trying to calculate all the prime numbers up to 10000.

I'm doing this first non-threaded, and then splitting the calculation into the ranges 1 to 5000 and 5001 to 10000. I expected that using threads would make it significantly faster, but the output looks like this:

    --------Results--------
    Non threaded Duration:  0.012244000000000005 seconds
    Threaded Duration:  0.012839000000000017 seconds

There is in fact no big difference, except that the threaded version is even a bit slower.

What is wrong?

This is my code:

import math
from threading import Thread

def nonThreaded():
    primeNtoM(1,10000)


def threaded():
    t1 = Thread(target=primeNtoM, args=(1,5000))
    t2 = Thread(target=primeNtoM, args=(5001,10000))
    t1.start()
    t2.start()
    t1.join()
    t2.join()


def is_prime(n):
    if n % 2 == 0 and n > 2: 
        return False
    for i in range(3, int(math.sqrt(n)) + 1, 2):
        if n % i == 0:
            return False
    return True

def primeNtoM(n,m):
    L = list()
    if (n > m):
        print("n should be smaller than m")
        return
    for i in range(n,m):
        if(is_prime(i)):
            L.append(i)

if __name__ == '__main__':
    import time
    print("--------Nonthreaded calculation--------")
    nTstart_time = time.perf_counter()  # time.clock() was removed in Python 3.8
    nonThreaded()
    nonThreadedTime = time.perf_counter() - nTstart_time

    print("--------Threaded calculation--------")

    Tstart_time = time.perf_counter()
    threaded()
    threadedTime = time.perf_counter() - Tstart_time

    print("--------Results--------")
    print ("Non threaded Duration: ",nonThreadedTime, "seconds")
    print ("Threaded Duration: ",threadedTime, "seconds")
arnoapp

2 Answers


from: https://wiki.python.org/moin/GlobalInterpreterLock

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)

This means: since your workload is CPU-bound, the GIL prevents your two threads from executing Python bytecode at the same time in one process. The threads simply take turns on a single core, and the thread-switching overhead is the extra time you see.
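To see the effect concretely, here is a small self-contained sketch of the same experiment. The `count_primes` helper and the stricter `is_prime` (the question's version also reports 1 as prime) are illustrative additions, not the asker's exact code:

```python
import math
import time
from threading import Thread

def is_prime(n):
    # stricter test than the question's: rejects n < 2 as well
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for i in range(3, int(math.sqrt(n)) + 1, 2):
        if n % i == 0:
            return False
    return True

def count_primes(n, m, out):
    # each thread appends one count; appending to a list is safe under the GIL
    out.append(sum(1 for i in range(n, m + 1) if is_prime(i)))

results = []
t1 = Thread(target=count_primes, args=(1, 5000, results))
t2 = Thread(target=count_primes, args=(5001, 10000, results))
start = time.perf_counter()
t1.start(); t2.start()
t1.join(); t2.join()
# both threads ran, but the GIL let only one execute bytecode at a time,
# so the elapsed time is roughly that of a single-threaded run
print(sum(results), "primes found in", time.perf_counter() - start, "seconds")  # 1229 primes
```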

Luis Masuelli
  • your answer is much better than mine, so I deleted :) – Adam Smith Apr 28 '14 at 15:29
  • Honestly I don't know, but I read there's a `multiprocessing` module. It produces forks and, in unix environments, process forks are just a bit slower than thread forks (context switches are quicker and lighter). However this would be an issue again in Windows – Luis Masuelli Apr 28 '14 at 15:52

You can use the multiprocessing module, which gives results like below:

('Non threaded Duration: ', 0.016599999999999997, 'seconds')
('Threaded Duration: ', 0.007172000000000005, 'seconds')

...after making just these changes to your code (changing 'Thread' to 'Process'):

import math
#from threading import Thread
from multiprocessing import Process

def nonThreaded():
    primeNtoM(1,10000)


def threaded():
    #t1 = Thread(target=primeNtoM, args=(1,5000))
    #t2 = Thread(target=primeNtoM, args=(5001,10000))
    t1 = Process(target=primeNtoM, args=(1,5000))
    t2 = Process(target=primeNtoM, args=(5001,10000))
    t1.start()
    t2.start()
    t1.join()
    t2.join()

By spawning actual OS processes instead of using in-process threading, you eliminate the GIL issues discussed in @Luis Masuelli's answer.

multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.
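One caveat with the `Process` version above: each child appends to its own copy of `L`, so the parent never sees the primes. A sketch using `multiprocessing.Pool`, which does return results to the parent, might look like this (the `primes_in` helper and the inclusive upper bound are my additions, not part of the original code):

```python
import math
from multiprocessing import Pool

def is_prime(n):
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for i in range(3, int(math.sqrt(n)) + 1, 2):
        if n % i == 0:
            return False
    return True

def primes_in(bounds):
    # note the inclusive upper bound, unlike range(n, m) in the question
    n, m = bounds
    return [i for i in range(n, m + 1) if is_prime(i)]

if __name__ == '__main__':
    # Pool.map pickles each chunk's result and sends it back to the parent
    with Pool(2) as pool:
        chunks = pool.map(primes_in, [(1, 5000), (5001, 10000)])
    primes = [p for chunk in chunks for p in chunk]
    print(len(primes))  # 1229
```

The `if __name__ == '__main__'` guard matters here: on platforms that spawn rather than fork, each worker re-imports the module, and the guard keeps the pool from being created recursively.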

bgporter
  • But then I'm creating a new process instead of a thread or am I wrong? – arnoapp Apr 28 '14 at 15:45
  • Exactly -- by creating multiple processes you can e.g. run your code on multiple cores and not be affected by the global interpreter lock. – bgporter Apr 28 '14 at 15:46
  • remember that, in unix environments, doing a context-switch between processes is not prohibitively harder than doing a thread-switch in the same process. PCBs are light-weight in unix (but not in windows, so it would be again an issue in windows) – Luis Masuelli Apr 28 '14 at 15:55
  • Ok but in fact that's not the solution I try to achieve. So which kind of code would be suitable for a thread? – arnoapp Apr 28 '14 at 16:08
  • @AzzUrr1 You simply won't be able to get a satisfactory solution using threads in Python. You have to use multiple processes. – dano Apr 28 '14 at 16:21
  • @AzzUrr1 IO bound operations. All the I/O primitives give up the GIL while they're waiting for system calls to complete, so the GIL is not relevant to I/O performance. – nitely Jul 25 '14 at 03:18
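To illustrate that last comment, here is a small sketch of a case where threads do help, using `time.sleep` as a stand-in for blocking I/O (both release the GIL while waiting):

```python
import time
from threading import Thread

def fake_io(delay):
    # time.sleep releases the GIL while blocked, much like a socket or file read
    time.sleep(delay)

threads = [Thread(target=fake_io, args=(0.2,)) for _ in range(4)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
# the four 0.2 s waits overlap, so this prints roughly 0.2, not 0.8
print(elapsed)
```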