1 worker thread faster than 4?

Question

I am reading myself into multighreading in python and came up with this simple test: (btw. this implementation might be very bad, I just wrote that down quickly for testing purpose. Buf if there is something terribly wrong I would be thankful if you could point that out)

#!/usr/bin/python2.7

import threading
import timeit

lst = range(0, 100000)
lstres = []
lstlock = threading.Lock()
lstreslock = threading.Lock()

def add_five(x):
    return x+5

def worker_thread(args):
    print "started"
    while len(lst) > 0:
        lstlock.acquire()
        try:
            x = lst.pop(0)
        except IndexError:
            lstlock.release()
            return
        lstlock.release()
        x = add_five(x)
        lstreslock.acquire()
        lstres.append(x)
        lstreslock.release()

def test():
    try:
        t1 = threading.Thread(target = worker_thread, args = (1,))
        #t2 = threading.Thread(target = worker_thread, args = (2,))
        #t3 = threading.Thread(target = worker_thread, args = (3,))
        #t4 = threading.Thread(target = worker_thread, args = (4,))
        t1.start();
        #t2.start();
        #t3.start();
        #t4.start();
        t1.join();
        #t2.join();
        #t3.join();
        #t4.join();
    except:
        print "Error"

    print len(lstres)

if __name__ == "__main__":
    t = timeit.Timer(test)
    print t.timeit(2)

Despite the terrible example I see the following: one thread is faster than 4. With one thread I get: 13.46 seconds, and with 4 threads: 25.47 seconds.

Is the access to the list by 4 threads a bottleneck thus causing slower times or did I do something wrong?

In python there is not true multithreading.There's a global lock fro thread.Its actually queued.Use multiprocesses — vks, Dec 11 '14 at 13:15
Read [this](https://en.wikipedia.org/wiki/Global_Interpreter_Lock) — VP., Dec 11 '14 at 13:16

score 2 · Accepted Answer · answered Dec 11 '14 at 13:24

2

In your case, the Global Interpreter Lock isn't actually the problem.

Threading doesn't make things faster by default. In your case, the code is CPU bound. No thread is ever waiting for I/O (which allow another to use the CPU). If you have code which needs 100% of the CPU, then threading will only make it faster if a lot of the code is independent which your's isn't: Most of your code is holding locks, so no other thread can proceed.

Which brings us to the cause of the slowdown: Switching threads and fighting for locks costs time. That's what eats 12s in your case.

answered Dec 11 '14 at 13:24

Aaron Digulla

321,842
108
597
820

Can you tell what is then? – ap0 Dec 11 '14 at 13:26
I was expecting that this question would be closed, so I submitted a short version to buy the time to write the explanation. – Aaron Digulla Dec 11 '14 at 13:28
If code is CPU-bound, threading not only won't make it faster, it'll likely slow it down due to the extra overhead using it would involve. – martineau Dec 11 '14 at 14:01
Hmm, I don't really like the explanation "the code is CPU bound". The problem here is that the threads are always waiting for each other, not that the code is not IO-dependant. I would rephrase the answer slightly. – Hannes Ovrén Dec 11 '14 at 15:43
1

For example, if I have the (silly) task "sum all integers from 1..N" I could divide that into partitions, and if the CPU has multiple cores this should decrease computation time (for large enough N) since thread 1 can sum integers 1..K, and thread 2 can take (K+1)..K and so on. This does not require any I/O but is still a parallel problem. – Hannes Ovrén Dec 11 '14 at 15:46
@HannesOvrén: That's why I wrote "threading will only make it faster if a lot of the code is independent" – Aaron Digulla Dec 11 '14 at 16:13
But "independent code" is not a concept that is immediately obvious. I realize that you mean "independent" as in "can execute independently" or "not dependent on results from any other thread", but this is not exactly obvious. Another interpretation of "independent" could be "different piece of code", which of course is irrelevant when it comes to parallell execution, as it is quite common to have different modules running in different threads where one depends on the output from the other. So, while correct, I think rephrasing the answer would be beneficial. – Hannes Ovrén Dec 11 '14 at 16:22
@HannesOvrén: Thank you for your concern. The idea of this site, as I understand it, is to give explicit feedback for the situation at hand while understanding that it's not possible to explain extremely complex issues. You're welcome to post a second answer to give a better explanation. But I feel that you currently put valuable information in comments where they are not easy to find. – Aaron Digulla Dec 11 '14 at 16:26

1 worker thread faster than 4?

1 Answers1