
A few hours ago I asked a question about Python multithreading. To understand how it works, I performed some experiments; here are my tests:


Python script which uses threads:

import threading
import Queue
import time

s = 0

class ThreadClass(threading.Thread):

    lck = threading.Lock()

    def __init__(self, inQ, outQ):
        threading.Thread.__init__(self)
        self.inQ = inQ
        self.outQ = outQ

    def run(self):
        global s
        while True:
            #print self.getName()+" is running..."
            self.item = self.inQ.get()
            #self.inQ.task_done()
            ThreadClass.lck.acquire()
            s += self.item
            ThreadClass.lck.release()
            #self.inQ.task_done()
            self.outQ.put(self.item)
            self.inQ.task_done()

inQ = Queue.Queue()
outQ = Queue.Queue()

i = 0
n = 1000000

print "putting items to input"
while i<n:
    inQ.put(i)
    i += 1

start_time = time.time()
print "starting threads..."
for i in xrange(10):
    t = ThreadClass(inQ, outQ)
    t.setDaemon(True)
    t.start()


inQ.join()
end_time = time.time()
print "Elapsed time is: %s"%(end_time - start_time)
print s

The following does the same thing with a simple while loop:

import Queue
import time

inQ = Queue.Queue()
outQ = Queue.Queue()

i = 0
n = 1000000
sum = 0

print "putting items to input"
while i<n:
    inQ.put(i)
    i += 1

print "while loop starts..."
start_time = time.time()
while inQ.qsize() > 0:
    item = inQ.get()
    sum += item
    outQ.put(item)
end_time = time.time()

print "Elapsed time is: %s"%(end_time - start_time)
print sum

If you run these programs on your machine, you can see that the threads are much slower than a simple while loop. I am a bit confused about threads and want to know what is wrong with the threaded code. How can I optimize it (in this situation), and why is it slower than the while loop?

  • What was unclear about the answers to your previous questions? Summarized: threads won't make everything faster (certainly not when you do as little work as here and spend most of your time locking and in the queue), and even if they did, CPython threads won't use more than one core at any given point in time. –  May 29 '12 at 17:01
  • my previous question was about finding a bug; this one is about performance – torayeff May 29 '12 at 17:04
  • Oh, I'm sorry, I confused you with another user who posted a quite similar question yesterday, see [python multi-threading slower than serial?](http://stackoverflow.com/q/10789042/395760) which may very well answer your question (I'll vote to close as duplicate if someone confirms my impression). –  May 29 '12 at 17:05
  • @torayeff: Simply put, your threaded code is doing more things than the unthreaded code to perform the same task. Not everything can be optimized by using threading, and threading is not used solely for optimization. – Joel Cornett May 29 '12 at 17:06
  • @JoelCornett: thanks for the answer. I wanted to know one thing: did you notice my commented-out `task_done()` calls? If I comment out the existing `task_done()` and uncomment the first one, what will happen: will s += self.item still be calculated, or will the thread go to sleep? This confuses me – torayeff May 29 '12 at 17:16
  • @torayeff: `task_done()` modifies the counter used by the `Queue.join()` method. `Queue.join()` will block until the task counter drops to zero. If you comment out that line, all of your threads will continue to run/finish, but your main script will block forever at the point where you call `inQ.join()`. – Joel Cornett May 29 '12 at 17:28

1 Answer


Threading is always tricky, but threading in Python is special.

To discuss optimization, you have to focus on specific cases; otherwise there is no single answer. The original threaded solution runs in 37.11 s on my computer. If each thread sums its elements into a local variable and only locks at the end, the time drops to 32.62 s (a sketch of this change follows).
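The answer does not show the modified code, so the following is only a minimal sketch of that idea (the class name LocalSumThread is made up): each thread drains the queue with get_nowait(), accumulates into a local variable, and touches the shared total and the lock only once, at the end.

import threading
import Queue

s = 0

class LocalSumThread(threading.Thread):

    lck = threading.Lock()

    def __init__(self, inQ, outQ):
        threading.Thread.__init__(self)
        self.inQ = inQ
        self.outQ = outQ

    def run(self):
        global s
        local_sum = 0
        while True:
            try:
                item = self.inQ.get_nowait()
            except Queue.Empty:
                break  # queue drained, stop this worker
            local_sum += item
            self.outQ.put(item)
            self.inQ.task_done()
        # one lock acquisition per thread instead of one per item
        LocalSumThread.lck.acquire()
        s += local_sum
        LocalSumThread.lck.release()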

OK. The no-thread solution runs in 7.47 s. Great. But if you want to sum a ton of numbers in Python, you just use the built-in function sum. So, if we use a plain list with no threads and the built-in sum, the time drops to 0.09 s. Great!
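The answer does not include the script it timed, but something along these lines measures the list-plus-built-in approach (Python 2 syntax, to match the question):

import time

n = 1000000
items = range(n)  # a plain list instead of a Queue

start_time = time.time()
total = sum(items)  # built-in sum, implemented in C
end_time = time.time()

print "Elapsed time is: %s" % (end_time - start_time)
print total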

Why?

Threads in Python are subject to the Global Interpreter Lock (GIL). They will never run Python code in parallel. They are real threads, but internally they are only allowed to run a certain number of Python instructions before releasing the GIL to another thread. For very simple calculations, the cost of creating a thread, locking, and context switching is much bigger than the cost of the computation itself; in this case, the overhead is about five times bigger than the computation. Threading in Python is interesting when you can't use async I/O or when you have blocking functions that should run at the same time.
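In CPython 2 (which the Queue module and print statements in the question imply), that number of instructions is the interpreter check interval. You can inspect and tune it, but tuning it only changes how often threads take turns, not whether they run in parallel:

import sys

# roughly every this many bytecode instructions the interpreter
# considers releasing the GIL so another thread can be scheduled
print sys.getcheckinterval()  # 100 by default

# a larger value reduces switching overhead for CPU-bound threads,
# but pure-Python code still never executes in parallel
sys.setcheckinterval(1000)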

But why is the built-in sum faster than the plain, non-threaded Python solution? The built-in sum is implemented in C, and Python loops are slow by comparison, so it is much faster to iterate over all elements of the list using the built-in sum.

Is it always the case? No, it depends on what you are doing. If you were writing these numbers to n different files, the threaded solution could have a chance, because the GIL is released during I/O. But even then, we would need to check whether I/O buffering and disk sync time would be game changers. That kind of detail makes a final answer very difficult, so if you want to optimize something, you must know exactly what you have to optimize. To sum a list of numbers in Python, just use the built-in sum. A sketch of such an I/O-bound variant follows.
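For contrast, here is a minimal sketch (not from the answer) of an I/O-bound variant where threads at least have a chance, because file writes release the GIL; the file names and the split into 10 chunks are made up for illustration:

import threading

def write_chunk(path, numbers):
    # file I/O releases the GIL, so several of these calls can overlap
    f = open(path, "w")
    for number in numbers:
        f.write("%d\n" % number)
    f.close()

numbers = range(1000000)
chunk = len(numbers) / 10
threads = []
for i in xrange(10):
    part = numbers[i * chunk:(i + 1) * chunk]
    t = threading.Thread(target=write_chunk, args=("out_%d.txt" % i, part))
    t.start()
    threads.append(t)
for t in threads:
    t.join()

Whether this actually beats a single loop depends on the disk and on OS caching, which is exactly the kind of detail mentioned above.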

nmenezes