
I'm trying to figure out multi-threaded programming in Python. Here's the simple task I'm using to compare serial and parallel speeds.

import threading
import Queue
import time
import math

def sinFunc(offset, n):
  result = []
  for i in range(n):
    result.append(math.sin(offset + i * i))
  return result

def timeSerial(k, n):
  t1 = time.time()    
  answers = []
  for i in range(k):
    answers.append(sinFunc(i, n))
  t2 = time.time()
  print "Serial time elapsed: %f" % (t2-t1)

class Worker(threading.Thread):

  def __init__(self, queue, name):
    self.__queue = queue
    threading.Thread.__init__(self)
    self.name = name

  def process(self, item):
    offset, n = item
    self.__queue.put(sinFunc(offset, n))
    self.__queue.task_done()
    self.__queue.task_done()

  def run(self):
    while 1:
        item = self.__queue.get()
        if item is None:
            self.__queue.task_done()
            break
        self.process(item)

def timeParallel(k, n, numThreads):
  t1 = time.time()    
  queue = Queue.Queue(0)
  for i in range(k):
    queue.put((i, n))
  for i in range(numThreads):
    queue.put(None)    
  for i in range(numThreads):
    Worker(queue, i).start()
  queue.join()
  t2 = time.time()
  print "Parallel time elapsed: %f" % (t2-t1)

if __name__ == '__main__':

  n = 100000
  k = 100
  numThreads = 10

  timeSerial(k, n)
  timeParallel(k, n, numThreads)

#Serial time elapsed: 2.350883
#Parallel time elapsed: 2.843030

Can someone explain to me what's going on? I'm used to C++, and a similar version of this using the module sees the speed-up we would expect.

andyInCambridge
  • Threads are not magic devices that you add to the app and it will get faster. Mind the cost of making a thread is *not* negligible e.g. usually 1MB of Virtual Address Space is used just at the creation. – Lukasz Madon May 28 '12 at 18:45
  • Perhaps, but that isn't the issue here. The OP is running headfirst into the GIL, a problem typically solved by using either the `multiprocessing` module or by using Stackless Python instead of the default CPython interpreter. – Chinmay Kanchi May 28 '12 at 23:10

3 Answers


Other answers have pointed to the GIL as the problem in CPython, but that leaves out some important context. The GIL causes performance problems when the code you run in threads is CPU bound. In your case here, yes, doing many calculations in threads will most likely result in dramatically degraded performance.

But if you were doing something more IO bound, such as reading from many sockets in a network application or calling out to a subprocess, you can get performance increases from threads. A simple example for your code above would be to add a stupidly simple call out to the shell:

import math
import os

def sinFunc(offset, n):
  result = []
  for i in xrange(n):
    result.append(math.sin(offset + i * i))
  os.system("echo 'could be a database query' >> /dev/null; sleep .1")
  return result

That call might have been something real, like waiting on the filesystem. But you can see that in this example threading starts to prove beneficial, because the GIL is released while a thread waits on IO, allowing the other threads to continue processing. Even so, there is still a sweet spot: at some point the gains from adding more threads are negated by the overhead of creating and synchronizing them.
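To see that effect in isolation, here is a hypothetical demo (not from the original post) where each task just sleeps, standing in for blocking I/O. Ten 0.1-second waits finish in roughly 0.1 seconds overall because the sleeps overlap:

```python
import threading
import time

def io_task():
    time.sleep(0.1)   # stands in for a blocking read; sleeping releases the GIL

def run_threads(count):
    # Run `count` copies of io_task concurrently and return the wall-clock time
    t1 = time.time()
    threads = [threading.Thread(target=io_task) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - t1

print("10 x 0.1s waits took %.2fs with 10 threads" % run_threads(10))
```

Run serially, the same ten waits would take about a second; threaded, the total stays close to the length of a single wait.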

For CPU-bound code, you would make use of the multiprocessing module.

From article: http://www.informit.com/articles/article.aspx?p=1850445&seqNum=9

...threading is more appropriate for I/O-bound applications (I/O releases the GIL, allowing for more concurrency)...

Similar question references about threads vs processes:
https://stackoverflow.com/a/1227204/496445
https://stackoverflow.com/a/990436/496445

jdi

Python has a severe threading problem. Basically, adding threads to a Python application almost always fails to make it faster, and sometimes makes it slower.

This is due to the Global Interpreter Lock, or GIL.

Here's a blog post about it that includes a talk on the subject.

One way to bypass this limitation is to use processes instead of threads; this is made easier by the multiprocessing module.
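As a minimal sketch of that swap (the names square, inbox, and outbox are illustrative, not from the original post), a multiprocessing.Process is started much like a threading.Thread, but runs in a child process:

```python
from multiprocessing import Process, Queue

def square(inbox, outbox):
    # Runs in a child process, with its own interpreter and its own GIL
    outbox.put(inbox.get() ** 2)

if __name__ == '__main__':
    inbox, outbox = Queue(), Queue()
    worker = Process(target=square, args=(inbox, outbox))
    worker.start()
    inbox.put(7)
    print(outbox.get())   # 49
    worker.join()
```

multiprocessing.Queue handles the inter-process communication that plain data sharing between threads would otherwise give you for free.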

Zeugma
cha0site
  • Do you think it's really correct to say "almost always fails to make it faster"? Isn't it completely dependent on whether the application is IO or CPU bound? I feel a blanket statement is misleading. – jdi May 28 '12 at 18:49
  • @jdi: This isn't some intrinsic property of threads, this is Python-specific. In Python, you have the GIL, which is a monolithic lock that you need to grab before you can touch any Python object. So, almost every use of threads in Python will be serialized anyway, neglecting any performance benefits but still costing the overhead of threads. The talk I linked to shows some additional ways in which threads can slow down Python due to the GIL. There are some edge cases where threads work, but all of those I know about involve external C modules doing fancy things. – cha0site May 28 '12 at 18:53
  • I understand the GIL issue. I'm saying that if the work being done in the threads is IO bound, then it will suit a threaded approach because the GIL can release often. And I was referring to it being a blanket python threading statement. – jdi May 28 '12 at 19:00
  • @jdi: See this slideshow: http://www.dabeaz.com/python/GIL.pdf I know some of those problems were fixed, but not all. Note how it gets worse as you add more processors. – cha0site May 28 '12 at 19:07
  • Yes there are problems. However, does that mean threads will "almost always" hurt, even judicious use of I/O-bound threads? –  May 28 '12 at 19:09
  • AFAIK, yes. Try it out and see for yourself. – cha0site May 28 '12 at 19:11
  • @cha0site: I think you are under-informed about this. Plenty of applications can make use of threading if they aren't just blowing through tons of calculations in the threads. – jdi May 28 '12 at 19:19
  • @cha0site It's not Python specific, it's CPython specific. – mutantacule May 28 '12 at 21:42
  • @kosii: You're right, Jython and IronPython don't have a GIL and don't suffer from this. pypy does have a GIL and does suffer from this problem, but they have a project going to implement STM (software transactional memory), which will replace the GIL with much finer "locking" (air-quoted because the whole point is that it's not a lock) and solve this problem. – cha0site May 29 '12 at 08:00
  • @jdi: I might be, hell, I *want* to be wrong about this. I did tests in a previous workplace on a reactor-based (like twisted, but I had my own reactor implementation) vs. thread-based I/O implementation. This was CPython 2.5. The reactor was faster, even though I tested on a multicore machine. Also, even though I can't back it up, I believe that a multi-process solution will be faster than a multi-threaded one in Python, but of course you'll have to do some IPC. Again, I'd love to be proved wrong about this, if you have any data showing otherwise, please share. – cha0site May 29 '12 at 08:15
  • "adding threads to a Python application almost always fails to make it faster, and sometimes makes it slower" as a general statement to the question is extremely misleading or even wrong. Threading is useful for introducing concurrency. Without concurrency, many applications can't be designed in a usable way. Hence, threading makes applications work at all. In most cases, it's not about making the application faster. – Dr. Jan-Philip Gehrcke May 29 '12 at 17:27
  • @Jan-Philip: Disagree. Your comment makes it sound as if threading is the only way to introduce concurrency ("makes applications work at all"), and that's simply wrong. You have Async I/O, reactors, worker processes, user threads (== what stackless python does), and more. They do have some implications on the design, but they bypass the GIL, allowing for actual concurrency, and because of that are generally faster - in Python, at least. – cha0site May 30 '12 at 08:28
  • Of course there are different techniques for introducing concurrency in your code. I agree that greenlets are a very neat method. I also agree that due to the GIL, threading in CPython behaves differently than in other languages/language implementations. However, your answer is not appropriate, as has been pointed out by jdi and delnan. The main problem with your answer is that it is overgeneralized. It may be true for some cases, but it is not correct for all scenarios. – Dr. Jan-Philip Gehrcke May 30 '12 at 13:27

Python libraries that are written in C can acquire and release the Global Interpreter Lock (GIL) at will. Those that do not touch Python objects can release the GIL so that other threads get a look-in; however, I believe the math module works on Python objects the whole time, so math.sin is effectively serialised. Since locking and unlocking is an overhead, it is not unusual for Python threads to be slower than processes.
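To illustrate that serialisation (a sketch I've added, not from the original post), a pure-Python loop over math.sin run in two threads typically takes about as long as running it twice in a row on a stock CPython interpreter; exact timings will vary by machine:

```python
import math
import threading
import time

def burn(n):
    # Pure-Python CPU-bound loop; holds the GIL for essentially its whole run
    total = 0.0
    for i in range(n):
        total += math.sin(i)
    return total

def time_serial(n):
    t1 = time.time()
    burn(n)
    burn(n)
    return time.time() - t1

def time_threaded(n):
    t1 = time.time()
    threads = [threading.Thread(target=burn, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - t1

n = 200000
print("serial: %.3fs  threaded: %.3fs" % (time_serial(n), time_threaded(n)))
```

Run the comparison yourself: on an interpreter with a GIL the threaded version shows no speed-up, which is exactly what the question observed.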

cdarke