
In my limited understanding, it is the performance factor that drives programming for multi-threading in most cases, but not all (irrespective of Java or Python).

I was reading this enlightening article on the GIL on SO. The article summarizes that Python adopts the GIL mechanism, i.e. only a single thread can execute Python bytecode at any given time. This makes single-threaded applications faster.

My question is as follows:

Since only one thread is served at any given point, do the multiprocessing or threading modules provide a way to overcome this limitation imposed by the GIL? If not, what features do they provide for doing real multitasking work?

There was a question asked in the comments section of the accepted answer of the above post, but no one has answered it. I had this question in my mind too:

so at any point in time, only one thread will be serving content to the client...
so there is no point in actually using multithreading to improve performance, right?
brain storm
  • Short answer: If your code is mostly *waiting* (for responses from the network for example), **multithreading** will work just fine to parallelize that waiting. If you're doing heavy *computation* and want to leverage all those cores, **multiprocessing** is what you need. – Lukas Graf Jul 14 '14 at 19:55

4 Answers


You're right about the GIL: there is no point in using multithreading to do CPU-bound computation, as the CPU will only be used by one thread at a time.

But that previous statement may have enlightened you: if your computation is not CPU-bound, you may take advantage of multithreading.

A typical example is when your application spends most of its time waiting for something.

One of many examples of a non-CPU-bound program: say you want to build a web crawler. You have to crawl many websites and store them in a database. What costs time? Waiting for the servers to send data, actually downloading the data, and storing it in the database: nothing CPU-bound here. Here you may get a faster crawler by using a pool of crawlers instead of a single crawler. Typically, when one website is almost down and very slow to respond (~30 s), a single-threaded application will wait for that website and be stuck. In a multithreaded application, other threads will continue crawling, and that's cool.
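A minimal sketch of the "pool of crawlers" idea, assuming a hypothetical `fake_fetch` that stands in for a real download by calling `time.sleep()` (which, like a blocking socket read, releases the GIL while it waits). With five worker threads the five waits overlap, so the whole crawl takes about as long as one fetch rather than five.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url, delay=0.2):
    # Hypothetical stand-in for downloading a page: sleeping blocks
    # without using the CPU and releases the GIL, like network I/O.
    time.sleep(delay)
    return url, len(url)


def crawl(urls, workers=5):
    # Each worker thread waits on its own "server"; the waits overlap.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(fake_fetch, urls))


if __name__ == "__main__":
    urls = [f"http://example.com/page{i}" for i in range(5)]
    start = time.perf_counter()
    pages = crawl(urls)
    elapsed = time.perf_counter() - start
    print(len(pages), f"pages in {elapsed:.2f}s")  # roughly 0.2s, not 1.0s
```

A real crawler would replace `fake_fetch` with an actual HTTP call; the threading structure stays the same.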

On the other hand, as there is one GIL per process, you may use multiprocessing to do CPU-bound computation.
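As a rough illustration, assuming a deliberately slow, pure-Python prime counter (`count_primes` is only an example workload), `concurrent.futures.ProcessPoolExecutor` farms each chunk out to a separate interpreter process. Each process has its own GIL, so the chunks can run on different cores at the same time.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    # Deliberately CPU-bound example workload: naive trial division.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def parallel_count(limits):
    # Each worker is a separate process with its own interpreter and GIL.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # The __main__ guard is required on platforms that spawn workers
    # by re-importing this module (e.g. Windows, macOS).
    print(parallel_count([10_000, 20_000, 30_000, 40_000]))
```

Doing the same with threads would give the same answers but, under the GIL, no speedup.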

As a side note, there exist some more or less partial implementations of Python without the GIL. I'd like to mention one that I think is well on its way to achieving something cool: PyPy STM. Searching for "get rid of the GIL", you'll easily find a lot of threads about the subject.

Julien Palard
  • Does inter-process communication happen in multiprocessing? How is the state of shared objects preserved or accessed in multiprocessing? – brain storm Jul 14 '14 at 20:04
  • RTFM: https://docs.python.org/2/library/multiprocessing.html there are sections about exchanging objects, synchronisation, and sharing state. – Julien Palard Jul 14 '14 at 20:07
  • "Waiting for the servers to send data, actually downloading the data, and storing it in the database, nothing CPU bound here" -- does that mean I could have a computer with just RAM, connect to the internet, and browse? Sorry for the stupid question, but I would like to understand the distinction here. – brain storm Jul 14 '14 at 20:08
  • You can't have a computer without a CPU. It'd be like a car without an engine or motor. What he means is that your CPU won't be the bottleneck. – Sohcahtoa82 Jul 14 '14 at 20:21
  • Just to make it explicit: I/O-bound operations can actually run concurrently across threads because Python releases the GIL while any type of blocking I/O operation is running. The only exception to this rule would be if a poorly-written C extension fails to release the GIL while it does blocking I/O. In that case, you'll be stuck in the thread running the I/O until the I/O completes. – dano Jul 14 '14 at 20:41
  • @brainstorm The `multiprocessing` module is capable of sharing objects between processes, but there's a higher cost to do that than there would be with threads. The objects being sent between processes need to be pickled, sent to the other process via a socket, then unpickled on the other side. This is mostly invisible to you as a client of the library, but it is much slower than shared state between threads in a single process. There are some other options for sharing state (shared memory with `ctypes`, `multiprocessing.Manager`) but those have some drawbacks as well. – dano Jul 14 '14 at 20:46
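For the shared-memory option mentioned in the last comment, here is a small sketch using `multiprocessing.Value`, which allocates a `ctypes` object in shared memory rather than pickling copies back and forth. The counter workload (`add_many`, `run`) is a made-up example; the explicit `get_lock()` is needed because `+=` on `.value` is a separate read and write.

```python
from multiprocessing import Process, Value


def add_many(counter, n):
    for _ in range(n):
        # Value carries its own lock; holding it makes the
        # read-modify-write of += atomic across processes.
        with counter.get_lock():
            counter.value += 1


def run(workers=4, n=1000):
    # "i" means a C int living in shared memory, visible to every child.
    counter = Value("i", 0)
    procs = [Process(target=add_many, args=(counter, n)) for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(run())  # 4000
```

Taking the lock on every increment is slow; it is there to show correctness, not performance.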

Multiprocessing side-steps the GIL issue because code runs in a separate process while the GIL is only concerned with a single process. Within a process, multithreading may be faster to the extent that threads are waiting for some relatively slow resource like the disk or network.

tdelaney
  • Correct me if I'm wrong: so with multiprocessing, the same bytecode can be executed in different processes. But I guess there cannot be shared objects between these processes? (If there can, where are the references held?) – brain storm Jul 14 '14 at 20:00
  • With multiprocessing you can achieve concurrency because there's one python interpreter for each process you spawn. You can share data between processes (for example, using Queues), see https://docs.python.org/2/library/multiprocessing.html#exchanging-objects-between-processes. – cdonts Jul 14 '14 at 20:05
  • You can share information between processes: https://docs.python.org/2/library/multiprocessing.html#exchanging-objects-between-processes – Julien Palard Jul 14 '14 at 20:05
  • multiprocessing is complex. You need to read the docs, but as others say, there are several ways to share data. mp works differently on Linux and Windows, and only runs faster if the processing time is significantly greater than the time to transfer data. – tdelaney Jul 14 '14 at 20:14
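To make the Queue suggestion concrete, here is a minimal producer/consumer sketch with `multiprocessing.Queue`. The squaring worker and the `None` stop signal are illustrative choices, not something the comments prescribe. Each object put on a queue is pickled and unpickled, so the receiving process gets its own copy rather than a shared reference.

```python
from multiprocessing import Process, Queue


def square_worker(inbox, outbox):
    # iter(inbox.get, None) keeps calling get() until it returns None,
    # which we use as a stop signal.
    for item in iter(inbox.get, None):
        outbox.put((item, item * item))


def square_all(numbers):
    inbox, outbox = Queue(), Queue()
    worker = Process(target=square_worker, args=(inbox, outbox))
    worker.start()
    for n in numbers:
        inbox.put(n)
    inbox.put(None)
    # Drain the results before join(): a child that still has data
    # queued up may not exit until that data has been consumed.
    results = dict(outbox.get() for _ in numbers)
    worker.join()
    return results


if __name__ == "__main__":
    print(square_all([1, 2, 3]))  # {1: 1, 2: 4, 3: 9}
```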

A quick google search yielded this informative slideshow. http://www.dabeaz.com/python/UnderstandingGIL.pdf

But what it fails to present is the fact that all threads are contained within a process, and a process by default can only run on one CPU (or core). So while the per-process GIL does manage the threads in that process and doesn't always deliver the expected performance, at large scales it should still perform better than single-threaded operation.

Andrew

The GIL is always a hot topic in Python, but it is usually a non-issue. It makes most programs much safer. If you want real computational performance, try PyOpenCL. Any modern, real-world, high-performance number crunching should be done on GPUs (OpenCL also runs happily on CPUs), and it has no GIL issues.

If you want to do multithreading in Python to improve I/O-bound performance, the GIL is not an issue there.

Lastly, if you want to utilize multiple CPUs to increase the performance of pure number crunching in a Pythonic fashion, use multiprocessing.

But it's still not as fast as coding your multithreaded application in assembly. Good luck not making typos.

beiller