6

I'm just considering starting to learn Python, but I have one concern before I invest more time. Let me phrase this as a statement followed by a concern, for others to comment on, as perhaps the assumptions in the statement are invalid:

I have read about the GIL, and the consensus seems to be that if you require concurrent solutions in Python, your best bet is to fork a new process to avoid it.

My concern is that if I have a problem I'd like to split into N*2 pieces across N processors (assume, for example, a single server running a *nix OS with, say, 8 cores), I will incur context-switching penalties between processes rather than between threads, which is more costly and will limit performance.

I ask this because other languages out there claim to excel in such scenarios, and I wonder whether Python is appropriate for this arena.

Ben Fitzgerald
  • Processes and threads are the same thing on a lot of operating systems. Switching between processes is no more expensive than between threads. In any case, if you have so many context switches that it impacts performance, that either means you have too many threads and they're competing for CPU, or your threads are too interdependent. – Glenn Maynard Jan 27 '10 at 21:53
  • If it had better support for functional programming (decent lambdas...) then it would be a lot more suited to multithreading, IMHO. – Gordon Gustafson Jan 27 '10 at 22:01
  • @Crazy: it has them; use "def" statements. The "lambda" syntax is merely shorthand for certain special cases of defining a function, and the special __name__ attribute on the function object is the biggest difference between "def" and "lambda". – Jan 27 '10 at 22:10
  • @Glenn: although context switching may not be more expensive for processes over threads, communication speed between 2 processes is typically slower than between 2 threads, the working set of memory is typically higher, and thus the effectiveness of the memory cache is reduced. These are significant for performance in many cases. – Kylotan Jan 28 '10 at 11:57

7 Answers

15

multiprocessing can get around the GIL, but it introduces its own issues such as communication between the processes.
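
For illustration, here's a minimal sketch (Python 3; the function name "crunch" and its workload are invented for the example) of spreading CPU-bound work across a process pool. The arguments and results are pickled and shipped between processes over pipes, which is exactly the communication overhead mentioned above:

    import multiprocessing

    def crunch(n):
        # Pure-Python CPU-bound work; each call runs in a worker process,
        # so the workers' GILs never contend with each other.
        return sum(i * i for i in range(n))

    if __name__ == '__main__':
        # Pool() defaults to one worker per core; map() pickles each input
        # out to a worker and pickles the result back.
        with multiprocessing.Pool() as pool:
            results = pool.map(crunch, [10**6] * 16)  # e.g. N*2 pieces for N cores
        print(sum(results))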

Ignacio Vazquez-Abrams
8

Python is not very good for CPU-bound concurrent programming. The GIL will (in many cases) make your program run as if it were running on a single core - or even worse. Even Unladen Swallow will (probably) not solve that problem (quote from their project plan: "we are no longer as optimistic about our chances of removing the GIL completely").
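
A quick, unscientific way to see this for yourself (timings vary by machine, but on CPython the threaded run is typically no faster than the sequential one, and sometimes slower):

    import threading
    import time

    def burn():
        # Pure-Python CPU work; the running thread holds the GIL throughout.
        sum(i * i for i in range(5 * 10**6))

    start = time.time()
    burn()
    burn()
    print('sequential : %.2fs' % (time.time() - start))

    start = time.time()
    threads = [threading.Thread(target=burn) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print('two threads: %.2fs' % (time.time() - start))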

As you already stated, other languages claim to be better at concurrent programming. Haskell, for example, has built-in functionality for programming concurrent applications. You could also try C++ with OpenMP, which I think makes parallelization very simple.

If your application is I/O-bound, Python may be a serious solution, as the GIL is normally released while doing blocking calls.
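
As a rough sketch of that I/O-bound case (the URLs are placeholders), plain threads overlap fine because each one releases the GIL while blocked on the network:

    import threading
    import urllib.request

    def fetch(url):
        # urlopen() blocks on network I/O; CPython releases the GIL during
        # the blocking call, so the other threads keep making progress.
        with urllib.request.urlopen(url) as resp:
            print(url, len(resp.read()))

    urls = ['https://example.com/'] * 4  # placeholder URLs
    threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()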

AndiDog
  • OK, thanks AndiDog, interesting. Glenn Maynard - I note your answer too. I reckon I may learn this as a successor to Perl, as my org makes Python available on most servers and it seems expressive. However, I probably won't spend ages learning the finer points and will instead spend that time on a functional language that scales better, e.g. Haskell / Erlang. – Ben Fitzgerald Jan 28 '10 at 11:14
  • If you need raw CPU performance that badly, high-level scripting languages tend to be the wrong approach to begin with. – Glenn Maynard Jan 30 '10 at 01:26
  • Curious to know how PyCUDA would fare in case one has access to Nvidia GPUs? – MuneshSingh Dec 25 '19 at 15:07
3

In my limited experience, the "context switch cost" is overrated as a performance limitation.

I/O bandwidth and memory are the most common limiting factors. Python's I/O is comparable to that of many other languages, since it simply uses the standard C libraries pretty directly.

Your actual problem may not be typical. However, many problems work out really well in multi-processing mode because they're actually I/O-bound. Often it's filesystem access, web page reading or database operations that limit performance long before context switches do.

S.Lott
  • You specifically mention using *nix - in which case you should be fine. The context-switch cost is minimal; apparently it's a bigger issue on Windows. I routinely run up systems with thousands of processes (of course not all Python, but that shouldn't matter (TM)) without any problems. Of course, I should qualify the above point: it's on headless Linux systems, configured as servers. No X11, Gnome, KDE, etc. – CyberFonic Feb 09 '10 at 23:44
2

A very glib one-word response: Golang.

William Payne
1

If you're considering learning Python for addressing this problem, I might suggest taking a look at Erlang instead. It has excellent support for very lightweight processes, and built-in primitives for IPC.

Not to discourage you from learning Python, of course, just suggesting there might be a better tool for this particular task.

TMN
1

Also, if you are looking at object sharing between the Python processes, I suggest you look at the answer by Alex in this question.
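
That linked answer isn't reproduced here, but as a hedged sketch of one common approach to sharing objects between processes (not necessarily the one Alex describes), multiprocessing.Manager hands out proxy objects that forward every operation to a server process:

    from multiprocessing import Manager, Process

    def worker(shared, key):
        # The dict proxy sends this write to the manager's server process.
        shared[key] = key * key

    if __name__ == '__main__':
        with Manager() as manager:
            shared = manager.dict()
            procs = [Process(target=worker, args=(shared, k)) for k in range(4)]
            for p in procs:
                p.start()
            for p in procs:
                p.join()
            print(dict(shared))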

TigrisC
1

Writing code for multiple processes is not an easy task.

But if you start out thinking this way, it is eventually easier to scale when one machine isn't enough... threads can't be used across machines...

mathgl