30

I've written a working program in Python that parses a batch of binary files, extracting data into a data structure. Each file takes around a second to parse, which translates to hours for thousands of files. I've implemented a threaded version of the batch parsing method with an adjustable number of threads. I tested the method on 100 files with a varying number of threads, timing each run. Here are the results (0 threads refers to my original, pre-threading code; 1 thread refers to the new version run with a single worker thread spawned).

0 threads: 83.842 seconds
1 threads: 78.777 seconds
2 threads: 105.032 seconds
3 threads: 109.965 seconds
4 threads: 108.956 seconds
5 threads: 109.646 seconds
6 threads: 109.520 seconds
7 threads: 110.457 seconds
8 threads: 111.658 seconds

Though spawning a thread confers a small performance increase over having the main thread do all the work, increasing the number of threads actually decreases performance. I would have expected to see performance increases, at least up to four threads (one for each of my machine's cores). I know threads have associated overhead, but I didn't think this would matter so much with single-digit numbers of threads.

I've heard of the "global interpreter lock", but as I move up to four threads I do see the corresponding number of cores at work: with two threads, two cores show activity during parsing, and so on.

I also tested some different versions of the parsing code to check whether my program is I/O-bound. It doesn't seem to be: just reading in the file takes a relatively small proportion of the time, and processing it accounts for almost all of it. If I skip the I/O and process an already-read copy of a file, adding a second thread still hurts performance and a third improves it only slightly. I'm just wondering why I can't take advantage of my computer's multiple cores to speed things up. Please post any questions or ways I could clarify.
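For reference, my threaded batch method is structured roughly like the sketch below (heavily simplified and written against current Python for readability; `parse_file` is just a stand-in for my real parser):

```python
import threading
import queue

def parse_file(path):
    # Stand-in for my real parser: read the file, then do CPU-heavy processing.
    with open(path, 'rb') as f:
        data = f.read()
    return len(data)  # the real code builds a much richer data structure

def parse_batch(paths, num_threads):
    """Parse every file in paths using num_threads worker threads."""
    work = queue.Queue()
    for p in paths:
        work.put(p)

    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                path = work.get_nowait()
            except queue.Empty:
                return
            parsed = parse_file(path)
            with results_lock:
                results[path] = parsed

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```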

dpitch40
  • The GIL is probably at fault here. You may look into the multiprocessing module as an alternative to the threading module; it achieves true concurrency where the GIL prevents it for threading. – g.d.d.c Jul 25 '11 at 19:50
  • Have a look at [this](http://wiki.python.org/moin/GlobalInterpreterLock). You've encountered the *only* thing I hate about Python (well, CPython anyways). – Chris Eberle Jul 25 '11 at 19:50
  • Multiple cores will show activity, but it's just switching between them; only one Python thread can run at a time. You need multiprocessing: http://docs.python.org/dev/library/multiprocessing – Thomas K Jul 25 '11 at 19:51
  • Your program could actually show an improvement in speed if it *were* I/O-bound, as I/O is one time when CPython lets other threads run. – Ignacio Vazquez-Abrams Jul 25 '11 at 19:53
  • I'll look into using multiprocessing; I'm running Python 2.4, so I'll need to upgrade first, which was why threading interested me. I thought multiprocessing was just a higher-level shell around threading/thread. What's the point of threading, then? And I'm still not sure I understand why multiple threads would _slow down_ my program--is that just the thread overhead? – dpitch40 Jul 25 '11 at 20:27
  • Can't you do the parsing in C? If things are CPU-bound in an interpreter, they cry out for a compiled language. A program that just parses should be I/O-bound, not CPU-bound. – Mike Dunlavey Jul 25 '11 at 20:28
  • The binary files are fairly complex and I've written some moderately complex objects to hold the data so it can easily be referred to later. Not sure I could learn enough C/Python integration in the next four weeks (before my internship is over) to handle that. – dpitch40 Jul 25 '11 at 20:52
  • @dpitch40: Threading is still useful if you've got something like blocking HTTP requests. And it uses a bit less memory. But nowadays multiprocessing sees more use. – Thomas K Jul 25 '11 at 21:58
  • possible duplicate of [How efficient is threading in Python?](http://stackoverflow.com/questions/5128072/how-efficient-is-threading-in-python) – Warren Dew Sep 18 '15 at 20:01

2 Answers

45

This is sadly how things are in CPython, mainly due to the Global Interpreter Lock (GIL). Python code that's CPU-bound simply doesn't scale across threads (I/O-bound code, on the other hand, might scale to some extent).

There is a highly informative presentation by David Beazley in which he discusses some of the issues surrounding the GIL. The video can be found [here](http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2010-understanding-the-python-gil-82-3273690) (thanks @Ikke!).

My recommendation would be to use the multiprocessing module instead of multiple threads.
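A rough sketch of what that could look like for a batch of files, assuming a module-level parse_file function that parses one file and returns its data (the names and the glob pattern here are placeholders, not your actual code):

```python
import glob
import multiprocessing

def parse_file(path):
    # Stand-in for the CPU-heavy parsing of a single file.
    with open(path, 'rb') as f:
        data = f.read()
    return path, len(data)  # return whatever structure your parser builds

if __name__ == '__main__':
    paths = glob.glob('data/*.bin')  # however you collect your batch of files
    # One worker process per core; each has its own interpreter and its own GIL.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(parse_file, paths)
```

The function handed to `pool.map` has to be defined at module level so the worker processes can import it, and both the file paths and the parsed results are pickled to cross the process boundary, which is worth keeping in mind if the parsed objects are large.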

NPE
  • [Here](http://blip.tv/pycon-us-videos-2009-2010-2011/pycon-2010-understanding-the-python-gil-82-3273690) is the video of that presentation. – Ikke Jul 25 '11 at 19:53
  • The multiprocessing module worked perfectly. Once I got it working I saw exactly the kind of speedup I'd been expecting. Thanks. – dpitch40 Jul 26 '11 at 17:58
  • This comment does not apply to the case of Python sharing CPU-bound code with C++ code. Code and explanation: https://github.com/PaddlePaddle/Paddle/pull/1364#discussion_r101898833 – Helin Wang Feb 18 '17 at 18:14
9

The threading library does not actually utilize multiple cores simultaneously for computation. You should use the multiprocessing library instead for CPU-bound parallel work.
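A quick way to see the effect is to run the same CPU-bound function through a thread pool and a process pool, something like the sketch below (busy() is just made-up busy work, and the exact timings will vary by machine):

```python
import time
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool  # same API, backed by threads

def busy(n):
    # Pure-Python CPU work: no I/O, so threads gain nothing under the GIL.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    jobs = [5000000] * 8
    for label, make_pool in (('threads', ThreadPool), ('processes', Pool)):
        start = time.time()
        with make_pool(4) as pool:
            pool.map(busy, jobs)
        print(label, round(time.time() - start, 2), 'seconds')
```

On a multi-core machine the process pool typically finishes several times faster, while the thread pool runs no faster than doing the work serially.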

stefan
  • That first statement is incorrect. It does use multiple cores. Only one at a time can get the GIL. – Ikke Jul 25 '11 at 19:55
  • Ah, I was missing a word. Fixed. – stefan Jul 25 '11 at 19:57
  • You miss the point. It is not the threading library itself that prevents it. It uses the pthread library, which can use all cores. That would imply that one could fix the threading library and the problem would be solved. But the problem is much deeper than that. – Ikke Jul 25 '11 at 20:00
  • His statement, however, is correct: he doesn't say it couldn't use multiple cores, he said it doesn't. – agf Jul 25 '11 at 20:05
  • @Ikke: He refers to the Python threading library, not the underlying implementation (about which we need not know anything, and which is not necessarily using the POSIX threading API; it certainly doesn't use it on Windows!). – Nicholas Knight Jul 25 '11 at 20:05