7

This is entirely theoretical; the question just came to mind and I wasn't entirely sure what the answer is:

Assume you have an application that performs 4 independent calculations. (Totally independent: it doesn't matter what order you do them in, and you don't need one to calculate another.) Also assume those calculations are long (minutes) and CPU-bound (not waiting on any kind of I/O).

1) Now, if you have a 1-processor computer, a single-threaded application will logically be faster than (or the same as) a multithreaded application. As the computer is not able to do more than one thing at a time with one processor, it would "waste" time on context switching and the like. So far so good?

2) If you have a 4-processor computer, 4 threads will most likely be faster for this than a single thread. Right? Your computer can now do 4 operations at a time, so it's only logical to divide your application into 4 threads, and it should complete in the time the longest of the 4 calculations takes. Still good so far?

3) And now the actual part I am confused about - why would I EVER have my application create more threads than the number of processors (well actually - cores) available? I have programmed and have seen applications that create tens and hundreds of threads, but actually - the perfect number is about 8 for an average computer?

P.S. I already read this: Threading vs single thread, but it didn't quite answer that.

Cheers

Abdullah Khan
AlexD
  • Similar question: http://stackoverflow.com/questions/503551/does-it-make-sense-to-spawn-more-than-one-thread-per-processor – user2314737 Sep 11 '14 at 14:36

4 Answers

5

Why would I EVER have my application create more threads than the number of processors (well actually - cores) available?

One very good reason is if you have threads that wait on events. For example you might have a producer/consumer application in which the producer is reading from some data stream, and that data arrives in bursts: a few hundred (or thousand) records in a batch, followed by nothing for a while, and then another burst. Say you have a 4-core machine. You could have a single producer thread that reads the data and places it in a queue, and three consumer threads to process the queue.

Or, you could have a single producer thread and four consumer threads. Most of the time, the producer thread is idle, giving you four consumer threads to process items from the queue. But when items are available on the data stream, one of the consumer threads gets swapped out in favor of the producer.

That's a simplified example, but substantially similar to programs that I have in production.
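The one-producer, four-consumer layout described above could be sketched roughly as follows. This is only an illustration, not the production code mentioned: the names `run`, `producer`, `consumer`, `NUM_CONSUMERS`, and `SENTINEL` are hypothetical, the "data stream" is simulated with `time.sleep`, and squaring a number stands in for the real per-record work. Note also that in CPython the global interpreter lock keeps threads from executing Python bytecode in parallel, so this shows the structure rather than a real speedup.

```python
import queue
import threading
import time

NUM_CONSUMERS = 4      # assumed: one consumer per core on a 4-core machine
SENTINEL = None        # hypothetical marker telling a consumer to stop


def producer(q, bursts, burst_size, pause):
    """Simulate a bursty data stream: emit a batch, then go idle."""
    for b in range(bursts):
        for i in range(burst_size):
            q.put(b * burst_size + i)
        time.sleep(pause)          # idle between bursts; uses no CPU
    for _ in range(NUM_CONSUMERS):
        q.put(SENTINEL)            # one stop marker per consumer


def consumer(q, results, lock):
    """Take records off the queue and process them until told to stop."""
    while True:
        item = q.get()             # blocks while the queue is empty
        if item is SENTINEL:
            break
        value = item * item        # stand-in for the real per-record work
        with lock:
            results.append(value)


def run(bursts=3, burst_size=100, pause=0.05):
    """Start 1 producer and NUM_CONSUMERS consumers; return processed values."""
    q = queue.Queue()
    results, lock = [], threading.Lock()
    workers = [threading.Thread(target=consumer, args=(q, results, lock))
               for _ in range(NUM_CONSUMERS)]
    feeder = threading.Thread(target=producer,
                              args=(q, bursts, burst_size, pause))
    for t in workers + [feeder]:
        t.start()
    for t in workers + [feeder]:
        t.join()
    return results
```

That is five threads in total on the assumed 4-core machine, but the producer spends most of its time asleep, so the consumers rarely have to give up a core for it.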

More generally, it doesn't make any sense to create more continuously-working (i.e. CPU bound) threads than you have processing units (CPU cores in general, although the existence of hyperthreading muddies the waters a bit). If you know that your threads won't be waiting on external events, then having n+1 threads when you only have n cores will end up wasting time with thread context switches. Note that this is strictly in the context of your program. If there are other applications and OS services running, your application's threads will get swapped out from time to time so that those other apps and services can get a timeslice. But one assumes that, if you're running a CPU-intensive program, you'll limit the other apps and services that are running at the same time.

Your best bet, of course, is to set up a test. On a 4-core machine, test your app with 1, 2, 3, 4, 5, ... threads. Time how long it takes to complete with different numbers of threads. I think you'll find that on a 4-core machine the sweet spot will be 3 or 4; most likely 4 unless there are other apps or OS services that take a lot of CPU.
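A test like that could look something like the sketch below. The workload `busy_work` and the helpers `time_run` and `sweep` are made-up names for illustration, and the task sizes are arbitrary. One assumption to flag: in CPython, threads do not run CPU-bound Python code in parallel because of the global interpreter lock, so the helper accepts an executor class and you would likely pass `concurrent.futures.ProcessPoolExecutor` to see real per-core scaling.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def busy_work(n):
    """Stand-in for one CPU-bound calculation (hypothetical workload)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_run(num_workers, num_tasks=4, n=200_000,
             executor_cls=ThreadPoolExecutor):
    """Run num_tasks jobs on num_workers workers; return (seconds, results)."""
    start = time.perf_counter()
    with executor_cls(max_workers=num_workers) as pool:
        results = list(pool.map(busy_work, [n] * num_tasks))
    return time.perf_counter() - start, results


def sweep(max_workers=None, **kwargs):
    """Time the same workload with 1, 2, ..., max_workers workers."""
    # Default: go one past the reported core count, to see the n+1 case.
    max_workers = max_workers or (os.cpu_count() or 1) + 1
    return {w: time_run(w, **kwargs)[0] for w in range(1, max_workers + 1)}
```

Calling `sweep(5)` returns a dict mapping each worker count from 1 to 5 to the elapsed seconds. When switching to `ProcessPoolExecutor`, run the script as a module with the call placed under `if __name__ == "__main__":`.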

Jim Mischel
  • I'll redirect the question I asked James Baxter: Assume there is unbounded-buffer or just my programs are CPU-bound (I understand that's never the case in real life applications). In that imaginary case and assuming I am the only application running, is there any sense in creating more threads than cores? – AlexD Sep 11 '14 at 14:48
1

I think you are assuming that all programs are CPU bound - remember some of your threads will be waiting for I/O (disk/network/user traffic).

James Baxter
  • You are entirely correct, assume those calculations are CPU-bound and not IO-bound, what then? – AlexD Sep 11 '14 at 14:31
  • 1
    In which case you would benefit from having at least as many cores as threads to avoid contention / context switching. The problem is there will be other processes on the PC like the OS that will want to utilize the CPU so it can't be avoided completely. – James Baxter Sep 11 '14 at 14:33
  • 1
    I think this would be better phrased as just "I/O". In particular, CPUs are enough faster than memory that it can make sense to think of a read from main memory as an I/O operation, and try to give the CPU something to do while memory responds. This is particularly true with hyperthreading, where you can have another thread ready to run very quickly. – Jerry Coffin Sep 11 '14 at 14:36
  • If you take what you said a little bit further - if I create 4 threads on a 4-core computer, the OS might decide to let me use just, let's say, 1-2 cores and use the others for its own purposes. But if I create 40 threads, do I mislead the OS into thinking I need more processing power, so that it may give me more than 1-2 cores? – AlexD Sep 11 '14 at 14:38
  • 1
    I think things get a bit woolly/complicated at this point and it's down to the OS and how it schedules processes - no guarantees on the behaviour. I would try to avoid creating too many threads, as that can cause performance problems of its own, and run your own benchmarks with realistic production loads. – James Baxter Sep 11 '14 at 14:52
1

One reason I could come up with for having more threads than cores is if some threads need to interact with other parties: waiting for a response from a server, or querying something from a database. The waiting thread can sleep until an answer arrives, so other computations don't have to wait. With 4 threads on 4 cores, a thread waiting for input may force other code to wait too.
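A small way to see this effect, as a sketch only: `fake_request` and `run_requests` are hypothetical names, and `time.sleep` stands in for the server or database round trip. Because a sleeping thread uses no CPU, adding threads beyond the core count shortens the total wait here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(delay):
    """Stand-in for a server or database call: the thread just waits."""
    time.sleep(delay)
    return delay


def run_requests(num_threads, num_requests=8, delay=0.1):
    """Return elapsed seconds for num_requests waits on num_threads threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(fake_request, [delay] * num_requests))
    return time.perf_counter() - start
```

With the defaults, `run_requests(2)` has to work through the eight waits two at a time, while `run_requests(8)` lets all eight overlap, so the second call should finish noticeably sooner regardless of how many cores the machine has.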

Enermis
1

Adding threads to your application is not strictly about performance gains. Sometimes you want or need to perform more than one task at the same time because that is the most logical way to architect your program.

As an example, perhaps you are writing a game engine. If you take a multi-threaded approach, you may have one thread for physics, one thread for graphics, one thread for networking, one thread for user input, one thread for resource loading from disk, etc.

Also, James Baxter's point is very true as well. Sometimes threads are waiting on a resource and cannot execute further until they access said resource. With only as many threads as cores, one core would be going to waste.

Stephen