
I went through a few questions such as POSIX Threads on a Multiprocessor System, Concurrency of posix threads in multiprocessor machine, and Threads & Processes Vs MultiThreading & Multi-Core/MultiProcessor: How they are mapped?


Based on these and a few other wiki articles, I believe that for a system with three basic kinds of work, viz. input, processing, and output:


  • For CPU-bound processing, the number of CPU-intensive threads (number of applications × threads per application) should be approximately 1 to 1.5 times the number of processor cores.

  • Input and output threads must be sufficiently numerous to remove any bottlenecks. For example, in a communication system based on a query/query-ack and response/response-ack model, time must not be wasted in I/O wait states.

  • If there is a large requirement for dynamic memory, it's better to go with a greater number of processes than threads (to avoid memory sync-ups).

    Are these arguments fairly consistent when determining the number of threads to have in our application? Do we need to look into any other parameters?

Anerudhan Gopal

1 Answer


'1 to 1.5 times the number of cores' - this appears to be OS/language-dependent. On Windows/C++, for example, with large numbers of CPU-intensive tasks, the optimum seems to be well over twice the number of cores, with a very small performance spread. In such environments, it seems you may as well just allocate 64 threads in a pool and not bother with the number of cores.

'query/query-ack and response/response-ack model, time must not be wasted in I/O waiting states' - this is unavoidable with such protocols, given the high latency of most networks. The delay is enforced by the 'ping-pong' protocol, so there will inevitably be an I/O wait. Async I/O just moves this wait into the kernel - it's still there!

'large requirement for dynamic memory, it's better to go with a greater number of processes than threads' - not really. A 'large requirement for dynamic memory' usually means that large data buffers are going to be moved about. Large buffers can only be moved around efficiently by reference. This is very easy and quick between threads because of the shared memory space. With processes, you are stuck with awkward and slow inter-process comms.

'Determining number of threads to have in our application' - well, so difficult on several fronts. Given an unknown architecture, design, language and OS, the only advice I have is to try and make everything as flexible and configurable as you reasonably can. If you have a thread pool, make its size a run-time parameter you can tweak. If you have an object pool, try to design it so that you can change its depth. Have some default values that work on your test boxes and then, at installation or while running, you can make any specific changes and tweaks for a particular system.

The other thing with flexible/configurable designs is that, at test time, you can tweak away and fix many of the incorrect decisions, assumptions and guesstimates made by architects, designers, developers and, most of all, customers.

Martin James
  • Thanks Martin. But why does the OS have a role in it? Is it because the final order of execution of instructions (scheduling, etc.) depends on the OS? I guessed that pipelining and out-of-order execution were properties of hardware threads. Please correct me if I am wrong. – Anerudhan Gopal Apr 13 '12 at 10:39
  • The OS underlies the interface. Inevitably, performance differences will surface. One of them is the scheduling algorithm. The final order of thread dispatching depends on the OS, yes. Pipelining and out-of-order execution are hardware optimizations, yes. – Martin James Apr 13 '12 at 10:47
  • Thanks once again. I think the best way forward is experimenting with an open mind. :) – Anerudhan Gopal Apr 13 '12 at 10:53