Launching a thread and terminating it require many hundreds of machine cycles. But that's only the beginning. Context switches between threads, which are bound to happen if the threads are doing anything useful, will repeatedly consume many hundreds more machine cycles. The execution context of all these threads will consume many a byte of memory, which in turn will mess up many a line of cache, hindering the CPU's efforts for yet another great many machine cycles.
As a matter of fact, doing just about anything with multitasking consumes many hundreds of machine cycles. Multitasking only becomes profitable in terms of CPU usage when you manage to get enough processors working on lumps of data that are conceptually independent (so parallel processing won't threaten their integrity) and big enough to show a net gain over a single-processor version.
In all other cases, multitasking is inherently inefficient in every domain but one: reactivity. A task can react very quickly and precisely to an external event, which ultimately comes from some external hardware component (be it the internal clock for timers or your WiFi/Ethernet controller for network traffic).
This ability to wait for external events without wasting CPU is what improves overall CPU efficiency. And that's it.
In terms of every other performance parameter (memory consumption, time wasted inside kernel calls, etc.), launching a new thread is a net loss.
In a nutshell, the art of multitasking programming boils down to:
- identifying the external I/O flows you will have to handle
- taking reactivity requirements into account (remembering that more reactive = less CPU/memory efficient 99% of the time)
- setting up handlers for the required events, with a reasonable compromise between efficiency and ease of maintenance.
Multiprocessor architectures add a new level of complexity, since any program can now be seen as a process with a number of external CPUs at hand, which it could use as additional sources of computing power. But your problem does not seem to have anything to do with that.
Any measure of multitasking efficiency will ultimately depend on the number of external events a given program is expected to cope with simultaneously, within a given set of reactivity constraints.
At last I come to your particular question.
To react to external events, launching a task each time a new twig or bit of dead insect has to be moved around the anthill is a very coarse and inefficient approach.
You have many powerful synchronization tools at your disposal, which will allow you to react to a bunch of asynchronous events from within a single task context with (near) optimal efficiency at (virtually) no cost.
Typically, blocking waits on multiple inputs, like the Unix-flavoured select() or its Microsoft counterpart WaitForMultipleObjects().
Using these will give you a performance boost incomparably greater than the few dozen CPU cycles you could squeeze out of this task-result-gathering optimization project of yours.
So my answer is: don't bother with optimizing thread setup at all. It's a non-issue.
Your time would be better spent rethinking your architecture so that a handful of well-thought-out threads could replace the hordes of useless CPU and memory hogs your current design would spawn.