
We need to build a software framework (or middleware) that will enable messaging between different software components (or modules) running on a single machine. This framework will provide the following features:

  • Communication between modules is through 'messaging'.
  • Each module will have its own message queue and message handler thread that will synchronously handle each incoming message.

With the above requirements, which of the following approaches is the correct one (and why)?

  1. Implementing modules as processes, and messaging through shared memory
  2. Implementing modules as threads in a single process, and messaging by pushing message objects to the destination module's message queue.

Of course, there are some apparent pros & cons:

  • In Option-2, if one module causes a segmentation fault, the process (and thus the whole application) will crash. Also, one module can access/mutate another module's memory directly, which can lead to difficult-to-debug runtime errors.
  • But with Option-1, you need to handle the states where a module you need to communicate with has just crashed. If there are N modules in the software, there can be 2^N alive/crashed states of the system that affect the algorithms running on the modules.
  • Again in Option-1, the sender cannot assume that the receiver has received the message, because it might have crashed at that moment. (But the system can alert all the modules that a particular module has crashed; that way, the sender can conclude that the receiver will not be able to handle the message, even if it did successfully receive it.)

I am in favor of Option-2, but I am not sure whether my arguments are solid enough or not. What are your opinions?

EDIT: Upon requests for clarification, here are more specification details:

  • This is an embedded application that is going to run on Linux OS.
  • Unfortunately, I cannot tell you about the project itself, but I can say that there are multiple components of the project, each component will be developed by its own team (of 3-4 people), and it has been decided that the communication between these components/modules is through some kind of messaging framework.
  • C/C++ will be used as programming language.
  • What the 'Module Interface API' will automatically provide to the developers of a module are: (1) a message/event handler thread loop, (2) a synchronous message queue, (3) a function pointer member variable where you can set your message handler function. (A minimal sketch of such an interface is given below.)
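
To make the above concrete, here is a minimal C++ sketch of what such a Module Interface might look like. The names Module, Message, and post() are mine, not the real framework's, and std::function stands in for the raw function-pointer member mentioned above:

    // Minimal sketch (not the real framework): each module owns a message
    // queue and a handler thread that drains it synchronously, calling a
    // user-supplied handler for one message at a time.
    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <string>
    #include <thread>

    struct Message {
        int type;
        std::string payload;
    };

    class Module {
    public:
        using Handler = std::function<void(const Message&)>;

        explicit Module(Handler h)
            : handler_(std::move(h)), worker_([this] { loop(); }) {}

        ~Module() {
            { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
            cv_.notify_one();
            worker_.join();
        }

        // Called by other modules (from their own threads) to deliver a message.
        void post(Message msg) {
            { std::lock_guard<std::mutex> lk(m_); queue_.push(std::move(msg)); }
            cv_.notify_one();
        }

    private:
        void loop() {
            for (;;) {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stop_ || !queue_.empty(); });
                if (stop_ && queue_.empty()) return;
                Message msg = std::move(queue_.front());
                queue_.pop();
                lk.unlock();
                handler_(msg);   // messages are handled one at a time, in order
            }
        }

        Handler handler_;
        std::mutex m_;
        std::condition_variable cv_;
        std::queue<Message> queue_;
        bool stop_ = false;
        std::thread worker_;   // declared last so it starts after the other members
    };

A module's team would then only implement its handler function; other modules deliver messages by calling post() on the destination module.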
Benji Mizrahi
  • 2,154
  • 2
  • 23
  • 38
  • So, you are trying to design modules structure according to communication level requirements. Is there anything else that these modules should do, except communication? – Alex F Oct 03 '13 at 10:41
  • Yes, each module's structure will be shaped according to what is provided by the communication framework. These modules will do tasks ranging from network socket communication to running machine learning algorithms. One module is allowed to run multiple threads for its own needs, but inter-module communication will be done via messaging in a synchronized manner. Let me know if you need more clarification. – Benji Mizrahi Oct 03 '13 at 11:18
  • The primary concern should be the financial loss to the company should a segfault / access violation or whatever occur, causing a process to crash. Business concerns are the reason you write code. On a "pure" programming basis you will assume you just find any bugs and fix them. I would use shared memory in a multi-process design only for any "large" amount of data that needs to be loaded, shared between your processes, and read directly; ideally it should be read-only or managed via a database. – CashCow Aug 07 '14 at 10:42

3 Answers


Here is what I could come up with:

Multi-process(1) vs. Single-process, multi-threaded(2):

  • Impact of segmentation faults: In (2), if one module causes a segmentation fault, the whole application crashes. In (1), modules have separate memory regions, so only the module that caused the segmentation fault will crash.
  • Message delivery guarantee: In (2), you can assume that message delivery is guaranteed. In (1), the receiving module may crash before receiving the message or while handling it.
  • Sharing memory between modules: In (2), the whole memory is shared by all modules, so you can directly send message objects. In (1), you need to use 'Shared Memory' between modules.
  • Messaging implementation: In (2), you can send message objects between modules. In (1), you need to use network sockets, Unix sockets, pipes, or message objects stored in Shared Memory. For the sake of efficiency, storing message objects in Shared Memory seems to be the best choice.
  • Pointer usage between modules: In (2), you can use pointers in your message objects. The ownership of heap objects (accessed by pointers in the messages) can be transferred to the receiving module (see the sketch after this list). In (1), you need to manually manage the memory (with custom malloc/free functions) in the 'Shared Memory' region.
  • Module management: In (2), you are managing just one process. In (1), you need to manage a pool of processes each representing one module.
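
For the pointer-usage point above, here is a minimal single-process sketch (the Event and EventQueue names are made up) showing how ownership of a heap-allocated message can be handed to the receiving module without any copying:

    // Single-process case: a heap-allocated message is moved into the
    // destination module's queue; the receiver becomes the sole owner and
    // frees it when done, so no copying and no shared-memory allocator.
    #include <memory>
    #include <mutex>
    #include <queue>
    #include <vector>

    struct Event {
        int id;
        std::vector<char> blob;   // possibly large payload, never copied below
    };

    class EventQueue {
    public:
        void push(std::unique_ptr<Event> e) {
            std::lock_guard<std::mutex> lk(m_);
            q_.push(std::move(e));            // ownership moves into the queue
        }
        std::unique_ptr<Event> pop() {        // returns nullptr if empty
            std::lock_guard<std::mutex> lk(m_);
            if (q_.empty()) return nullptr;
            auto e = std::move(q_.front());
            q_.pop();
            return e;                         // ownership moves to the caller
        }
    private:
        std::mutex m_;
        std::queue<std::unique_ptr<Event>> q_;
    };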
Benji Mizrahi
  • 2,154
  • 2
  • 23
  • 38
3

Sounds like you're implementing Communicating Sequential Processes. Excellent!

Tackling threads vs processes first, I would stick to threads; the context switch times are faster (especially on Windows where process context switches are quite slow).

Second, shared memory vs a message queue; if you're doing full synchronous message passing it'll make no difference to performance. The shared memory approach involves a shared buffer that gets copied to by the sender and copied from by the reader. That's the same amount of work as is required for a message queue. So for simplicity's sake I would stick with the message queue.
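
To illustrate the two copies involved, here is a rough Linux sketch of such a shared-memory channel. The ShmChannel layout and the single-slot buffer are my own illustration; error handling is omitted; compile with -pthread (and -lrt on older glibc):

    // Multi-process case: the sender copies the message into a shared
    // buffer and the receiver copies it out again, paced by two semaphores.
    #include <cstring>
    #include <fcntl.h>
    #include <semaphore.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    struct ShmChannel {
        sem_t slot_free;   // sender may write
        sem_t msg_ready;   // receiver may read
        char  buf[4096];   // one in-flight message at a time
    };

    ShmChannel* open_channel(const char* name, bool create) {
        int fd = shm_open(name, create ? (O_CREAT | O_RDWR) : O_RDWR, 0600);
        ftruncate(fd, sizeof(ShmChannel));
        void* p = mmap(nullptr, sizeof(ShmChannel),
                       PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);
        ShmChannel* ch = static_cast<ShmChannel*>(p);
        if (create) {
            sem_init(&ch->slot_free, /*pshared=*/1, 1);
            sem_init(&ch->msg_ready, /*pshared=*/1, 0);
        }
        return ch;
    }

    void shm_send(ShmChannel* ch, const char* msg, size_t len) {
        sem_wait(&ch->slot_free);            // wait until the buffer is free
        std::memcpy(ch->buf, msg, len);      // copy #1: sender -> shared buffer
        sem_post(&ch->msg_ready);
    }

    void shm_receive(ShmChannel* ch, char* out, size_t len) {
        sem_wait(&ch->msg_ready);            // wait for a message
        std::memcpy(out, ch->buf, len);      // copy #2: shared buffer -> receiver
        sem_post(&ch->slot_free);
    }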

In fact, you might like to consider using a pipe instead of a message queue. You have to write code to make the pipe synchronous (they're normally asynchronous, which would be the Actor Model; message queues can often be set to zero length, which does what you want for it to be synchronous and properly CSP), but then you could just as easily use a socket instead. Your program can then become multi-machine distributed should the need arise, without you having to change the architecture at all. Also, named pipes between processes are an equivalent option, so on platforms where process context switch times are good (e.g. Linux) the whole thread vs process question goes away. So working a bit harder to use a pipe gives you very significant scalability options.
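
One way to make a pipe behave synchronously, as suggested above, is to pair the data pipe with a reply pipe and have the sender block until the receiver acknowledges. A rough sketch (fixed-size messages; partial reads/writes and errors are ignored; the same idea works with socketpair() or named pipes between processes):

    #include <unistd.h>

    struct SyncPipe {
        int req[2];   // sender -> receiver (data)
        int ack[2];   // receiver -> sender (acknowledgement)
    };

    bool sync_pipe_init(SyncPipe* p) {
        return pipe(p->req) == 0 && pipe(p->ack) == 0;
    }

    // Sender side: returns only after the receiver has taken the message.
    void sync_send(SyncPipe* p, const void* msg, size_t len) {
        write(p->req[1], msg, len);
        char ok;
        read(p->ack[0], &ok, 1);      // rendezvous: block until acknowledged
    }

    // Receiver side: read one fixed-size message, then acknowledge it.
    void sync_recv(SyncPipe* p, void* msg, size_t len) {
        read(p->req[0], msg, len);
        char ok = 1;
        write(p->ack[1], &ok, 1);
    }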

Regarding crashing; if you go the multiprocess route and you want to be able to gracefully handle the failure of a process you're going to have to do a bit of work. Essentially you will need a thread at each end of the messaging channel simply to monitor the responsiveness of the other end (perhaps by bouncing a keep-awake message back and forth between themselves). These threads need to feed status info into their corresponding main thread to tell it when the other end has failed to send a keep-awake on schedule. The main thread can then act accordingly. When I did this I had the monitor thread automatically reconnect as and when it could (e.g. the remote process has come back to life), and tell the main thread that too. This means that bits of my system can come and go and the rest of it just copes nicely.
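
A rough sketch of such a monitor thread (the actual ping transport is left to caller-supplied functions, since it depends on which channel you pick):

    #include <atomic>
    #include <chrono>
    #include <functional>
    #include <thread>

    // send_ping: sends a keep-awake over whatever channel is in use.
    // got_reply: returns true if the peer answered since the last check.
    // peer_alive: flag the main thread reads to decide how to act.
    void monitor_loop(std::function<void()> send_ping,
                      std::function<bool()> got_reply,
                      std::atomic<bool>& peer_alive) {
        using namespace std::chrono_literals;
        int missed = 0;
        for (;;) {
            send_ping();
            std::this_thread::sleep_for(500ms);
            if (got_reply()) {
                missed = 0;
                peer_alive = true;     // remote end is (back) up
            } else if (++missed >= 3) {
                peer_alive = false;    // main thread can treat the peer as down
            }
        }
    }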

Finally, your actual application processes will end up as a loop, with something like select() at the top to wait for message inputs from all the different channels (and monitor threads) that it is expecting to hear from.
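
For example, the top of such a loop might look roughly like this (illustrative only; message framing and dispatch are left out):

    #include <sys/select.h>
    #include <unistd.h>
    #include <vector>

    // fds: one descriptor per channel (pipes/sockets) plus the monitor
    // thread's notification descriptor.
    void main_loop(const std::vector<int>& fds) {
        for (;;) {
            fd_set readable;
            FD_ZERO(&readable);
            int maxfd = -1;
            for (int fd : fds) {
                FD_SET(fd, &readable);
                if (fd > maxfd) maxfd = fd;
            }
            if (select(maxfd + 1, &readable, nullptr, nullptr, nullptr) < 0)
                continue;                       // interrupted; just retry
            for (int fd : fds) {
                if (FD_ISSET(fd, &readable)) {
                    char buf[256];
                    ssize_t n = read(fd, buf, sizeof buf);
                    if (n > 0) {
                        // dispatch the message to the appropriate handler here
                    }
                }
            }
        }
    }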

By the way, this sort of thing is frustratingly hard to implement in Windows. There's just no proper equivalent of select() anywhere in any Microsoft language. There is a select() for sockets, but you can't use it on pipes, etc. like you can in Unix. The Cygwin guys had real problems implementing their version of select(). I think they ended up with a polling thread per file descriptor; massively inefficient.

Good luck!

bazza
  • 7,580
  • 15
  • 22
  • You say "The shared memory approach involves a shared buffer that gets copied to by the sender and copied from by the reader", but the contents of a message can be a pointer to another memory location in the shared memory. – Benji Mizrahi Oct 04 '13 at 06:43
  • @BenjiMizrahi, sorry for the very slow reply - 6 years. Yes, you can indeed do that, but then you have the issue of ownership control, and of who is allowed to modify the shared memory. By sending a whole copy - either through a pipe or a message queue - you don't need to worry about this. If the sender modifies their original data and the recipient needs to know about it, another copy (or deltas) must be sent. And yes, that can be inefficient. Since I wrote this answer ZeroMQ has come a long way, and that's what I'd use today if at all possible. – bazza Dec 20 '19 at 22:08

Your question lacks a description of how the "modules" are implemented and what they do, and possibly a description of the environment in which you are planning to implement all of this.

For example:

  • If the modules themselves have some requirements which make them hard to implement as threads (e.g. they use non-thread-safe 3rd party libraries, have global variables, etc.), your message delivery system will also not be implementable with threads.
  • If you are using an environment such as Python which does not handle thread parallelism very well (because of its global interpreter lock), and running on Linux, you will not gain any performance benefits with threads over processes.

There are more things to consider. If you are just passing data between modules, who says your system needs to use either multiple threads or multiple processes? There are other architectures which do the same thing without either of them, such as event-driven with callbacks (a message receiver can register a callback with your system, which is invoked when a message generator generates a message). This approach will be absolutely the fastest in any case where parallelism isn't important and where receiving code can be invoked in the execution context of the caller.
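
A minimal sketch of that callback approach (the MessageBus name and API are made up for illustration):

    #include <functional>
    #include <map>
    #include <string>
    #include <vector>

    // Receivers register callbacks; the sender's own thread invokes them
    // directly, so no extra threads or processes are involved.
    class MessageBus {
    public:
        using Callback = std::function<void(const std::string& payload)>;

        void subscribe(const std::string& topic, Callback cb) {
            subscribers_[topic].push_back(std::move(cb));
        }

        // Runs every registered callback in the caller's execution context.
        void publish(const std::string& topic, const std::string& payload) {
            for (auto& cb : subscribers_[topic]) cb(payload);
        }

    private:
        std::map<std::string, std::vector<Callback>> subscribers_;
    };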

tl;dr: you have only scratched the surface with your question :)

Ivan Voras
  • 1,895
  • 1
  • 13
  • 20
  • I did some clarification on my question after your feedback. What the module interface API will provide is exactly what you mention: event-driven messaging with callbacks. My question is whether to use it in a single-process, multi-threaded manner (each thread is the main loop of this event-handling mechanism) or in a multi-process manner (each process's main thread is responsible for event handling). – Benji Mizrahi Oct 04 '13 at 12:07
  • If you are communicating between processes, have you considered using something other than shared memory, e.g. pipes and sockets? This way, you would get a simpler (still not trivial) indication of whether the receiver crashed: the sender will get an error if it attempts to send again. Since you don't trust your modules (you are concerned about threads corrupting each other's memory - your pro & con #3), the multi-process approach is the only one which would solve that. – Ivan Voras Oct 05 '13 at 08:59
  • If I use sockets or pipes, I am going to need to serialize/deserialize message objects. Using shared memory, I can store the message objects in the shared memory and pass a pointer to them (these pointers can be sent through pipes or sockets). – Benji Mizrahi Oct 05 '13 at 09:32
  • If you have it in shared memory, it means that either a) it's a contiguous chunk of data, a structure with no pointers in it, or b) you are relying on the shared memory chunk being at exactly the same address in all processes (which may be true if you are forking the processes off a master process after setting up shared memory). If the a) case is true, then you don't need any special serialization; you could simply send the raw data. It's just an idea; I think you've covered all the ground: if you need the performance, use threads; if you need isolation (e.g. for security), use processes. – Ivan Voras Oct 05 '13 at 10:41