
What is the fastest technology to send messages between C++ application processes, on Linux? I am vaguely aware that the following techniques are on the table:

  • TCP
  • UDP
  • Sockets
  • Pipes
  • Named pipes
  • Memory-mapped files

Are there any more ways, and which is the fastest?

user997112

7 Answers


Whilst all the above answers are very good, I think we'd have to discuss what "fastest" is [and does it have to be "fastest", or just "fast enough"?]

For LARGE messages, there is no doubt that shared memory is a very good technique, and very useful in many ways.

However, if the messages are small, there is the drawback of having to come up with your own message-passing protocol and a way of informing the other process that a message is there.

Pipes and named pipes are much easier to use in this case - they behave pretty much like a file: you just write data on the sending side and read it on the receiving side. If the sender writes something, the receiver automatically wakes up. If the pipe is full, the sender blocks. If there is no more data from the sender, the receiver blocks. This means it can be implemented in fairly few lines of code with a pretty good guarantee that it will work at all times, every time.

Shared memory, on the other hand, relies on some other mechanism to inform the other process that "you have a packet of data to process". Yes, it's very fast if you have LARGE packets of data to copy - but I would be surprised if there is a huge difference compared to a pipe, really. The main benefit would be that the other side doesn't have to copy the data out of the shared memory - but it also relies on there being enough memory to hold all "in flight" messages, or the sender having the ability to hold things back.

I'm not saying "don't use shared memory", I'm just saying that there is no such thing as "one solution that solves all problems 'best'".

To clarify: I would start by implementing a simple method using a pipe or named pipe [depending on which suits the purposes], and measure the performance of that. If a significant time is spent actually copying the data, then I would consider using other methods.

Of course, another consideration should be "are we ever going to use two separate machines [or two virtual machines on the same system] to solve this problem?" In that case, a network solution is a better choice - even if it's not THE fastest. I've run a local TCP stack on my machines at work for benchmark purposes and got some 20-30 Gbit/s (2-3 GB/s) with sustained traffic. A raw memcpy within the same process gets around 50-100 Gbit/s (5-10 GB/s) (unless the block size is REALLY tiny and fits in the L1 cache). I haven't measured a standard pipe, but I expect it falls somewhere roughly in the middle of those two numbers. [These numbers are about right for a number of different medium-sized, fairly modern PCs - obviously, on an ARM, MIPS or other embedded-style controller, expect lower numbers for all of these methods.]

Mats Petersson
  • My messages will be small in size. However, I would not want to block the sender if the receiver cannot copy. This is because, imagine I am sending weather data for the same country - the most recent weather data message will override any remaining messages which are still being processed. I do however like the fact you say the receiver will be automatically notified! – user997112 Jan 08 '13 at 23:28
  • There are various ways you'd be able to do that. And it may be simpler to let the receiver look (briefly) at the message it read and say "Well, it's old, so I'll just throw this away" than to fix the messaging system to sort things out. That assumes that your processing on the receiving side is substantial, and it's relatively easy to send the data. Another way to solve it is to have a two-way system, where the "receiver" says "I'm done, please send the next packet now!", and the sender simply keeps the "most up to date" data at any given time. – Mats Petersson Jan 08 '13 at 23:31
  • While I agree with all that, it would depend on how shared memory is used. E.g. one could implement double buffering: the sender continuously dumps data into block A, each time taking a lock and setting an 'avail' flag. The reader(s) could then wait on that lock, swap the buffers and reset that flag, so that they can safely use the most recent data (read only) without copying, while the writer continues to write into block B. Whether the writer should be blocked by another lock or not may be defined according to the type of data processing it does. – Sam Jan 08 '13 at 23:43
  • I agree. I wanted to explain in my answer that there are several ways to solve the same problem, and it all depends on what you are actually trying to achieve which is best, rather than state outright that "one solution is best", because I don't believe that is right. Unless either the data is fairly large, or the processing is very trivial, the actual method to transfer the data is PROBABLY not the biggest stumbling block. – Mats Petersson Jan 09 '13 at 00:05
  • Guess, we are in complete agreement, that the OP should show us some details. – Sam Jan 09 '13 at 00:13
  • It is just a case of message being sent to receiver- receiver begins processing data, then once finished receiver begins processing next piece of data "in the queue". So I am implementing a queuing system, sender "sends" data to the receiver using whatever technique is recommended here. Has that helped provide more info? – user997112 Jan 09 '13 at 00:19
  • Yes, so I have the following questions: how large are those pieces? Do all pieces have to be processed completely and in order? – Sam Jan 09 '13 at 00:27

I would suggest looking at this also: How to use shared memory with Linux in C.

Basically, I'd drop network protocols such as TCP and UDP when doing IPC on a single machine. These have packet overhead and tie up additional resources (e.g. ports, the loopback interface).

Sam

The NetOS Systems Research Group at the University of Cambridge, UK, has done some (open-source) IPC benchmarks.

Source code is located at https://github.com/avsm/ipc-bench .

Project page: http://www.cl.cam.ac.uk/research/srg/netos/projects/ipc-bench/ .

Results: http://www.cl.cam.ac.uk/research/srg/netos/projects/ipc-bench/results.html

The research paper published using the results above: http://anil.recoil.org/papers/drafts/2012-usenix-ipc-draft1.pdf

DejanLekic

Check CMA and kdbus: https://lwn.net/Articles/466304/

I think the fastest stuff these days is based on AIO. http://www.kegel.com/c10k.html

Alex
    The AIO stuff is _not_ the fastest solution for communicating between processes on the same processor. Your second link isn't really anything I'd recommend. – James Kanze Jan 08 '13 at 23:16
  • @JamesKanze would you be able to elaborate on your points? With regard to c10k, i have often shared your view- but I have seen that URL quoted many times on SO?? – user997112 Jan 08 '13 at 23:25
  • @user997112 For anything on the same processor, shared memory beats the alternatives hands down. Between processors, the time differences between asynchronous IO and using separate threads are negligible, and the multithread model is significantly cleaner and easier to develop and maintain. With efficient threading, there's no case where I would choose async IO. – James Kanze Jan 08 '13 at 23:45
  • People have commented mostly on the size of the message being exchanged, and on whether you use one or two processors. But I believe a relevant and important issue is the rate of events. If you are processing a very large number of events per second (say hundreds of thousands), then AIO may give you an edge. – Alex Jan 09 '13 at 11:25
  • @JamesKanze "and the multithread model is significantly cleaner and easier to develop and maintain" -> I thought unpredictable pre-emption was a con of the threading model, so that it is easier to reason about non-blocking IO solutions.... – lucid_dreamer Feb 27 '19 at 17:38

As you tagged this question with C++, I'd recommend Boost.Interprocess:

Shared memory is the fastest interprocess communication mechanism. The operating system maps a memory segment in the address space of several processes, so that several processes can read and write in that memory segment without calling operating system functions. However, we need some kind of synchronization between processes that read and write shared memory.

Source

One caveat I've found is the portability limitations of the synchronization primitives. Neither OS X nor Windows has a native implementation of interprocess condition variables, for example, so Boost emulates them with spin locks.

If you use a *nix that supports POSIX process-shared primitives, there will be no problems.

Shared memory with synchronization is a good approach when considerable data is involved.

oblitum

Well, you could simply have a shared memory segment between your processes, using Linux shared memory, a.k.a. SHM.

It's quite easy to use, look at the link for some examples.

cmc

POSIX message queues are pretty fast, but they have some limitations.

arash kordi