4

In a cross-platform (Linux and Windows) real-time application, I need the fastest way to share data between a C++ process and a Python application, both of which I manage. I currently use sockets, but they are too slow for high-bandwidth data (4K images at 30 fps).

I would ultimately like to use Python's multiprocessing shared memory, but my first tries suggest it does not work. I create the shared memory in C++ using Boost.Interprocess and try to read it in Python like this:

#include <boost/interprocess/shared_memory_object.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstring>   // std::memset
#include <cstdlib>   // std::system

int main(int argc, char* argv[])
{
    using namespace boost::interprocess;

    //Remove shared memory on construction and destruction
    struct shm_remove
    {
        shm_remove() { shared_memory_object::remove("myshm"); }
        ~shm_remove() { shared_memory_object::remove("myshm"); }
    } remover;

    //Create a shared memory object.
    shared_memory_object shm(create_only, "myshm", read_write);

    //Set size
    shm.truncate(1000);

    //Map the whole shared memory in this process
    mapped_region region(shm, read_write);

    //Write all the memory to 1
    std::memset(region.get_address(), 1, region.get_size());

    std::system("pause");
}

And my Python code:

from multiprocessing import shared_memory

if __name__ == "__main__":
    shm_a = shared_memory.SharedMemory(name="myshm", create=False)
    buffer = shm_a.buf
    print(buffer[0])

I get a FileNotFoundError: [WinError 2]: File not found. So I guess multiprocessing.shared_memory only works between Python processes, right? Python does not seem to find the shared memory created on the C++ side.

Another possibility would be to use mmap, but I'm afraid it is not as fast as "pure" shared memory (i.e. shared memory that does not go through the filesystem). As the Boost.Interprocess documentation states:

However, as the operating system has to synchronize the file contents with the memory contents, memory-mapped files are not as fast as shared memory

I don't know to what extent it is slower, however. I would simply prefer the fastest solution, as this is currently the bottleneck of my application.

poukill
  • What is “pure” shared memory? – Jeremy Friesner Sep 07 '21 at 16:37
  • Without using the filesystem that adds overhead. I will try to improve my post. – poukill Sep 07 '21 at 17:40
  • You seem to be confusing "The name of my shared memory is in a filesystem" with "The content of my shared memory is on a disk". They are separate. Go ahead and put the shm file in a directory where the other program can find it. – Ben Voigt Sep 07 '21 at 20:00
  • This is on Windows? You should probably tag your question with the platform, since shared memory performance may well have platform-specific details. – Useless Sep 07 '21 at 20:01
  • Have a look here... https://stackoverflow.com/a/60976666/2836621 – Mark Setchell Sep 07 '21 at 21:53
  • What do you plan to DO with this data in Python? You're talking about one gigabyte per second. Unless you're immediately shuffling it off to more C-based code (like numpy), you aren't going to be able to keep up. – Tim Roberts Sep 08 '21 at 01:01
  • Agree with @TimRoberts. Don't implement shm so fast. Profile your python program to check if it could handle that data in the time interval you want. I guess.. not. Unless you accept any delay between source and python. – Louis Go Sep 08 '21 at 01:11
  • This is on Linux or Windows. It is a cross platform application. I updated my post accordingly. – poukill Sep 08 '21 at 02:13
  • Maybe indeed I won't be able to handle 4K 30 FPS, as you said. But I use numpy extensively and I would like to improve my situation. Don't want to go into too much details but that's a complete rewrite of previous similar application that was using shm and the performance was way better ( factor 6 in CPU usage, that's HUGE). The sockets read and write TCP calls are really costly when dealing with big data, I was not aware of that. With small data it's fast and synchronisation is really good though. I'm going for hybrid approach. – poukill Sep 08 '21 at 02:37
  • @MarkSetchell I have seen this post before. redis looks good, but does not seem to be that fast in my case. Thanks for pointing that out anyway, appreciated. – poukill Sep 08 '21 at 09:56
  • An example of shared memory communication using memory mapping : https://stackoverflow.com/questions/69794817/how-to-share-cvmat-for-processing-between-cpp-and-python-using-shared-memory/69806149#69806149 – KRG Nov 02 '21 at 07:08

3 Answers

3

So I spent the last few days implementing shared memory using mmap, and the results are quite good in my opinion. Here are the benchmark results comparing my two implementations: pure TCP, and a mix of TCP and shared memory.

Protocol:

The benchmark consists of moving data from the C++ world to the Python world (as a numpy.ndarray), then sending the data back to the C++ process. No further processing is involved; only serialization, deserialization and inter-process communication (IPC) are measured.

Case A:

Communication is done with TCP {header + data}.

Case B:

  • One C++ process implementing TCP communication using Boost.Asio and shared memory (mmap) using Boost.Interprocess
  • One Python3 process using standard TCP sockets and mmap

Communication is hybrid: synchronization is done through the sockets (only the header is passed), while the data itself moves through shared memory. I think this design works well because I have suffered in the past from synchronization problems with condition variables in shared memory, whereas TCP is easy to use from both C++ and Python. A sketch of the C++ producer side is shown below.
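To make the data path concrete, here is a minimal sketch of the C++ producer side of case B. The file path ("frame.bin") and the frame size are placeholders that both processes would have to agree on; the TCP header exchange (Boost.Asio in my application) is only hinted at in a comment.

#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstring>
#include <filesystem>
#include <fstream>

int main()
{
    using namespace boost::interprocess;

    // Placeholder path and size: both must be known to the Python side as well.
    const char* path = "frame.bin";
    const std::size_t frame_size = 3840 * 2160 * 3; // one 4K RGB frame

    // Create the backing file once and give it the desired size.
    {
        std::ofstream create(path, std::ios::binary);
    }
    std::filesystem::resize_file(path, frame_size);

    // Map the whole file into this process.
    file_mapping mapping(path, read_write);
    mapped_region region(mapping, read_write);

    // Copy a frame into the mapping (dummy pattern here).
    std::memset(region.get_address(), 42, region.get_size());
    region.flush();

    // In the real application, a small TCP header (frame id, size, ...) is
    // sent at this point so the Python process knows the frame is ready.
}

On the Python side, the same file is opened with mmap and the buffer can be wrapped in a numpy array (e.g. with numpy.frombuffer) without an extra copy.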

Results:

Big data at low frequency

200 MBytes/s total: 10 MByte samples at 20 samples per second

Case | Global CPU consumption | C++ part | Python part
A    | 17.5%                  | 10%      | 7.5%
B    | 6%                     | 1%       | 5%

Small data at high frequency

200 MBytes/s total: 0.2 MByte samples at 1000 samples per second

Case | Global CPU consumption | C++ part | Python part
A    | 13.5%                  | 6.7%     | 6.8%
B    | 11%                    | 5.5%     | 5.5%

Max bandwidth

  • A : 250 MBytes / second
  • B : 600 MBytes / second

Conclusion:

In my application, using mmap has a huge impact for big data at low frequency: overall CPU usage is almost 3 times lower. At very high frequencies with small data, the benefit of shared memory is still there but less impressive (roughly a 20% improvement). Maximum throughput is more than twice as high.

Using mmap is a good upgrade for me. I just wanted to share my results here.

poukill
  • Would https://docs.python.org/3/library/multiprocessing.shared_memory.html be better for this, or is mmap better? – kevinlinxc Sep 24 '22 at 22:36
  • multiprocessing.shared_memory does not work for sharing data between C++ and python. So I use mmap and I'm still using it one year after the creation of this thread, works beautifully ! :) – poukill Sep 30 '22 at 18:50
2

An example of communication between C++ and Python using shared memory and memory mapping can be found at https://stackoverflow.com/a/69806149/2625176.

KRG
1

For future viewers, I fixed this error by using windows_shared_memory instead of shared_memory_object.
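For reference, here is a sketch of that change applied to the C++ code from the question (Windows-only; the "myshm" name and 1000-byte size are taken from the question). As far as I understand, shared_memory_object is emulated on Windows through a file in a Boost-specific directory, while windows_shared_memory uses a native Windows file mapping, which is also what Python's multiprocessing.shared_memory opens by name.

#include <boost/interprocess/windows_shared_memory.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstring>
#include <cstdlib>

int main()
{
    using namespace boost::interprocess;

    // No remover struct is needed: a windows_shared_memory segment disappears
    // automatically when the last process attached to it closes its handle.
    // The size is given to the constructor directly (there is no truncate()).
    windows_shared_memory shm(create_only, "myshm", read_write, 1000);

    // Map the whole shared memory in this process.
    mapped_region region(shm, read_write);

    // Write all the memory to 1.
    std::memset(region.get_address(), 1, region.get_size());

    // Keep the process alive so the segment stays available to the reader.
    std::system("pause");
}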

CaptXan