4

In a cross-platform (Linux and Windows) real-time application, I need the fastest way to share data between a C++ process and a Python application, both of which I manage. I currently use sockets, but they are too slow for high-bandwidth data (4K images at 30 fps).

I would ultimately like to use Python's multiprocessing shared memory, but my first tries suggest it does not work. I create the shared memory in C++ using Boost.Interprocess and try to read it in Python like this:

#include <boost/interprocess/shared_memory_object.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstring>   // std::memset
#include <cstdlib>   // std::system

int main(int argc, char* argv[])
{
    using namespace boost::interprocess;

    //Remove shared memory on construction and destruction
    struct shm_remove
    {
        shm_remove() { shared_memory_object::remove("myshm"); }
        ~shm_remove() { shared_memory_object::remove("myshm"); }
    } remover;

    //Create a shared memory object.
    shared_memory_object shm(create_only, "myshm", read_write);

    //Set size
    shm.truncate(1000);

    //Map the whole shared memory in this process
    mapped_region region(shm, read_write);

    //Write all the memory to 1
    std::memset(region.get_address(), 1, region.get_size());

    std::system("pause");
}

And my Python code:

from multiprocessing import shared_memory

if __name__ == "__main__":
    shm_a = shared_memory.SharedMemory(name="myshm", create=False)
    buffer = shm_a.buf
    print(buffer[0])

I get a FileNotFoundError: [WinError 2]: File not found. So I guess multiprocessing.shared_memory only works between Python processes, right? Python does not seem to find the shared memory created on the C++ side.

Another possibility would be to use mmap, but I'm afraid it is not as fast as "pure" shared memory (i.e. shared memory that does not go through the filesystem). As the Boost.Interprocess documentation states:

However, as the operating system has to synchronize the file contents with the memory contents, memory-mapped files are not as fast as shared memory

I don't know to what extent it is slower, however. I would simply prefer the fastest solution, as this is currently the bottleneck of my application.

poukill
  • What is “pure” shared memory? – Jeremy Friesner Sep 07 '21 at 16:37
  • Without using the filesystem that adds overhead. I will try to improve my post. – poukill Sep 07 '21 at 17:40
  • You seem to be confusing "The name of my shared memory is in a filesystem" with "The content of my shared memory is on a disk". They are separate. Go ahead and put the shm file in a directory where the other program can find it. – Ben Voigt Sep 07 '21 at 20:00
  • This is on Windows? You should probably tag your question with the platform, since shared memory performance may well have platform-specific details. – Useless Sep 07 '21 at 20:01
  • Have a look here... https://stackoverflow.com/a/60976666/2836621 – Mark Setchell Sep 07 '21 at 21:53
  • What do you plan to DO with this data in Python? You're talking about one gigabyte per second. Unless you're immediately shuffling it off to more C-based code (like numpy), you aren't going to be able to keep up. – Tim Roberts Sep 08 '21 at 01:01
  • Agree with @TimRoberts. Don't implement shm so fast. Profile your python program to check if it could handle that data in the time interval you want. I guess.. not. Unless you accept any delay between source and python. – Louis Go Sep 08 '21 at 01:11
  • This is on Linux or Windows. It is a cross platform application. I updated my post accordingly. – poukill Sep 08 '21 at 02:13
  • Maybe indeed I won't be able to handle 4K 30 FPS, as you said. But I use numpy extensively and I would like to improve my situation. Don't want to go into too much details but that's a complete rewrite of previous similar application that was using shm and the performance was way better ( factor 6 in CPU usage, that's HUGE). The sockets read and write TCP calls are really costly when dealing with big data, I was not aware of that. With small data it's fast and synchronisation is really good though. I'm going for hybrid approach. – poukill Sep 08 '21 at 02:37
  • @MarkSetchell I have seen this post before. redis looks good, but does not seem to be that fast in my case. Thanks for pointing that out anyway, appreciated. – poukill Sep 08 '21 at 09:56
  • An example of shared memory communication using memory mapping : https://stackoverflow.com/questions/69794817/how-to-share-cvmat-for-processing-between-cpp-and-python-using-shared-memory/69806149#69806149 – KRG Nov 02 '21 at 07:08

3 Answers

3

So I spent the last few days implementing shared memory using mmap, and the results are quite good in my opinion. Here are the benchmark results comparing my two implementations: pure TCP, and a mix of TCP and shared memory.

Protocol:

The benchmark consists of moving data from the C++ world to the Python world (as a numpy.ndarray), then sending the data back to the C++ process. No further processing is involved; only serialization, deserialization and inter-process communication (IPC) are measured.

Case A:

Communication is done with TCP {header + data}.

Case B:

  • One C++ process implementing TCP communication using Boost.Asio and shared memory (mmap) using Boost.Interprocess
  • One Python3 process using standard TCP sockets and mmap

Communication is hybrid: synchronization is done through the sockets (only the header is passed), while the data itself moves through shared memory. I think this design works well because I have suffered in the past from synchronization problems with condition variables in shared memory, whereas TCP is easy to use from both C++ and Python. A sketch of the C++ producer side is shown below.
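To make the data path concrete, here is a minimal sketch of the C++ producer side of case B. The file path ("frame.bin") and the frame size are placeholders that both processes would have to agree on; the TCP header exchange (Boost.Asio in my application) is only hinted at in a comment.

#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstring>
#include <filesystem>
#include <fstream>

int main()
{
    using namespace boost::interprocess;

    // Placeholder path and size: both must be known to the Python side as well.
    const char* path = "frame.bin";
    const std::size_t frame_size = 3840 * 2160 * 3; // one 4K RGB frame

    // Create the backing file once and give it the desired size.
    {
        std::ofstream create(path, std::ios::binary);
    }
    std::filesystem::resize_file(path, frame_size);

    // Map the whole file into this process.
    file_mapping mapping(path, read_write);
    mapped_region region(mapping, read_write);

    // Copy a frame into the mapping (dummy pattern here).
    std::memset(region.get_address(), 42, region.get_size());
    region.flush();

    // In the real application, a small TCP header (frame id, size, ...) is
    // sent at this point so the Python process knows the frame is ready.
}

On the Python side, the same file is opened with mmap and the buffer can be wrapped in a numpy array (e.g. with numpy.frombuffer) without an extra copy.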

Results:

Big data at low frequency

200 MBytes/s total: 10 MByte samples at 20 samples per second

Case | Global CPU consumption | C++ part | Python part
A    | 17.5%                  | 10%      | 7.5%
B    | 6%                     | 1%       | 5%

Small data at high frequency

200 MBytes/s total: 0.2 MByte samples at 1000 samples per second

Case | Global CPU consumption | C++ part | Python part
A    | 13.5%                  | 6.7%     | 6.8%
B    | 11%                    | 5.5%     | 5.5%

Max bandwidth

  • A : 250 MBytes / second
  • B : 600 MBytes / second

Conclusion:

In my application, using mmap has a huge impact for big data at low frequency: overall CPU usage is almost 3 times lower. At very high frequencies with small data, the benefit of shared memory is still there but less impressive (roughly a 20% improvement). Maximum throughput is more than twice as high.

Using mmap is a good upgrade for me. I just wanted to share my results here.

poukill
  • Would https://docs.python.org/3/library/multiprocessing.shared_memory.html be better for this, or is mmap better? – kevinlinxc Sep 24 '22 at 22:36
  • multiprocessing.shared_memory does not work for sharing data between C++ and python. So I use mmap and I'm still using it one year after the creation of this thread, works beautifully ! :) – poukill Sep 30 '22 at 18:50
2

An example of communication between C++ and Python using shared memory and memory mapping can be found at https://stackoverflow.com/a/69806149/2625176.

KRG
1

For future viewers, I fixed this error by using windows_shared_memory instead of shared_memory_object.
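For reference, here is a sketch of that change applied to the C++ code from the question (Windows-only; the "myshm" name and 1000-byte size are taken from the question). As far as I understand, shared_memory_object is emulated on Windows through a file in a Boost-specific directory, while windows_shared_memory uses a native Windows file mapping, which is also what Python's multiprocessing.shared_memory opens by name.

#include <boost/interprocess/windows_shared_memory.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstring>
#include <cstdlib>

int main()
{
    using namespace boost::interprocess;

    // No remover struct is needed: a windows_shared_memory segment disappears
    // automatically when the last process attached to it closes its handle.
    // The size is given to the constructor directly (there is no truncate()).
    windows_shared_memory shm(create_only, "myshm", read_write, 1000);

    // Map the whole shared memory in this process.
    mapped_region region(shm, read_write);

    // Write all the memory to 1.
    std::memset(region.get_address(), 1, region.get_size());

    // Keep the process alive so the segment stays available to the reader.
    std::system("pause");
}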

CaptXan