0

I have a particular problem which I cannot seem to reproduce in a minimal working example. I have to deal with a large framework of legacy code and modifying all of that out of my scope. To deal with it I have to apply some particular patterns.

Overview of the codebase

I have a managed C# application (.NET 5.0). In this appliation I need to run some C++ code. For this, there is a CLI-wrapper project. This wrapper contains most of the legacy framework which is out of my control and is why I can only transfer strings to my C++ class (more on this later). Based on config, this legacy framework uses the wrapper to instantiate C++ classes and calls methods on them, processes the results and finally, destroys all the C++ classes afterwards. This CLI-wrapper allows me ONLY to pass strings as parameters to the C++ classes it creates.

All of my libraries are dynamically linked (using DLL's). The C# is a project which references the C++/CLI wrapper which in turn referenced the C++ project with my C++-class. This project references the external LargeLibrary (more on this later).

The root of the problem

The C++ code is called repeatedly, every few seconds. It should respond fast. My C++ code needs to load some large file from disk (about 400 MB) and process it which takes quite some time. Since the C++ classes are recreated each time, loading the file each time consumes so much time which is unacceptable. As this data is essentially constant, I try to load it once during initialisation of the program. Then I pass a pointer to my C++ class which then can use the object. The object then remains in memory when the C++ class is destroyed so it can be used again later.

To complicate things, I need quite a large library to read and process my file (I reference this library here as LargeLibrary). If I make the CLI-wrapper dependent on this, it won't compile. I can imagine this is because of the CLI stuff. Therefore, I use a void pointer, so the wrapper does not have to be aware of the actual type of behind the pointer. The actual object is created using a function inside my C++-class (so the correct destructor is linked to the shared pointer). This all compiles fine.

My solution

I made a small extension to the CLI-wrapper to create the object which read my file from disk and keeps the information in memory. This object is created using the method CreateInformationObject(). ptr_native is a smart pointer for using native objects in managed code. It's type is: CAutoNativePtr<std::shared_ptr<void>> ptr_native. Creating my object inside the wrapper looks like:

// Create a shared_ptr on dynamic memory (i.e. heap).
std::shared_ptr<void>* objectPointer = new std::shared_ptr<void>();

// Load the module and store a shared pointer pointing to it in the dynamic memory.
*objectPointer = CppConsumerStuff::CppConsumer::CreateInformationObject(value);

// Load the module and store a shared pointer pointing to it in the dynamic memory.
ptr_native.Attach(objectPointer);

The CreateInformationObject() method inside my C++ class (the CppConsumerStuff::CppConsumer) is:

std::shared_ptr<void> CppConsumer::CreateInformationObject(std::string pathToFile)
{
    std::shared_ptr<LargeLibrary::ActualObjectType> objectPtr = std::make_shared<LargeLibrary::ActualObjectType>();
    
    *objectPtr = LargeLibrary::FileLoader::load(pathToFile)
    return objectPtr;
}

Then, because of the legacy framework, I tried this longshot: convert the pointer address to string, pass it via the framework to my C++ class and convert it back to a pointer to the actual type of the object.

This goes like (in my CLI-wrapper extension):

//Cast void pointer to string.
String^ CliStorage::GetPointerString()
{
    std::stringstream ss;
    ss << (*ptr_native).get();  // Pointer to hex string.
    std::string ptr_string = ss.str();
    return StringToManaged(ptr_string);
}

Finally, (in my C++ class), I convert this pointer-string back to a pointer to the actual object as:

void DoWorkOnLargeObject(std::string ptr_string)
{
    // Cast pointer to usable type
    uint64_t raw_ptr = 0; // Define int size depending on system architecture.
    std::stringstream ss;
    ss << std::hex << ptr_string;
    ss >> raw_ptr; //Hex string to int.
    cppObjectPtr = reinterpret_cast<void*>(raw_ptr);
    LargeLibrary::ActualObjectType* cppObjectPtrCasted = static_cast<LargeLibrary::ActualObjectType*>(cppObjectPtr);
    
    // Use the object.
    cppObjectPtrCasted->GetDataStuff();
    // Rest of code doing work...
}

My results

I build all of this in Visual Studio 2019. When I create a Debug build, all works :). However, when I create a Release build, it does not work and throws the following Exception: ``

Minimal working example

I tried to create a minimal working example. Both with and without the large external library. However, in my minimum working Examples it always works, no matter the build type (debug / release).

My question

So my question is: Do my minimum working examples work by accident and am I relying on undefined behavior? Or should this concept (no matter how ugly it is) indeed work? If it is undefined behavior, please explain, I want to learn. If it should work, the problem resides in the legacy framework and I will make inquiries about this.

I know these are very ugly patterns, but I try to get something working with the means I have within my scope.

Thank you


EDIT, I added CreateInformationObject() method code to my question. I think my hazard may be inside here. Maybe I do some illegal pointer stuff which results in undefined behavior?

Stefan
  • 919
  • 2
  • 13
  • 24
  • `std::shared_ptr*` is a major antipattern. `std::shared_ptr` should only be used as an automatic variable or member variable never as a pointer and should never be heap allocated. That destroys the entire value of using a smart pointer. Ideally when you work with strings in C++ you wouldn't convert them from wide strings, you're losing information when you do. Use `std::wstring` instead. Also yes you're relying on a ton of undefined behavior. This code is wildly unsafe. – Mgetz Dec 01 '21 at 18:53
  • Note: `CAutoNativePtr` isn't necessary if you just have `std::shared_ptr` as a data member of the C++/CLI ref class that should work fine. It will make the C++/CLI type Disposable by default because it will have a destructor to call, but this is fine. You shouldn't need to manually implement a destructor if you're using the smart pointer correctly. I do have concerns that you probably don't need nor want a `shared_ptr` but there isn't enough information to be sure. – Mgetz Dec 01 '21 at 19:12
  • Do I understand this right: Your program writes some data at some point of the heap and saves the position somewhere. Then it exits. Next, a new instance of this program (i.e. not the instance that created the allocated the pointer on the heap) reads the position from somewhere and tries to access this. Ist this correct? The I see the problem that a progam is not allowed to read from arbitrary memory position to protect the data of other programs (https://en.wikipedia.org/wiki/Memory_protection). Hence this cannot work in my opinion. – n314159 Dec 01 '21 at 20:51
  • @Mgetz , I have added the definition of the method `CreateInformationObject` to my question. As you can see, I create the `shared_ptr` such that it knows which destructor to call when the class holding the shared pointer goes out of scope. Do you still think something is wrong? – Stefan Dec 02 '21 at 17:04
  • @n314159 , Thank you for your answer. All of this happens in the same program instance. So there is no interproces memory sharing. All of my libraries are dynamically linked (using DLL's). The C# is a project which references the C++/CLI wrapper which in turn referenced the C++ project with my C++-class. This project references the external 'LargeLibrary'. The wrapper contains the legacy framework which is out of my control and is why I can only transfer strings to my C++ class. – Stefan Dec 02 '21 at 17:51
  • @Mgetz Thanks by the way :). I know this code is widely unsafe :|. If you can point to the spots where I rely on undefined behavior, that would be of great help! I agree with you that I should not create a `std::shared_ptr` on the heap. I don't think I create the shared_ptr on the heap. I only create it on the stack and pass it around. I want to store the shared pointer in my `CAutoNativePtr` so if my mangaged class (responsible for the life time of the `ActualObjectType`) dies, the object inside the shared_ptr gets destroyed. Perhaps you want to look at my `CreateInformationObject` method? – Stefan Dec 02 '21 at 18:06
  • @Stefan yes I do want to see `CreateInformationObject`. But you should be able to hold onto the native smart pointer without the double nesting. The destructor on the native type will automatically flag the CLR object as disposable. All you'll need to do is dispose of it properly from C#. But the `CAutoNativePtr` doesn't make sense and honestly didn't when it was written other than smart pointers weren't generally available (C++11 introduced the first ones that didn't cause brain damage). But today you should be able to use `std::shared_ptr` directly without issue. – Mgetz Dec 02 '21 at 19:17
  • @Stefan you should also never need to return `std::shared_ptr` that... is very bad as that `shared_ptr` can't actually correctly delete what it's holding. – Mgetz Dec 02 '21 at 19:22
  • @Mgetz, I researched this a little and I concluded that the `std::shared_ptr` will delete the actual object because on creation of the pointer, the deleter is stored inside it simply using a function pointer. E.g. see this: https://stackoverflow.com/questions/5913396/why-do-stdshared-ptrvoid-work – Stefan Dec 02 '21 at 20:21
  • @Mgetz, I was unable to use the `std::shared_ptr` in the managed part of the program. Perhaps there is some pattern I'm not aware of? That is why I used the `CAutoNativePtr`, this class is included in the legacy framework and I though I could just use it. The `CAutoNativePtr` stores my shared_ptr. When the `CAutoNativePtr` goes out of scope, the .NET garbage collector will destroy it, and when it is destroyed it will destroy the object it is holding, which is my shared_ptr, which in turn will delete the object (400 MB) from memory. At least, this is the plan. – Stefan Dec 02 '21 at 20:28
  • @Mgetz , But the problem I'm having occurs not at destroying the object. The problem occurs when my C++ class (`CppConsumer`) tries to access the object via the provided pointer. It works when I run a Debug build, but it fails when I do a Release build. – Stefan Dec 02 '21 at 20:35
  • @someone , why the downvote ? Without feedback I cannot improve the question... – Stefan Dec 03 '21 at 08:36
  • 1
    I think the approach of just saving the pointer to the heap and then use it later via reinterpret_cast will lead to UB. I don't have a specific part of the standard in mind but I think that you are in an area where you would have to show that it is valid and not the other way round. Further, the approach with shared memory is valid in my opinion even with only one process. That being said *maybe* [std::launder](https://en.cppreference.com/w/cpp/utility/launder) could help you. See https://stackoverflow.com/questions/39382501/what-is-the-purpose-of-stdlaunder#39382728 for an explanation. – n314159 Dec 03 '21 at 22:56
  • @n314159 , Thanks, I will look at it. It may take some time because I also want to understand. When I got it working, I will upvote your post and (in case useful), I edit you post to an out of the box working example. – Stefan Dec 08 '21 at 11:39
  • Hi Guys, this question is rather difficult to answer in a way I can actually learn what is wrong. I really would like this to improve my knowledge of C++. Therefore, I created another post, which is much more suitable for this, perhaps you guys want to take a look, as you also know a little bit about the background because of this post: https://stackoverflow.com/questions/70274047/do-any-of-these-pointer-operation-trigger-undefined-behavior?noredirect=1#comment124225328_70274047 – Stefan Dec 08 '21 at 11:40

1 Answers1

1

I am not an expert on this so take my advise with a grain of salt. In my opinion directly sharing the memory address between the processes will in general fail due to memory protection (which forbids programs to just access memory that was not allocated for them).

You could used shared memory. This is memory shared between processes. Normally one would use this to share memory between concurrent processes but this is in no way necessary (and not having competing accesses is actually beneficial). Wikipedia lists boost and Qt as examples for libraries implementing cross-platform support for shared memory.

Looking into the boost documentation for sharing memory, it says "As shared memory has kernel or filesystem persistence, the user must explicitly destroy it.", which is exactly what you want, since it should persist between calls of the same program. Note that you should remove the shared memory in some other way since it will persist.

Adapting the example from the documentation, it could look something like this:

#include <boost/interprocess/shared_memory_object.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstring>
#include <cstdlib>
#include <string>

constexpr auto shm_name = "SharedMemoryCLI";
using namespace boost::interprocess;

auto create_shared_memory() {
    // Compute your data and calculate the size needed:
    shared_memory_object shm {create_only, shm_name, read_write};

    // Either use an upper bound for the size needed or compute your data before.
    shm.truncate(data_size);

    //Map the whole shared memory in this process
    mapped_region region{shm, read_write};

    // Either write your data directly to region.get_address():
    compute_data_to(region.get_address());
    // Or have the data already computed and memcopy it:
    std::memcpy(region.get_address(), data_ptr, data_size);
    return region;
}

auto obtain_memory_region() {
   try {
      shared_memory_object shm{open_only, shm_name, read_only};
      return mapped_region {shm, read_only};
   } catch(const std::exception &er) {
      // One should probably check if this is the "right" exception but boost does not say what type it uses here...
      return create_shared_memory();
   }
}

int main(int argc, char *argv[])
{
   region = obtain_memory_region();
   static_cast<char*>(region.get_address()); // can be used as a to your data.
   return 0;
}

Note that you maybe have to persist the exact size of your shared memory in some other way (or maybe just as the first 8 byte of the region). You can then have to somehow get the char* back to your wanted type, but I think that a reinterpret_cast should work here.

The above code is not tested and I give no guarantees but I am pretty confident that it should work roughly in this way and be about as fast as just sharing the pointer (if that would work). You really should read the entirety of https://www.boost.org/doc/libs/1_48_0/doc/html/interprocess/sharedmemorybetweenprocesses.html before applying this in any way.

n314159
  • 4,990
  • 1
  • 5
  • 20