Protobuf object gets corrupted when passed to a dynamic library

Question

I'm integrating my library with a deep learning framework and have run into memory issues. I suspect protobuf is the cause, but I'd like a second opinion because I've already spent too much time on this. In short, the framework operates on deep learning models in ONNX format. It reads them into memory as onnx::ModelProto objects. Those objects are then passed to my library, where they are transformed (and optimized) into my custom representation and returned to the framework. onnx::ModelProto is a C++ class generated with protoc from https://github.com/onnx/onnx/blob/master/onnx/onnx.proto, i.e. a regular protobuf message.

The problem occurs when the ModelProto reaches my library. The main member of the ModelProto is the graph, which is a pointer: onnx::GraphProto* onnx::ModelProto::graph_. When the object is passed to my library, the graph pointer is set to some different address which is not a proper GraphProto object location:

framework:
model_proto: 0x2ccb450
graph address: 0x2cc1d20
---
mylib:
model_proto: 0x2ccb450
graph address: 0x7fb6529c2560

The annoying thing is that it only happens in Release builds. When I compile both the framework and my library in Debug, it works correctly.

Before this error appeared, I was passing the ModelProto to my library through a std::stringstream: I serialized the model to a string in the framework, created a stream from it and deserialized it in my library. The graph was getting corrupted there too, right after deserialization finished, and badly enough that I was getting segfaults further down in my code.

Could this have anything to do with the fact that both the framework and my library link statically with their own copies of protobuf? Protobuf is added as a dependency and compiled with both the framework and my library. I made sure that I use the same version (it's 3.11 at the moment). I also use the same ONNX version (1.6).

Here's how the dependencies and the workflow look:

https://stackoverflow.com/questions/22797418/how-do-i-safely-pass-objects-especially-stl-objects-to-and-from-a-dll — Jeffrey, Apr 28 '20 at 13:09

score 2 · Answer 1 · answered Apr 28 '20 at 13:14

Since there is no standard ABI in C++, the bar for passing objects between libraries built separately is quite high.

The whole reason for using protobuf is to convert the objects to strings and then exchange those character arrays between the two endpoints. That way you sidestep all the issues around objects having different layouts, formats, precisions, or endianness.
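A minimal sketch of that idea, assuming the library exposes a C-style entry point that takes a raw byte buffer. The names `mylib_import_model` and `send_model` are hypothetical and not part of ONNX or protobuf; the protobuf calls are only mentioned in comments so that the snippet compiles on its own.

```cpp
#include <cstddef>
#include <string>

// Hypothetical entry point exported by the library. Only plain C types cross
// the boundary, so the caller never depends on the library's C++ object layout.
// Returns the number of bytes consumed, or -1 for a null buffer.
extern "C" long mylib_import_model(const char* data, std::size_t size) {
    if (data == nullptr) {
        return -1;
    }
    // Copy into memory owned by this side; the caller's buffer is only read.
    // With real protobuf, this is where the library would call
    // ParseFromArray(data, static_cast<int>(size)) on its own ModelProto.
    const std::string local_copy(data, size);
    return static_cast<long>(local_copy.size());
}

// Caller side: `serialized` stands in for the output of SerializeToString().
long send_model(const std::string& serialized) {
    return mylib_import_model(serialized.data(), serialized.size());
}
```

The explicit size matters because serialized protobuf data can contain zero bytes, so the buffer must not be treated as a NUL-terminated C string.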

If you absolutely want to pass pointers around, the build settings must be identical. Everything. All compiler and linker versions, settings, all #defines, optimization levels, etc... It's a path that is very tough to follow and makes for a brittle solution.

Jeffrey

Thanks, I'm aware of all the possible ABI inconsistencies. Unfortunately there's the other problem I described: when I serialize the model in the framework and parse it from an istream in my library, the memory corruption still happens and occasionally segfaults the whole process. – tomdol Apr 28 '20 at 13:33
Did you try passing a char* and making sure the other side does nothing more than read from it? Passing a stringstream comes with the same risks. – Jeffrey Apr 28 '20 at 13:56

score 1 · Answer 2 · answered Apr 28 '20 at 20:29

I think I've found a solution but I'm still not 100% certain what the root cause is.

The ONNX library lets you customize the namespace in which the generated classes reside: https://github.com/onnx/onnx/blob/master/CMakeLists.txt#L76-L78

I set it to an arbitrary value in my lib and that finally fixed the problem. I've switched back to the istringstream version and it seems to work. It has passed many CI checks so things look good so far.
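To illustrate what the namespace change does at the C++ level, here is a small sketch. It assumes the custom value is `mylib_onnx` (a hypothetical name) and uses toy structs in place of the generated classes. A plausible explanation, though only a guess, is that two copies of identically named `onnx::` symbols were being resolved to a single definition at load time, and that giving one copy its own namespace makes the symbols distinct.

```cpp
#include <typeinfo>

// ONNX sources open their namespace through the ONNX_NAMESPACE macro.
// The value below is a hypothetical stand-in for the one set via CMake.
#ifndef ONNX_NAMESPACE
#define ONNX_NAMESPACE mylib_onnx
#endif

// Toy stand-in for the framework's copy of the generated class.
namespace onnx {
struct ModelProto { int ir_version = 0; };
}

// Toy stand-in for the library's copy, compiled into the custom namespace.
namespace ONNX_NAMESPACE {
struct ModelProto { int ir_version = 0; };
}

// The two classes are unrelated types with different mangled names, so the
// linker and loader can no longer treat one copy's symbols as the other's.
bool same_type() {
    return typeid(onnx::ModelProto) == typeid(ONNX_NAMESPACE::ModelProto);
}
```

A consequence of this setup is that the library's ModelProto can no longer be handed over as a pointer to the framework's type, which fits with going back to the serialize-and-parse approach.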