4

An error related to protobuf3

I have a project that has a C++ executable core and several shared objects (.so, .dll) called plugins. When the core launches, it loads those plugins with dlopen. The core and the plugins use protobuf as their communication protocol, so each of them has to compile the generated .pb.cc and .pb.h files into its binary to have its own copy of the serializer/deserializer. libprotobuf.so is linked to both the core and the plugins. When I launch the core, it crashes with the error "file already exists in database", the same error as in #863.

I'm using protobuf-3 beta2, and Ubuntu 14.04. This error only happens on Linux. The program works fine on Windows and OS X.

I have also tried another approach: compiling all the generated protobuf files into a dynamic library (protocol.so) and linking the core and the plugins against protocol.so and libprotobuf.so. This works fine, as expected, because the bug was fixed in #1062. But when I changed protocol.so into protocol.a, it failed again. I think this is the same as compiling the generated .pb.cc files separately.

I don't want to build a protocol.so, because it makes it inconvenient to extend the communication protocol as I add more and more plugins. I think compiling the generated .pb.cc files into each plugin's binary is better (this works well on Windows and OS X).

Any suggestions to fix this error are appreciated.

piaoxu

4 Answers

6

The problem happens when you have multiple compiled copies of the same .pb.cc file sharing a single copy of libprotobuf.so. There are two ways to avoid this:

  1. The way you already found: factor out the .pb.cc files into a shared library.
  2. Link a separate copy of libprotobuf into each plugin. You'll need to use static linking for this library, i.e. use libprotobuf.a rather than libprotobuf.so. Note that with this option, it is unsafe to pass a pointer to a protobuf class between the plugins and the base application, because they are using separate copies of the protobuf library, which can lead to crashes. You will have to pass serialized messages as byte blobs instead. Luckily, that's the whole point of protobuf.
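
A minimal sketch of the "pass byte blobs, not pointers" idea from option 2 follows. It is illustrative only: `DemoMessage` is a hypothetical stand-in for a protoc-generated class (it only mimics the `SerializeToString` and `ParseFromArray` methods that generated messages provide), and `plugin_handle` and `send_to_plugin` are made-up names. In a real project the plugin entry point would be looked up with dlsym and the message type would come from your own .proto files.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a protoc-generated message class. Real generated
// classes offer SerializeToString() and ParseFromArray() with these shapes.
struct DemoMessage {
    std::string name;

    bool SerializeToString(std::string* out) const {
        *out = name;  // a real message would emit the protobuf wire format
        return true;
    }

    bool ParseFromArray(const void* data, int size) {
        name.assign(static_cast<const char*>(data),
                    static_cast<std::size_t>(size));
        return true;
    }
};

// Plugin side: the exported entry point accepts raw bytes only, so no
// protobuf object ever crosses the boundary between the two binaries.
extern "C" int plugin_handle(const char* data, std::size_t size) {
    DemoMessage request;  // parsed with the plugin's own protobuf copy
    if (!request.ParseFromArray(data, static_cast<int>(size))) {
        return -1;
    }
    return static_cast<int>(request.name.size());
}

// Core side: serialize with the core's protobuf copy, then hand over bytes.
template <typename Message>
int send_to_plugin(const Message& message,
                   int (*handler)(const char*, std::size_t)) {
    std::string bytes;
    if (!message.SerializeToString(&bytes)) {
        return -1;
    }
    return handler(bytes.data(), bytes.size());
}
```

Because only a `const char*` and a length cross the boundary, each side parses and serializes with whichever copy of the protobuf library it was linked against.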
Kenton Varda
  • Hi, thanks for your advice. I have tried option 2, but I still get the same error. Do you have any thoughts on that? – piaoxu May 07 '16 at 04:00
  • @piaoxu If you get the same error then somehow your plugins are still sharing the same copy of libprotobuf. You'll have to figure out why. Sorry, I don't have any guesses. :/ – Kenton Varda May 08 '16 at 01:46
  • 2
    @KentonVarda Not only do you have to link `libprotobuf.a` into each plugin, but also (due to the usual ELF symbol visibility rules) you have to *hide* all `libprotobuf.a` symbols in the plugin (either with `-fvisibility=hidden`, or with a linker version script). http://stackoverflow.com/a/37064043/50617 talks about this some more. – Employed Russian May 08 '16 at 20:45
  • @KentonVarda, Yes, you are right. They are still somehow sharing the same copy of libprotobuf. Thanks :) – piaoxu May 14 '16 at 15:12
  • I don't understand. If I share only a pointer and not a copy, won't all operations be called in the plugin that created it? Why is that unsafe? – Martin Kosicky Aug 08 '18 at 18:20
  • Hi @KentonVarda, I saw that you are the author of the descriptor logic in protobuf, so you would be in a better position to answer this. I wonder what happens if I disable this check in a custom-built libprotobuf.so, on the assumption that when a duplicate file is found, it is treated as the same file compiled twice and the descriptor database is not updated. Will anything break? – zqy Dec 04 '20 at 17:57
  • @zqy I authored it more than 10 years ago, so my memory may be rusty. But no, I don't think that will work. It's not just a descriptor that is placed in the global map, but also information about how to instantiate the type and how to implement reflection on the type. This information is only compatible if the compiled code is identical. It might often appear to work, but it's hard to say what subtle bugs you might run into if things are slightly off. – Kenton Varda Dec 06 '20 at 20:54
  • @KentonVarda, Thanks for the answer, especially after such a long time. – zqy Dec 08 '20 at 00:31
1

I was able to get around this problem by adding the RTLD_GLOBAL flag to the dlopen call, which makes the symbols of the loaded library available when resolving symbols in libraries loaded afterwards.
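
As a rough sketch of what that looks like, the helper below opens a plugin with RTLD_GLOBAL. The function name `open_plugin_global` and any path you pass to it are hypothetical; only `dlopen`, `dlerror`, and the flags are real. On older glibc versions you may need to link with -ldl.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical helper: load a plugin so that its symbols join the global
// scope and can satisfy references in libraries loaded later.
void* open_plugin_global(const char* path) {
    dlerror();  // clear any stale error message
    void* handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == nullptr) {
        const char* reason = dlerror();
        std::fprintf(stderr, "dlopen(%s) failed: %s\n", path,
                     reason != nullptr ? reason : "unknown error");
    }
    return handle;  // caller should dlclose() a non-null handle when done
}
```

Whether this removes the duplicate registration depends on how the core and the plugins are linked, so treat it as something to try rather than a guaranteed fix.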

1

I solved this problem by adding RTLD_DEEPBIND to dlopen.
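
A minimal sketch of such a call is shown below, assuming glibc on Linux (RTLD_DEEPBIND is a glibc extension, so the code guards it with `#ifdef`). The helper name `open_plugin_deepbind` is made up for illustration. With this flag, a plugin prefers its own definitions over same-named symbols that are already loaded globally.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical helper: load a plugin so that it binds to its own symbols
// before looking at symbols already present in the global scope.
void* open_plugin_deepbind(const char* path) {
    int flags = RTLD_NOW | RTLD_LOCAL;
#ifdef RTLD_DEEPBIND
    flags |= RTLD_DEEPBIND;  // glibc-specific; not available everywhere
#endif
    void* handle = dlopen(path, flags);
    if (handle == nullptr) {
        const char* reason = dlerror();
        std::fprintf(stderr, "dlopen(%s) failed: %s\n", path,
                     reason != nullptr ? reason : "unknown error");
    }
    return handle;  // caller should dlclose() a non-null handle when done
}
```

This is only useful if each plugin actually contains its own copy of the protobuf code (for example, a statically linked libprotobuf.a); otherwise there is nothing local for the plugin to bind to.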

spxcds
0

In my case, I was getting the "File already exists in database" error when trying to run a plugin in Gazebo with the tutorial project.

I was able to solve this by copying the .so file into the directory from which I was launching the program, instead of pointing the GAZEBO_PLUGIN_PATH variable at the build directory.

I hope a similar solution works for others who hit this error outside of Gazebo plugins. (Maybe the general solution is to copy your .so file into the local directory instead of loading it from the build directory.)