I've worked out a way to debug this issue analytically. In my case, I was cross-compiling for an older ABI, so apt-get wasn't an option and I was compiling all dependencies manually.
First let's take a look at what this issue actually is. In the Google GFlags library, flags are declared through global objects. When the global object's constructor is run, it calls into the GFlags library to register that command line flag. If the global constructor gets run multiple times (due to multiple versions of the library containing it being loaded into memory), then the GFlags register method dies with an error.
What does GLog have to do with this? Well, GLog uses GFlags, and it has globally declared flag objects. Even if GFlags is linked correctly, if the GLog library gets loaded multiple times, you get an error pointing to logging.cc in GLog.
Sounds like quite a mess, huh. Even if GLog and GFlags are linked as shared in most cases, if another library links to a static version or some other version, kaboom.
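To make the failure mode concrete, here is a toy model of it in Python. It is only a sketch: FlagRegistry.register, load_library and DuplicateFlagError are hypothetical names invented for illustration, not the GFlags API, and the real library aborts the process instead of raising an exception. The point is just that running the same set of global constructors twice, from two different libraries, trips the duplicate check.

```python
class DuplicateFlagError(Exception):
    """Stands in for the fatal error the real registry reports (toy name)."""


class FlagRegistry:
    """Toy registry: one process-wide table of flag names (hypothetical)."""

    def __init__(self):
        self._owners = {}

    def register(self, name, library):
        # A second registration of the same name is treated as fatal.
        if name in self._owners:
            raise DuplicateFlagError(
                "flag '%s' from %s was already registered by %s"
                % (name, library, self._owners[name])
            )
        self._owners[name] = library


def load_library(registry, library, flags):
    """Mimic the dynamic loader running a library's global constructors."""
    for name in flags:
        registry.register(name, library)


# The same GLog flag objects exist in two places: the shared GLog library
# and a second library that had GLog linked into it statically.
GLOG_FLAGS = ["logtostderr", "v"]

registry = FlagRegistry()
load_library(registry, "libglog.so", GLOG_FLAGS)
try:
    load_library(registry, "libcaffe2.so", GLOG_FLAGS)
except DuplicateFlagError as err:
    print(err)
```

The second load fails on the first flag it tries to register, which is why the real error points at a flag defined in logging.cc even though the actual problem is in how some other library was linked.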
Luckily, we can debug this issue using GDB and other tools, if you're willing to delve through some tricky symbol analysis.
First, you'll want to run GDB on the Python interpreter when it tries to import caffe:
gdb --args python -c 'import caffe'
Now, run the program once through so that GDB can pick up all the libraries it imports:
(gdb) r
Now, we can set a breakpoint on the line in FlagRegistry::RegisterFlag() that prints the error message, and run it again. Note that this line number is from my version of GFlags (2.2.2); you may have to look at the source code of your GFlags version to find the right line number.
(gdb) break gflags.cc:728
(gdb) r
Hopefully, GDB should then break on the first instance of the error (if not, check that gflags has been built with debugging symbols).
Look at the backtrace:
(gdb) bt
#0 google::(anonymous namespace)::FlagRegistry::RegisterFlag (this=0xa33b30, flag=0x1249d20) at dev/gflags-2.2.2/src/gflags.cc:728
#1 0x00007ffff0f3247a in _GLOBAL__sub_I_logging.cc () from prefix/lib/libcaffe2.so
#2 0x00007ffff7de76ca in call_init (l=<optimized out>, argc=argc@entry=3, argv=argv@entry=0x7fffffffdb08, env=env@entry=0x7fffffffdb28) at dl-init.c:72
#3 0x00007ffff7de77db in call_init (env=0x7fffffffdb28, argv=0x7fffffffdb08, argc=3, l=<optimized out>) at dl-init.c:30
#4 _dl_init (main_map=main_map@entry=0xd9c2a0, argc=3, argv=0x7fffffffdb08, env=0x7fffffffdb28) at dl-init.c:120
#5 0x00007ffff7dec8f2 in dl_open_worker (a=a@entry=0x7fffffffcf70) at dl-open.c:575
#6 0x00007ffff7de7574 in _dl_catch_error (objname=objname@entry=0x7fffffffcf60, errstring=errstring@entry=0x7fffffffcf68, mallocedp=mallocedp@entry=0x7fffffffcf5f,
operate=operate@entry=0x7ffff7dec4e0 <dl_open_worker>, args=args@entry=0x7fffffffcf70) at dl-error.c:187
#7 0x00007ffff7debdb9 in _dl_open (file=0x9aee70 "prefix/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state.so", mode=-2147483646,
caller_dlopen=0x51bb39 <_PyImport_GetDynLoadFunc+233>, nsid=-2, argc=<optimized out>, argv=<optimized out>, env=0x7fffffffdb28) at dl-open.c:660
#8 0x00007ffff75ecf09 in dlopen_doit (a=a@entry=0x7fffffffd1a0) at dlopen.c:66
#9 0x00007ffff7de7574 in _dl_catch_error (objname=0xabf9f0, errstring=0xabf9f8, mallocedp=0xabf9e8, operate=0x7ffff75eceb0 <dlopen_doit>, args=0x7fffffffd1a0) at dl-error.c:187
#10 0x00007ffff75ed571 in _dlerror_run (operate=operate@entry=0x7ffff75eceb0 <dlopen_doit>, args=args@entry=0x7fffffffd1a0) at dlerror.c:163
#11 0x00007ffff75ecfa1 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#12 0x000000000051bb39 in _PyImport_GetDynLoadFunc ()
<snip>
Well that's a lot to deal with, but let's focus on the line that's actually important:
#1 0x00007ffff0f3247a in _GLOBAL__sub_I_logging.cc () from prefix/lib/libcaffe2.so
This is the call to the constructor for the global variables in logging.cc (which is part of GLog). As you can see, this call is in libcaffe2.so, meaning that GLog has been statically linked into libcaffe2.so (I was using Caffe2, but the procedure should be the same for Caffe).
You can then set a breakpoint on google::(anonymous namespace)::FlagRegistry::RegisterFlag and rerun the program from the start. Look at each call to RegisterFlag(), and figure out where this particular flag was registered the first time. If the library providing the flag is a shared library, then it should only ever get registered from that .so file, and nowhere else.
To confirm the diagnosis, you can use
nm <library> | grep _GLOBAL__sub_I_logging.cc
to check for that init function in a library file. Once you've found your culprit, you'll need to rebuild it so that it doesn't link to GFlags/GLog statically.
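If you have a lot of libraries to check, the nm step can be scripted. The following is a minimal sketch, assuming Python 3.7+ and that nm is on your PATH; has_init_symbol and libraries_with_symbol are helper names I made up for this example. Like the plain nm command, it only works on libraries that have not been stripped, since this init function is a local symbol.

```python
import subprocess

# Init function for the globals in GLog's logging.cc, as seen in the backtrace.
INIT_SYMBOL = "_GLOBAL__sub_I_logging.cc"


def has_init_symbol(nm_output, symbol=INIT_SYMBOL):
    """Return True if any line of nm output names exactly `symbol`."""
    for line in nm_output.splitlines():
        fields = line.split()
        # nm lines look like "<address> <type> <name>"; the name is last.
        if fields and fields[-1] == symbol:
            return True
    return False


def libraries_with_symbol(paths, symbol=INIT_SYMBOL):
    """Run nm on each path and return the paths that contain `symbol`."""
    hits = []
    for path in paths:
        result = subprocess.run(["nm", path], capture_output=True, text=True)
        if has_init_symbol(result.stdout, symbol):
            hits.append(path)
    return hits
```

Calling libraries_with_symbol() with the list of .so files under your install prefix returns the ones that carry their own copy of the GLog globals. Only the GLog shared library itself should show up; anything else on the list is a candidate for the rebuild.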