  1. If a library is missing or corrupted in the core file, how do I find out which one it is?

  2. I have also read that the thread might have overwritten its own stack. How do I detect that?

How do I isolate these problems using the backtrace below?

/etc/gdb/gdbinit:105: Error in sourced command file:
Error while executing Python code.
Reading symbols from /opt/hsp/bin/addrman...done.

warning: Corrupted shared library list: 0x0 != 0x7c8d48ea8948c089

warning: Corrupted shared library list: 0x0 != 0x4ed700

warning: no loadable sections found in added symbol-file system-supplied DSO at
0x7ffd50ff6000
Core was generated by `addrman --notification-socket
/opt/hsp/sockets/memb_notify.socket'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000000004759e4 in ps_locktrk_info::lktrk_locker_set (this=0x348,
locker_ip=<optimized out>) at ./ps/ps_lock_track.h:292
292     ./ps/ps_lock_track.h: No such file or directory.
(gdb) bt
#0  0x00000000004759e4 in ps_locktrk_info::lktrk_locker_set (this=0x348,
locker_ip=<optimized out>) at ./ps/ps_lock_track.h:292
#1  0x0000000000000000 in ?? ()
ajax_velu

2 Answers


It looks like the stack recorded in the core file is corrupt (frame #1 has a return address of 0x0), most likely because of heap or stack corruption. Corruption is often the result of a buffer overflow or other undefined behavior.

If you are running on Linux, I would try Valgrind; it can often spot memory corruption very quickly. Windows has similar tools.

Yes, a thread in a multithreaded application can overflow its stack, because each thread is allocated only a fixed amount of stack space. This usually happens only with a very deep call chain (for example, unbounded recursion) or when large local objects are allocated on the stack.

There is some useful information here and here on setting the stack size for Linux applications.

Faced with your problem, I would:

  1. Check all the callers of the lktrk_locker_set method. If possible, investigate each one carefully for an obvious stack overflow or heap corruption.
  2. Run the program under Valgrind or a similar tool to spot the issue.
  3. Add debug logging to isolate the issue.
Matthew Fisher

warning: Corrupted shared library list: 0x0 != 0x7c8d48ea8948c089

This warning usually means that GDB is using different system libraries (or a different main binary) from the ones that were in use when the core dump was produced.

Typically you are analyzing a "production" core dump on a development machine, you have upgraded system libraries since the core dump was produced, or you have rebuilt the main binary.

See this answer for what to do if one of the above is correct.

Employed Russian