My program intermittently hits some code path that causes a core dump. I can't run gdb with breakpoints because of the nature of the application (it starts clients on LSF, and the actual work is done there). The core dump is probably happening in the client job.

I have a signal handler that catches the segmentation fault signal, but what now? How do I find out where and what is causing the core dump? Also, this may be a silly question: has the core dump already happened by the time my signal handler catches the signal?

More info: I was able to analyze the core dump in gdb, and it gave me a general idea of which function is causing it. I thought it was the exit callback function registered with atexit(). However:
=> I get the core dump even when I comment out the entire contents of that function.
=> If I comment out the exit() call in my signal handler, there is no core dump.
I'm not sure what to make of it.

Below is the stack trace from the core dump:

#0  0x00007f70542cc9cd in __run_exit_handlers () from /lib64/xx.so.6
#1  0x00007f70542ccab5 in exit () from /lib64/xx.so.6
#2  0x00000000045df1fc in myExit (exit_code=-1, exit_type=_exitSignal) at abc.cxx:1506
#3  0x0000000004681c2a in my_bt_sighandler (sig=11, info=0x7f7053b55030, secret=0x7f7053b54f00)
    at def.cxx:1185
#4  <signal handler called>
#5  0x00007f70542cc9cd in __run_exit_handlers () from /lib64/libc.so.6
#6  0x00007f70542ccab5 in exit () from /lib64/libc.so.6
#7  0x00000000045df1fc in myExit (exit_code=-1, exit_type=_exitPreemption) at abc:1506
#8  0x0000000004681ec4 in my_bt_sighandler2 (sig=2, info=0x7f7053b55df0, secret=0x7f7053b55cc0)
    at def.cxx:1232
#9  <signal handler called>
#10 0x00007f70589d4943 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/yy.so.0
#11 0x0000000004cbc231 in DysEvx9NBX8GeHdP ()
#12 0x0000000004cbc3cc in fq08qJvrK1UBbNla ()
#13 0x00007f70589d0e25 in start_thread () from /lib64/yy.so.0
#14 0x00007f705438c34d in clone () from /lib64/libc.so.6

I have two functions registered with atexit(), but I have made both of them dummies by commenting out their contents. I still get the core dump.

If I comment out the exit() call in myExit, there is no core dump.

  void myExit(int exit_code, ExitType exit_type) {
    cout << "i am in myExit .. exit_code=" << exit_code << " ,  exit_type= " << exit_type << endl; 
    exit(exit_code);   //if i comment this out, there is no core dump
  }

What do you make of it? Since exit_code is -1, the call becomes exit(-1); is that the problem? Why is __run_exit_handlers causing the core dump?
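One possible reading of the trace, offered as a sketch and not a confirmed diagnosis: frames #5 to #8 show the sig=2 handler calling myExit and exit(), and frames #0 to #3 show the sig=11 handler calling exit() a second time while the first call is still inside __run_exit_handlers. exit() is not async-signal-safe, and calling it more than once is undefined behavior, so the handlers themselves may be part of the problem. Below is a minimal handler that restricts itself to async-signal-safe calls (write and _exit). The names format_signal_message, fatal_signal_handler and install_fatal_handlers are hypothetical and not from the code above; POSIX is assumed.

```cpp
#include <signal.h>   // sigaction, sigemptyset, sigaddset (POSIX)
#include <unistd.h>   // write, _exit (POSIX)
#include <cstddef>
#include <cstring>

// Hypothetical helper: builds "caught signal N\n" in buf without stdio,
// iostreams, or heap allocation, so it can be used inside a signal handler.
// Returns the number of characters written, excluding the terminating '\0'.
std::size_t format_signal_message(int sig, char* buf, std::size_t cap) {
    if (cap == 0) return 0;
    const char prefix[] = "caught signal ";
    std::size_t n = 0;
    for (std::size_t i = 0; prefix[i] != '\0' && n + 1 < cap; ++i) {
        buf[n++] = prefix[i];
    }
    char digits[12];
    std::size_t d = 0;
    unsigned v = sig < 0 ? 0u : static_cast<unsigned>(sig);
    do {  // collect decimal digits in reverse order
        digits[d++] = static_cast<char>('0' + v % 10);
        v /= 10;
    } while (v != 0 && d < sizeof digits);
    while (d > 0 && n + 1 < cap) buf[n++] = digits[--d];
    if (n + 1 < cap) buf[n++] = '\n';
    buf[n] = '\0';
    return n;
}

// Hypothetical handler: reports the signal and terminates immediately.
// _exit() does not run atexit callbacks or static destructors, so it never
// enters __run_exit_handlers.
extern "C" void fatal_signal_handler(int sig) {
    char msg[64];
    std::size_t len = format_signal_message(sig, msg, sizeof msg);
    ssize_t ignored = write(STDERR_FILENO, msg, len);
    (void)ignored;
    _exit(128 + sig);
}

// Installs the handler for SIGSEGV and SIGINT. Each signal is blocked while
// the other is being handled, so the two handlers cannot nest the way
// frames #3 and #8 do in the trace.
bool install_fatal_handlers() {
    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sa.sa_handler = fatal_signal_handler;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGINT);
    sigaddset(&sa.sa_mask, SIGSEGV);
    return sigaction(SIGSEGV, &sa, nullptr) == 0 &&
           sigaction(SIGINT, &sa, nullptr) == 0;
}
```

Calling install_fatal_handlers() once at startup would replace the existing handlers. The trade-off is that any cleanup done in the atexit() callbacks is skipped, and exiting from the SIGSEGV handler this way does not leave a core file, so this only fits if that is acceptable for the client job.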

dgarg
  • One idea could be record/replay debugging: https://rr-project.org/. Record the execution of the application until the core dump occurs and then replay the same scenario locally anytime you want – RoQuOTriX May 26 '21 at 08:47
  • If you have a core dump, you could try to analyze it with gdb, it should contain a stack trace at least. – Yksisarvinen May 26 '21 at 08:52
  • Yes, the cause of the core dump occurs before the SIGSEGV signal, and that happens before your signal handler catches it. A common trap that people often fall into with debugging is to commence stepping after the signal handler is called - which means you're stepping through code that is executed as a consequence of the fault, not that causes the fault. In complicated systems (and often even in simple ones) it is necessary to check relevant parameters (e.g. range checking) BEFORE doing operations that may fail - stepping through after the fault is caught is too late or too much after the cause – Peter May 26 '21 at 08:53
  • See https://stackoverflow.com/questions/15126925/debugging-child-process-after-fork-follow-fork-mode-child-configured and https://sourceware.org/gdb/onlinedocs/gdb/Forks.html for how to use GDB with your application. – Sneftel May 26 '21 at 09:11
  • I was able to analyze the core dump in gdb. It has given me a general idea of the function that is causing it: free() is erroring out. Could it be caused by calling .clear() on an undefined map? – dgarg May 26 '21 at 18:40
