I have written a small C++ program under Linux that uses AVX instructions, but the program receives a SIGKILL signal and exits. The cause seems to be an illegal instruction or a wrong value passed to an instruction. I need to find out at what point in the program it receives SIGKILL (ideally, at which instruction). But when I run the program in GDB, it exits the moment it receives SIGKILL and cannot be debugged. I tried setting:

    (gdb) handle SIGKILL stop print nopass

but the program still receives SIGKILL and exits before I can debug it. Do you have any tips on how to deal with this?

faramir
  • 4
    You *can't* catch the `SIGKILL` signal. See e.g. [`man 7 signal`](http://man7.org/linux/man-pages/man7/signal.7.html). – Some programmer dude Aug 15 '14 at 11:47
  • 1
    As for your problem, you should first of all build with extra warnings enabled (at least use the `-Wall` flag to GCC, I also recommend `-Wextra` and `-pedantic`). This should hopefully give you hints on places where you might have [undefined behavior](http://en.wikipedia.org/wiki/Undefined_behavior). Then run in a system such as [Valgrind](http://valgrind.org/), even if it doesn't catch just your problem, it might give you more information about weird things you do with memory and pointers. – Some programmer dude Aug 15 '14 at 11:52
  • 2
    @JoachimPileborg Or use the -fsanitize options, such as 'address' and 'undefined'. – edmz Aug 15 '14 at 12:21
  • Are you sure that an illegal machine instruction is sending exactly `SIGKILL`? According to [signal(7)](http://man7.org/linux/man-pages/man7/signal.7.html) it should be `SIGILL` -without any `K` .... !! – Basile Starynkevitch Aug 15 '14 at 12:48

1 Answer

The signal(7) man page says that on an illegal machine instruction (or illegal opcode), the

   SIGILL        4       Core    Illegal Instruction

signal is sent. Notice that it is SIGILL not SIGKILL (the letter K makes a big difference).

Recall that not every processor supports AVX. (On those that don't, an AVX instruction is probably an illegal opcode.) Run cat /proc/cpuinfo (or grep avx /proc/cpuinfo) to find out whether your processor supports it. See proc(5).
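The same check can be done from inside the program. What follows is only a minimal sketch, assuming an x86 Linux kernel that lists CPU features on a line starting with "flags" in /proc/cpuinfo; the function names cpuinfo_has_flag and cpu_supports_avx are made up for illustration. It matches whole tokens, because a plain substring search for avx would also hit avx2.

```cpp
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Returns true if `flag` appears as a whole token on a "flags" line of
// /proc/cpuinfo-style text. Whole-token matching matters: a substring
// search for "avx" would also match "avx2" or "avx512f".
bool cpuinfo_has_flag(std::istream& cpuinfo, const std::string& flag) {
    std::string line;
    while (std::getline(cpuinfo, line)) {
        // Assumption: the feature line is labelled "flags" (x86 kernels).
        if (line.compare(0, 5, "flags") != 0) continue;
        const std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;
        std::istringstream tokens(line.substr(colon + 1));
        std::string token;
        while (tokens >> token) {
            if (token == flag) return true;
        }
    }
    return false;
}

// Hypothetical convenience wrapper that reads the real file.
bool cpu_supports_avx() {
    std::ifstream cpuinfo("/proc/cpuinfo");
    return cpuinfo && cpuinfo_has_flag(cpuinfo, "avx");
}
```

A program could call cpu_supports_avx() once at startup and fall back to a non-AVX code path when it returns false.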

So I guess that your program is getting the SIGILL (not SIGKILL) signal.

You can catch SIGILL (but you cannot catch SIGKILL).

What I explained in this answer (about SIGSEGV) also applies to SIGILL: with some very tricky machine- and system-specific C code, you might catch SIGILL and ensure that, upon return from the signal handler (installed with SA_SIGINFO and carefully processing the ucontext_t* third argument), execution continues past the illegal instruction.
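The basic mechanics of installing such a handler can be sketched as follows. This is only an illustration, not the tricky machine-specific part: the names on_sigill, g_last_signal and sigill_handler_runs are hypothetical, and the signal is produced artificially with raise(SIGILL), so returning from the handler is harmless. For a genuine illegal instruction, returning would re-execute the faulting instruction, so the handler would additionally have to advance the saved program counter inside the ucontext_t (or leave via siglongjmp); that step is deliberately not shown.

```cpp
#include <csignal>
#include <signal.h>

// Set by the handler; sig_atomic_t is safe to write from a signal handler.
static volatile std::sig_atomic_t g_last_signal = 0;

// Three-argument handler, selected by SA_SIGINFO. `context` points at a
// ucontext_t with the saved registers; this sketch leaves it untouched.
static void on_sigill(int signo, siginfo_t* /*info*/, void* /*context*/) {
    g_last_signal = signo;  // keep handler work async-signal-safe
}

// Installs the handler, raises SIGILL artificially, restores the previous
// disposition, and reports whether the handler ran.
bool sigill_handler_runs() {
    struct sigaction action = {};
    action.sa_sigaction = on_sigill;
    action.sa_flags = SA_SIGINFO;
    sigemptyset(&action.sa_mask);

    struct sigaction previous = {};
    if (sigaction(SIGILL, &action, &previous) != 0) return false;

    g_last_signal = 0;
    raise(SIGILL);  // synthetic signal, not a real illegal opcode

    sigaction(SIGILL, &previous, nullptr);
    return g_last_signal == SIGILL;
}
```

Note that the same approach cannot work for SIGKILL: sigaction refuses to install a handler for it.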

You could also configure gdb to handle SIGILL sensibly. Read the signals chapter of the GDB documentation; you probably want the handle SIGILL and/or p $_siginfo commands.

Basile Starynkevitch
  • 2
    Thanks for the answer, but I really got SIGKILL with K. Code can be found in: https://github.com/hagrid-the-developer/Turgnaimh/blob/master/AVX/matrix.cpp . Under Mac Os X, it runs OK, under Linux, AVX part runs OK, but float part (matrix_mul_8x8_slow) receives SIGKILL. Float part without AVX part runs OK too. It looks to me like AVX part sets some CPU flag that confuses float part. I tried to supply additional warning flags to compiler, but it wrote only: `warning: ISO C++ does not support the ‘%m’ gnu_printf format [-Wformat]`. – faramir Aug 15 '14 at 17:45