
I'm trying to debug a program on a BeagleBone Black. Outside the debugger it produces an incorrect result but no SIGILL. It also runs OK under the debugger when no breakpoint is set. However, when I set a breakpoint and step, it produces a SIGILL. Neither the program nor the library uses SIGILL-based CPU feature probes, but I don't know what GDB is doing.
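
For context, a SIGILL-free probe on Linux looks roughly like the sketch below. It is a minimal illustration of the `getauxval` approach, not the library's actual code. The names `kHwcapNeonBit`, `HwcapHasNEON` and `HasNEON` are made up for this example, and the bit position assumes the 32-bit ARM Linux hwcap layout.

```cpp
#include <sys/auxv.h>   // getauxval, AT_HWCAP (glibc 2.16 or later)

// Bit 12 of AT_HWCAP is HWCAP_NEON on 32-bit ARM Linux (asm/hwcap.h).
// The constant name is hypothetical and used only in this sketch.
static const unsigned long kHwcapNeonBit = 1UL << 12;

// Pure bit test, split out so it can be checked without ARM hardware.
inline bool HwcapHasNEON(unsigned long hwcap)
{
    return (hwcap & kHwcapNeonBit) != 0;
}

// Ask the kernel which features it advertises. No NEON instruction is
// executed as a probe, so no SIGILL handler is involved.
inline bool HasNEON()
{
#if defined(__arm__)
    return HwcapHasNEON(getauxval(AT_HWCAP));
#else
    return false;  // the hwcap bit layout differs on other architectures
#endif
}
```

Since a probe of this kind never executes an instruction that might be undefined, it should not be a source of SIGILL by itself.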

Under the debugger I am seeing:

(gdb) b main
Breakpoint 1 at 0x26f20: file test.cxx, line 22.
(gdb) r
Starting program: /home/cryptopp/test.exe

Breakpoint 1, main (argc=0x1, argv=0xbeffea54) at test.cxx:22
22          byte key[16] = {0};
(gdb) n
23          byte iv[12] = {0};
(gdb)
25          GCM<AES>::Encryption enc;
(gdb)
26          enc.SetKeyWithIV(key, 16, iv, 12);
(gdb)
28          std::string plain(0x00, 16);
(gdb)

Program received signal SIGILL, Illegal instruction.
0x00026d5c in std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)
    ()
(gdb) n
Single stepping until exit from function _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_,
which has no line number information.

Program terminated with signal SIGILL, Illegal instruction.
The program no longer exists.

And:

(gdb) shell echo _ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_ | c++filt
std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)

I tried searching for this issue, but I have not been able to locate a hit. I'm getting too much noise.

Why am I experiencing a SIGILL when GDB sets a breakpoint, and how do I work around it?


NEON is the problem I am trying to investigate. Here's the command line used for the program and library:

$ echo $CXXFLAGS
-DDEBUG -g3 -O0 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=hard
$ g++ $CXXFLAGS test.cxx ./libcryptopp.a -o test.exe

And:

$ gdb --version
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1

$ uname -a
Linux beaglebone 4.1.15-ti-rt-r40 #1 SMP PREEMPT RT Thu Jan 7 23:32:08 UTC 2016 armv7l GNU/Linux

$ cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 2 (v7l)
BogoMIPS        : 996.14
Features        : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x3
CPU part        : 0xc08
CPU revision    : 2

Hardware        : Generic AM33XX (Flattened Device Tree)
Revision        : 0000
Serial          : 0000000000000000

And:

Breakpoint 1, main (argc=0x1, argv=0xbeffea54) at test.cxx:22
22          byte key[16] = {0};
(gdb) n
23          byte iv[12] = {0};
(gdb)
25          GCM<AES>::Encryption enc;
(gdb)
26          enc.SetKeyWithIV(key, 16, iv, 12);
(gdb)
28          std::string plain(0x00, 16);
(gdb)

Program received signal SIGILL, Illegal instruction.
0x00026d5c in std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)
    ()

(gdb) up
#1  0x00026f82 in main (argc=0x1, argv=0xbeffea54) at test.cxx:28
28          std::string plain(0x00, 16);
(gdb) disass
Dump of assembler code for function main(int, char**):
   0x00026f10 <+0>:     push    {r4, r7, lr}
   0x00026f12 <+2>:     sub.w   sp, sp, #916    ; 0x394
   0x00026f16 <+6>:     add     r7, sp, #16
   0x00026f18 <+8>:     adds    r3, r7, #4
   0x00026f1a <+10>:    str     r0, [r3, #0]
   0x00026f1c <+12>:    mov     r3, r7
   0x00026f1e <+14>:    str     r1, [r3, #0]
   0x00026f20 <+16>:    add.w   r3, r7, #692    ; 0x2b4
   0x00026f24 <+20>:    movs    r2, #0
   0x00026f26 <+22>:    str     r2, [r3, #0]
   0x00026f28 <+24>:    adds    r3, #4
   0x00026f2a <+26>:    movs    r2, #0
   0x00026f2c <+28>:    str     r2, [r3, #0]
   0x00026f2e <+30>:    adds    r3, #4
   0x00026f30 <+32>:    movs    r2, #0
   0x00026f32 <+34>:    str     r2, [r3, #0]
   0x00026f34 <+36>:    adds    r3, #4
   0x00026f36 <+38>:    movs    r2, #0
   0x00026f38 <+40>:    str     r2, [r3, #0]
   0x00026f3a <+42>:    adds    r3, #4
   0x00026f3c <+44>:    add.w   r3, r7, #680    ; 0x2a8
   0x00026f40 <+48>:    movs    r2, #0
   0x00026f42 <+50>:    str     r2, [r3, #0]
   0x00026f44 <+52>:    adds    r3, #4
   0x00026f46 <+54>:    movs    r2, #0
   0x00026f48 <+56>:    str     r2, [r3, #0]
   0x00026f4a <+58>:    adds    r3, #4
   0x00026f4c <+60>:    movs    r2, #0
   0x00026f4e <+62>:    str     r2, [r3, #0]
   0x00026f50 <+64>:    adds    r3, #4
   0x00026f52 <+66>:    add.w   r3, r7, #240    ; 0xf0
   0x00026f56 <+70>:    mov     r0, r3
   0x00026f58 <+72>:    bl      0x2a804 <CryptoPP::GCM_Final<CryptoPP::Rijndael, (CryptoPP::GCM_TablesOption)0, true>::GCM_Final()>
   0x00026f5c <+76>:    add.w   r1, r7, #240    ; 0xf0
   0x00026f60 <+80>:    add.w   r2, r7, #692    ; 0x2b4
   0x00026f64 <+84>:    add.w   r4, r7, #680    ; 0x2a8
   0x00026f68 <+88>:    movs    r3, #12
   0x00026f6a <+90>:    str     r3, [sp, #0]
   0x00026f6c <+92>:    mov     r0, r1
   0x00026f6e <+94>:    mov     r1, r2
   0x00026f70 <+96>:    movs    r2, #16
   0x00026f72 <+98>:    mov     r3, r4
   0x00026f74 <+100>:   bl      0x2da0c <CryptoPP::SimpleKeyingInterface::SetKeyWithIV(unsigned char const*, unsigned int, unsigned char const*, unsigned int)>
   0x00026f78 <+104>:   add.w   r3, r7, #708    ; 0x2c4
   0x00026f7c <+108>:   mov     r0, r3
   0x00026f7e <+110>:   blx     0x26d58 <_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_+852>
=> 0x00026f82 <+114>:   add.w   r2, r7, #676    ; 0x2a4
   0x00026f86 <+118>:   add.w   r3, r7, #708    ; 0x2c4
   0x00026f8a <+122>:   mov     r0, r2
   0x00026f8c <+124>:   movs    r1, #0
   0x00026f8e <+126>:   movs    r2, #16
   ...
jww
  • "Why am I experiencing a SIGILL" -- because you have a bug somewhere in your code. "how do I work around it" -- find the bug, and fix it. Just because your program crashes at a particular point doesn't always mean that's where the bug is, so showing detailed debugger logs, at that point, won't be very useful. The bug can be anywhere in your code, which ends up corrupting memory, but execution continues until it blows up here as a result of the earlier memory corruption, wasting your time debugging perfectly working code. Your bug can be anywhere. Welcome to C++. – Sam Varshavchik Aug 12 '17 at 11:44
  • Thanks @Sam. The text section is read-only according to `objdump`. Are you claiming a wild write is occurring into the text section without a signal? Also, GDB leaves a lot to be desired on ARM. I would be especially interested in GDB bugs since a breakpoint moves the problem from a simple incorrect result to a crash. – jww Aug 12 '17 at 12:00
  • A wild write is happening somewhere else, that comes to light here. And as far as GDB goes, it is what it is. It's not perfect. – Sam Varshavchik Aug 12 '17 at 12:29
  • I think it's worth running it under Valgrind. – ks1322 Aug 12 '17 at 12:57
  • Thanks @ks1322 - We are Valgrind clean. With `-g3 -O0` there are no findings, no uninitialized reads or writes, and no leaks. Are you aware of other debuggers available for a BeagleBone Black? I've been looking for a replacement debugger on ARM for about a year and a half. I really don't trust GDB under ARM anymore. I've had so many problems in the past that I cringe when I have an ARM-specific or NEON-specific problem. – jww Aug 12 '17 at 13:06
  • The only alternative to gdb I know is lldb, but as far as I know it is untested for ARM on Linux, so it is probably even worse than gdb. – ks1322 Aug 12 '17 at 13:22
  • Possible duplicate of [gdb/ddd Program received signal SIGILL](https://stackoverflow.com/questions/15071625/gdb-ddd-program-received-signal-sigill) – Murphy Aug 12 '17 at 13:45
  • Thanks @Murphy. I don't believe it's the same problem. OpenSSL is not being used in this question. The code in this question uses Crypto++. Crypto++ recently moved to `SIGILL`-free feature detection on Linux via `getauxval`. – jww Aug 12 '17 at 17:48
  • This could be a Linux kernel bug affecting gdb behavior. There were some in the past: https://sourceware.org/ml/gdb/2012-01/msg00062.html, https://sourceware.org/bugzilla/show_bug.cgi?id=10833. – ks1322 Aug 12 '17 at 20:54
  • Thanks @ks1322. Yes, that looks like the same problem (from Comment 8 onwards). Thank you very much. You may as well answer since there's not much I can do about it. I'll file a Debian BTS issue. – jww Aug 12 '17 at 21:29
  • Thanks again @ks1322. It looks like it was the bug discussed in the report you referenced. I just updated to GDB 8.0 and the issue is no longer present. I'm going to toss you a [100 point bonus on another question](https://stackoverflow.com/a/45332984/608639) you answered since you seem too modest to answer here. You will be the only person I know to receive two bonuses for one question. If the badge exists, it's probably rare. Thanks again. – jww Aug 13 '17 at 00:38
  • Thanks @jww. It was only a guess about a Linux kernel bug, that's why I didn't answer. It is still unclear whether it is the kernel bug I referenced, a gdb bug, or some other kernel bug. Anyway, good to know that it helped. – ks1322 Aug 13 '17 at 09:03

1 Answer


Thanks to @ks1322, this turned out to be a known GDB/kernel bug. See "GDB crashes on debugging multithreaded program on ARM SMP dual core system" in the GDB issue tracker. Updating to GDB 8.0 made the SIGILL go away for me.

According to the Debian BTS, this is also a known issue. See "SIGILL when stepping through application on armhf" in the Debian BTS.

The bug was refiled in hopes that it might actually be fixed sometime in the next year or two. See "GDB Crash due to GDB/Kernel generated SIGILL".

This is why I despise Debian's bug reporting systems. Stuff gets reported and then it just rots. Nothing gets fixed.

jww