Determine cause of segfault when using -O3?

Question

I'm having trouble determining the cause of a segfault when a program is compiled with -O3 with GCC 4.8/4.9/5.1. For GCC 4.9.x, I've seen it on Cygwin, Debian 8 (x64) and Fedora 21 (x64). Others have experienced it on GCC 4.8 and 5.1.

The program is fine under -O2, fine with other versions of GCC, and fine under other compilers (like MSVC, ICC and Clang).

Below is the crash under GDB, but nothing is jumping out at me. The source code from misc.cpp:26 is below, but it's a simple XOR:

((word64*)buf)[i] ^= ((word64*)mask)[i];

The code in question checks for 64-bit word alignment prior to the cast. From the disassembly under -O3, I know it has something to do with the vmovdqa instruction:

(gdb) disass 0x0000000000539fc3
...

   0x0000000000539fbc <+220>:   vxorps 0x0(%r13,%r10,1),%ymm0,%ymm0
=> 0x0000000000539fc3 <+227>:   vmovdqa %ymm0,0x0(%r13,%r10,1)
   0x0000000000539fca <+234>:   add    $0x20,%r10

It appears GCC is using vector instructions at -O3 (AVX, judging by the ymm registers), and not using them at -O2. (Thanks to Alejandro for the suggestion.)

I'm going to naively ask: does vmovdqa have alignment requirements greater than a 64-bit word? If so, why is GCC selecting it when the words are not 128-bit aligned?

What is causing the segfault here? How do I troubleshoot it further?

Also see Bug 66852 - vmovdqa instructions issued on 64-bit aligned array, causes segfault. It was filed in response to this issue, so it's unconfirmed at the moment.

$ gdb ./cryptest.exe 
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
...
(gdb) r v
...
Testing MessageDigest algorithm SHA-3-224.
.....
Program received signal SIGSEGV, Segmentation fault.
0x0000000000539fc3 in CryptoPP::xorbuf (buf=0x98549a "efghijde", 
    mask=mask@entry=0x7fffffffbfeb "efghijdefghijkefghijklfghijklmghijklmnhijklmnoijklmnopjklmnopqklmnopqrlmnopqrsmnopqrstnopqrstu", 'a' <repeats 106 times>..., count=count@entry=0x5e) at misc.cpp:26
26                  ((word64*)buf)[i] ^= ((word64*)mask)[i];

(gdb) where
#0  0x0000000000539fc3 in CryptoPP::xorbuf (buf=0x98549a "efghijde", 
    mask=mask@entry=0x7fffffffbfeb "efghijdefghijkefghijklfghijklmghijklmnhijklmnoijklmnopjklmnopqklmnopqrlmnopqrsmnopqrstnopqrstu", 'a' <repeats 106 times>..., count=count@entry=0x5e) at misc.cpp:26
#1  0x0000000000561eb0 in CryptoPP::SHA3::Update (this=0x985480, 
    input=0x7fffffffbfeb "efghijdefghijkefghijklfghijklmghijklmnhijklmnoijklmnopjklmnopqklmnopqrlmnopqrsmnopqrstnopqrstu", 'a' <repeats 106 times>..., 
    length=0x5e) at sha3.cpp:264
#2  0x00000000005bac1a in CryptoPP::HashVerificationFilter::NextPutMultiple (
    this=0x7fffffffd390, 
    inString=0x7fffffffbfeb "efghijdefghijkefghijklfghijklmghijklmnhijklmnoijklmnopjklmnopqklmnopqrlmnopqrsmnopqrstnopqrstu", 'a' <repeats 106 times>..., 
    length=0x5e) at filters.cpp:786
#3  0x00000000005bd8a2 in NextPutMaybeModifiable (modifiable=<optimized out>, 
    length=0x5e, 
    inString=0x7fffffffbfeb "efghijdefghijkefghijklfghijklmghijklmnhijklmnoijklmnopjklmnopqklmnopqrlmnopqrsmnopqrstnopqrstu", 'a' <repeats 106 times>..., 
    this=0x7fffffffd390) at filters.h:200
#4  CryptoPP::FilterWithBufferedInput::PutMaybeModifiable (
    this=0x7fffffffd390, 
    inString=0x7fffffffbfeb "efghijdefghijkefghijklfghijklmghijklmnhijklmnoijklmnopjklmnopqklmnopqrlmnopqrsmnopqrstnopqrstu", 'a' <repeats 106 times>..., 
    length=<optimized out>, messageEnd=0x0, blocking=<optimized out>, 
...

-O3 disassembly and register values.

(gdb) disass 0x0000000000539fc3
Dump of assembler code for function CryptoPP::xorbuf(unsigned char*, unsigned char const*, unsigned long):
   0x0000000000539ee0 <+0>: lea    0x8(%rsp),%r10
   0x0000000000539ee5 <+5>: and    $0xffffffffffffffe0,%rsp
   0x0000000000539ee9 <+9>: mov    %rdx,%rax
   0x0000000000539eec <+12>:    pushq  -0x8(%r10)
   0x0000000000539ef0 <+16>:    push   %rbp
   0x0000000000539ef1 <+17>:    shr    $0x3,%rax
   0x0000000000539ef5 <+21>:    mov    %rsp,%rbp
   0x0000000000539ef8 <+24>:    push   %r15
   0x0000000000539efa <+26>:    push   %r14
   0x0000000000539efc <+28>:    push   %r13
   0x0000000000539efe <+30>:    push   %r12
   0x0000000000539f00 <+32>:    push   %r10
   0x0000000000539f02 <+34>:    push   %rbx
   0x0000000000539f03 <+35>:    je     0x53a00a <CryptoPP::xorbuf(unsigned char*, unsigned char const*, unsigned long)+298>
   0x0000000000539f09 <+41>:    lea    0x20(%rdi),%rcx
   0x0000000000539f0d <+45>:    cmp    %rcx,%rsi
   0x0000000000539f10 <+48>:    lea    0x20(%rsi),%rcx
   0x0000000000539f14 <+52>:    setae  %r8b
   0x0000000000539f18 <+56>:    cmp    %rcx,%rdi
   0x0000000000539f1b <+59>:    setae  %cl
   0x0000000000539f1e <+62>:    or     %cl,%r8b
   0x0000000000539f21 <+65>:    je     0x53a300 <CryptoPP::xorbuf(unsigned char*, unsigned char const*, unsigned long)+1056>
   0x0000000000539f27 <+71>:    cmp    $0x8,%rax
   0x0000000000539f2b <+75>:    jbe    0x53a300 <CryptoPP::xorbuf(unsigned char*, unsigned char const*, unsigned long)+1056>
   0x0000000000539f31 <+81>:    mov    %rdi,%rcx
   0x0000000000539f34 <+84>:    and    $0x1f,%ecx
   0x0000000000539f37 <+87>:    shr    $0x3,%rcx
   0x0000000000539f3b <+91>:    neg    %rcx
   0x0000000000539f3e <+94>:    and    $0x3,%ecx
   0x0000000000539f41 <+97>:    cmp    %rax,%rcx
   0x0000000000539f44 <+100>:   cmova  %rax,%rcx
   0x0000000000539f48 <+104>:   xor    %r8d,%r8d
   0x0000000000539f4b <+107>:   test   %rcx,%rcx
   0x0000000000539f4e <+110>:   je     0x539f80 <CryptoPP::xorbuf(unsigned char*, unsigned char const*, unsigned long)+160>
   0x0000000000539f50 <+112>:   mov    (%rsi),%r8
   0x0000000000539f53 <+115>:   xor    %r8,(%rdi)
   0x0000000000539f56 <+118>:   cmp    $0x1,%rcx
   0x0000000000539f5a <+122>:   je     0x53a371 <CryptoPP::xorbuf(unsigned char*, unsigned char const*, unsigned long)+1169>
   0x0000000000539f60 <+128>:   mov    0x8(%rsi),%r8
   0x0000000000539f64 <+132>:   xor    %r8,0x8(%rdi)
   0x0000000000539f68 <+136>:   cmp    $0x3,%rcx
   0x0000000000539f6c <+140>:   jne    0x53a366 <CryptoPP::xorbuf(unsigned char*, unsigned char const*, unsigned long)+1158>
   0x0000000000539f72 <+146>:   mov    0x10(%rsi),%r8
   0x0000000000539f76 <+150>:   xor    %r8,0x10(%rdi)
   0x0000000000539f7a <+154>:   mov    $0x3,%r8d
   0x0000000000539f80 <+160>:   mov    %rax,%r11
   0x0000000000539f83 <+163>:   xor    %r10d,%r10d
   0x0000000000539f86 <+166>:   sub    %rcx,%r11
   0x0000000000539f89 <+169>:   shl    $0x3,%rcx
   0x0000000000539f8d <+173>:   xor    %ebx,%ebx
   0x0000000000539f8f <+175>:   lea    -0x4(%r11),%r9
   0x0000000000539f93 <+179>:   lea    (%rdi,%rcx,1),%r13
   0x0000000000539f97 <+183>:   shr    $0x2,%r9
   0x0000000000539f9b <+187>:   add    %rsi,%rcx
   0x0000000000539f9e <+190>:   add    $0x1,%r9
   0x0000000000539fa2 <+194>:   lea    0x0(,%r9,4),%r12
   0x0000000000539faa <+202>:   add    $0x1,%rbx
   0x0000000000539fae <+206>:   vmovdqu (%rcx,%r10,1),%xmm0
   0x0000000000539fb4 <+212>:   vinsertf128 $0x1,0x10(%rcx,%r10,1),%ymm0,%ymm0
   0x0000000000539fbc <+220>:   vxorps 0x0(%r13,%r10,1),%ymm0,%ymm0
=> 0x0000000000539fc3 <+227>:   vmovdqa %ymm0,0x0(%r13,%r10,1)
   0x0000000000539fca <+234>:   add    $0x20,%r10
   0x0000000000539fce <+238>:   cmp    %r9,%rbx
   0x0000000000539fd1 <+241>:   jb     0x539faa <CryptoPP::xorbuf(unsigned char*, unsigned char const*, unsigned long)+202>
   0x0000000000539fd3 <+243>:   lea    (%r8,%r12,1),%rcx
   0x0000000000539fd7 <+247>:   cmp    %r12,%r11
   0x0000000000539fda <+250>:   je     0x53a006 <CryptoPP::xorbuf(unsigned char*, unsigned char const*, unsigned long)+294>
   0x0000000000539fdc <+252>:   mov    (%rsi,%rcx,8),%r8
   0x0000000000539fe0 <+256>:   xor    %r8,(%rdi,%rcx,8)
   0x0000000000539fe4 <+260>:   lea    0x1(%rcx),%r8
   0x0000000000539fe8 <+264>:   cmp    %r8,%rax
   0x0000000000539feb <+267>:   jbe    0x53a006 <CryptoPP::xorbuf(unsigned char*, unsigned char const*, unsigned long)+294>
   0x0000000000539fed <+269>:   add    $0x2,%rcx
   0x0000000000539ff1 <+273>:   mov    (%rsi,%r8,8),%r9
   0x0000000000539ff5 <+277>:   xor    %r9,(%rdi,%r8,8)
   0x0000000000539ff9 <+281>:   cmp    %rcx,%rax
   0x0000000000539ffc <+284>:   jbe    0x53a006 <CryptoPP::xorbuf(unsigned char*, unsigned char const*, unsigned long)+294>
   0x0000000000539ffe <+286>:   mov    (%rsi,%rcx,8),%r8
   0x000000000053a002 <+290>:   xor    %r8,(%rdi,%rcx,8)
   0x000000000053a006 <+294>:   shl    $0x3,%rax

And:

(gdb) info r ymm0 r13 r10
ymm0           {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v4_double = {0x8000000000000000, 0x8000000000000000, 0x8000000000000000, 
    0x8000000000000000}, v32_int8 = {0x66, 0x67, 0x68, 0x69, 0x6a, 0x6b, 0x65, 
    0x66, 0x67, 0x68, 0x69, 0x6a, 0x6b, 0x6c, 0x66, 0x67, 0x68, 0x69, 0x6a, 
    0x6b, 0x6c, 0x6d, 0x67, 0x68, 0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x68, 
    0x69}, v16_int16 = {0x6766, 0x6968, 0x6b6a, 0x6665, 0x6867, 0x6a69, 
    0x6c6b, 0x6766, 0x6968, 0x6b6a, 0x6d6c, 0x6867, 0x6a69, 0x6c6b, 0x6e6d, 
    0x6968}, v8_int32 = {0x69686766, 0x66656b6a, 0x6a696867, 0x67666c6b, 
    0x6b6a6968, 0x68676d6c, 0x6c6b6a69, 0x69686e6d}, v4_int64 = {
    0x66656b6a69686766, 0x67666c6b6a696867, 0x68676d6c6b6a6968, 
    0x69686e6d6c6b6a69}, v2_int128 = {0x67666c6b6a69686766656b6a69686766, 
    0x69686e6d6c6b6a6968676d6c6b6a6968}}
r13            0x9854a2 0x9854a2
r10            0x0  0x0

When compiled with -O2 and a breakpoint on the line in question, here's the disassembly. ((word64*)buf)[i] ^= ((word64*)mask)[i]; moved to line 31:

Breakpoint 1, CryptoPP::xorbuf (buf=0x985488 "", 
    mask=mask@entry=0x7fffffffc01d "The quick brown fox", 'a' <repeats 181 times>..., count=count@entry=0x13) at misc.cpp:31
31                  ((word64*)buf)[i] ^= ((word64*)mask)[i];
(gdb) disass
Dump of assembler code for function CryptoPP::xorbuf(unsigned char*, unsigned char const*, unsigned long):
   0x0000000000532150 <+0>: mov    %rdx,%rcx
   0x0000000000532153 <+3>: shr    $0x3,%rcx
   0x0000000000532157 <+7>: je     0x532170 <CryptoPP::xorbuf(unsigned char*, unsigned char const*, unsigned long)+32>
   0x0000000000532159 <+9>: xor    %eax,%eax
=> 0x000000000053215b <+11>:    mov    (%rsi,%rax,8),%r8
   0x000000000053215f <+15>:    xor    %r8,(%rdi,%rax,8)
   0x0000000000532163 <+19>:    add    $0x1,%rax
   0x0000000000532167 <+23>:    cmp    %rcx,%rax
   0x000000000053216a <+26>:    jne    0x53215b <CryptoPP::xorbuf(unsigned char*, unsigned char const*, unsigned long)+11>
   0x000000000053216c <+28>:    shl    $0x3,%rcx
   0x0000000000532170 <+32>:    sub    %rcx,%rdx
   0x0000000000532173 <+35>:    je     0x5321d0 <CryptoPP::xorbuf(unsigned char*, unsigned char const*, unsigned long)+128>
   0x0000000000532175 <+37>:    mov    %rdx,%r8
   0x0000000000532178 <+40>:    add    %rcx,%rdi
   0x000000000053217b <+43>:    add    %rcx,%rsi
   0x000000000053217e <+46>:    shr    $0x2,%r8
   0x0000000000532182 <+50>:    je     0x5321a8 <CryptoPP::xorbuf(unsigned char*, unsigned char const*, unsigned long)+88>
   0x0000000000532184 <+52>:    xor    %eax,%eax
   0x0000000000532186 <+54>:    nopw   %cs:0x0(%rax,%rax,1)
   0x0000000000532190 <+64>:    mov    (%rsi,%rax,4),%ecx
   0x0000000000532193 <+67>:    xor    %ecx,(%rdi,%rax,4)
   0x0000000000532196 <+70>:    add    $0x1,%rax
   0x000000000053219a <+74>:    cmp    %r8,%rax
   0x000000000053219d <+77>:    jne    0x532190 <CryptoPP::xorbuf(unsigned char*, unsigned char const*, unsigned long)+64>
   0x000000000053219f <+79>:    shl    $0x2,%r8
   0x00000000005321a3 <+83>:    sub    %r8,%rdx
   0x00000000005321a6 <+86>:    je     0x5321d8 <CryptoPP::xorbuf(unsigned char*, unsigned char const*, unsigned long)+136>
   0x00000000005321a8 <+88>:    lea    (%rdi,%r8,1),%rcx
   0x00000000005321ac <+92>:    xor    %eax,%eax
   0x00000000005321ae <+94>:    lea    (%rsi,%r8,1),%rdi
   0x00000000005321b2 <+98>:    nopw   0x0(%rax,%rax,1)
   0x00000000005321b8 <+104>:   movzbl (%rdi,%rax,1),%esi
   0x00000000005321bc <+108>:   xor    %sil,(%rcx,%rax,1)
   0x00000000005321c0 <+112>:   add    $0x1,%rax
   0x00000000005321c4 <+116>:   cmp    %rdx,%rax
   0x00000000005321c7 <+119>:   jb     0x5321b8 <CryptoPP::xorbuf(unsigned char*, unsigned char const*, unsigned long)+104>
   0x00000000005321c9 <+121>:   retq   
   0x00000000005321ca <+122>:   nopw   0x0(%rax,%rax,1)
   0x00000000005321d0 <+128>:   retq   
   0x00000000005321d1 <+129>:   nopl   0x0(%rax)
   0x00000000005321d8 <+136>:   retq   
End of assembler dump.

From misc.cpp, line 26 is ((word64*)buf)[i] ^= ((word64*)mask)[i];.

void xorbuf(byte *buf, const byte *mask, size_t count)
{
    size_t i;

    if (IsAligned<word32>(buf) && IsAligned<word32>(mask))
    {
        if (!CRYPTOPP_BOOL_SLOW_WORD64 && IsAligned<word64>(buf) && IsAligned<word64>(mask))
        {
            for (i=0; i<count/8; i++)
                ((word64*)buf)[i] ^= ((word64*)mask)[i];
            count -= 8*i;
            if (!count)
                return;
            buf += 8*i;
            mask += 8*i;
        }

        for (i=0; i<count/4; i++)
            ((word32*)buf)[i] ^= ((word32*)mask)[i];
        count -= 4*i;
        if (!count)
            return;
        buf += 4*i;
        mask += 4*i;
    }

    for (i=0; i<count; i++)
        buf[i] ^= mask[i];
}

How much space does `buf` point to? Have you made sure you're not simply writing off the end? It seems suspiciously like it's an 8-char long cstring...strange that the error seems to occur just as it's about to write 8 bytes starting at what could be the null terminator... — user657267, Jul 13 '15 at 00:38
@user657267 - Yes, the code is respecting buffer sizes. I also have processes in place to validate it during testing. Namely, Clang sanitizers and Valgrind. I'm getting ready to add Coverity since this is an Open Source project. (I adore static and dynamic analysis). — jww, Jul 13 '15 at 01:05
@jww Can you disable the early blocks completely and leave only the last "byte-at-a-time" loop at the end? Then compile with `-O3` again? Maybe it's not the function itself doing something strange - maybe the way the arguments got to it is not right. — viraptor, Jul 13 '15 at 01:15
@jww, Could you post the assembly output for `-O2`? I think it'd be easier to see what could be going on between the two optimization levels. — Alejandro, Jul 13 '15 at 01:38
Compile with all the warnings turned on and warnings treated as errors. Once you have fixed all of these you will have a better idea of where your error is: `-Werror -Wall -Wextra -pedantic` — Martin York, Jul 13 '15 at 02:04
@Alejandro - you were correct. At `-O2`, the SSE instructions were not used. The disassembly was added to the question. — jww, Jul 13 '15 at 02:57
`-O3` enables auto-vectorization and `-O2` does not. I wonder if this has to do with the strict aliasing rule. `buf` and `mask` are byte pointers but then you tell GCC that `buf` and `mask` are `word64` or `word32` pointers which would violate the strict aliasing rule and GCC can then assume they don't reference the same memory. You could try a union instead. — Z boson, Jul 13 '15 at 07:58
In case you haven't already found this out: yes, 256b `vmovdqa` (with a `ymm` arg) to/from mem faults when the memory address isn't aligned to 32B. `vmovdqu` has no alignment requirements, and runs as fast when addresses are actually aligned (and slower if the data spans a cache-line boundary.) `vmovdqa` is useful when you'd rather fault than have a usually-minor performance problem go unnoticed. — Peter Cordes, Jul 15 '15 at 02:49

Basile Starynkevitch · Accepted Answer · 2015-07-13T10:40:09.067

You could compile with g++ -Wall -Wextra -O3 -g. You want to enable warnings, because some of them are possibly generated only in GCC passes enabled with -O3. You want to enable debugging info (-g) to use gdb, but be aware that debugging info is not always reliable with strong optimizations.

You might have some pointer aliasing issues. Perhaps use (or remove) the restrict keyword (spelled __restrict__ in g++, since standard C++ has no restrict).

Be sure to avoid undefined behavior. You might pass -fsanitize= options (notably -fsanitize=address and -fsanitize=undefined) to the g++ compiler (version 5 preferably). Also use valgrind.

BTW, you could use dump options like -fdump-tree-all (warning, they produce hundreds of files!) to better understand the internal behavior of g++; and you might even customize your GCC compiler with MELT.

Also, if you are looking at the generated assembly, compile with g++ -Wall -S -O3 -fverbose-asm, since -fverbose-asm asks GCC to emit some assembler comments "explaining" (not much, but a tiny bit) the compiled code.

edited Jul 13 '15 at 10:40

answered Jul 13 '15 at 08:26

Basile Starynkevitch


Yeah, I really like Clang and its sanitizers. I know there's undefined behavior due to a define called *`ALLOW_UNALIGNED_DATA_ACCESS`* in effect on i386 and x86_64. But it does not explain the use of `vmovdqa` for 64-bit words. – jww Jul 13 '15 at 08:32
Yes, by definition undefined behavior explains bizarre compilation output. – Basile Starynkevitch Jul 13 '15 at 08:32
Oh, that's right. Sorry about that. I have been using the Undefined Behavior sanitizer since [John Regehr and Peng Li created the patch for Clang 3.1](http://blog.regehr.org/archives/905). It's nice to see the rest of the world catching up :) – jww Jul 13 '15 at 08:51
Basile, I'm going to accept this because I think your answer is the best I can do. And I think there's an issue with GCC and selecting `vmovdqa` for an array that's only aligned to *double words* and not *double quad words*. – jww Jul 13 '15 at 08:56
@jww, I'm not sure you can blame GCC if you are invoking UB due to the strict aliasing rule. I would try a union and see if the problem goes away. – Z boson Jul 13 '15 at 09:09
@Zboson - C++ does not allow unions for that purpose. That's a C thing. – jww Jun 01 '16 at 20:15
@BasileStarynkevitch - the problem was the `vmovdqa` being used on data that was not aligned. I'm guessing GCC removed the necessary checks under the ***as-if*** rule. They should call it the ***as-if-it-was-broken*** rule because anything that's well specified and articulated can be transformed into something that does not work. It's been a constant source of problems for us, including a CVE. – jww Jun 01 '16 at 20:18
