I have a very simple C++ program (a minimal example of what I am actually doing) using SSE2 intrinsics.

#include <xmmintrin.h>
int main(){
    __m128d a = {0,0};
    __m128d b = {1,1};
    __m128d c = a + b;
    int t = c[0] >= 1;
    return t;
}

I would like to check that the addition is indeed compiled to vectorized instructions. I compile the file with g++ -S test.cpp.

My understanding is that SSE2 is not enabled unless I pass the -msse2 flag to g++. This seems to be confirmed by the output of g++ -Q --help=target:

  -msse                             [disabled]
  -msse2                            [disabled]
  -msse2avx                         [disabled]
  -msse3                            [disabled]
  -msse4                            [disabled]
  -msse4.1                          [disabled]
  -msse4.2                          [disabled]
  -msse4a                           [disabled]

However, when looking at the assembly code, the addpd instruction seems to be used.

main:
.LFB499:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    subq    $80, %rsp
    movq    %fs:40, %rax
    movq    %rax, -8(%rbp)
    xorl    %eax, %eax
    pxor    %xmm0, %xmm0
    movaps  %xmm0, -48(%rbp)
    movapd  .LC0(%rip), %xmm0
    movaps  %xmm0, -32(%rbp)
    movapd  -48(%rbp), %xmm0
    addpd   -32(%rbp), %xmm0
    movaps  %xmm0, -64(%rbp)
    movsd   -64(%rbp), %xmm0
    pxor    %xmm1, %xmm1
    ucomisd %xmm1, %xmm0
    setnb   %al
    movzbl  %al, %eax
    movl    %eax, -68(%rbp)
    movl    -68(%rbp), %eax
    movq    -8(%rbp), %rdx
    xorq    %fs:40, %rdx
    je  .L3
    call    __stack_chk_fail
.L3:
    leave
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE499:
    .size   main, .-main
    .section    .rodata
    .align 16
.LC0:
    .long   0
    .long   1072693248
    .long   0
    .long   1072693248
    .ident  "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609"
    .section    .note.GNU-stack,"",@progbits

I see a contradiction here, which makes me think there is something I don't understand. Is SSE2 enabled or not?

Raphael D.
  • In 64 bit mode it's always enabled. – Jester Aug 10 '18 at 15:12
  • Then I guess the problem is I don't really understand what `g++ -Q --help=target` does. – Raphael D. Aug 10 '18 at 15:13
  • The `--help=target` option only refers to the command line options you pass to gcc. See https://gcc.gnu.org/onlinedocs/gcc/Option-Index.html – Banex Aug 10 '18 at 15:13
  • Also you can use `-m32` to create 32 bit binaries. Not sure if the `-Q --help=target` considers that different. – Jester Aug 10 '18 at 15:14
  • Just to give some background I found it in the top answer here https://stackoverflow.com/questions/20150257/does-gcc-4-8-1-enable-sse-by-default – Raphael D. Aug 10 '18 at 15:15
  • SSE2 isn't an extension on AMD64; it's part of the base instruction set, so the "-msse" and "-msse2" parameters don't do anything. They may be off by default, but that won't stop gcc generating SSE2 code. – Alan Birtles Aug 10 '18 at 15:18

1 Answer

I can't reproduce your results.

x86-64 g++ does enable -msse and -msse2 by default. You can disable SSE code-gen in 64-bit mode with -mno-sse (even though SSE2 is baseline for x86-64), in which case gcc implements the + operator with x87 fld / faddp.
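A quick way to see what the compiler itself considers enabled for a given set of flags is to test the predefined macros __SSE__ and __SSE2__, which GCC and Clang define when those extensions are on. The following is only a minimal sketch; the helper names sse_enabled and sse2_enabled are made up for illustration.

```cpp
// Compile-time report of the SSE/SSE2 target settings, based on the
// __SSE__ / __SSE2__ macros that GCC and Clang predefine when enabled.
// The function names are hypothetical helpers, not part of any library.

constexpr bool sse_enabled() {
#ifdef __SSE__
    return true;
#else
    return false;
#endif
}

constexpr bool sse2_enabled() {
#ifdef __SSE2__
    return true;
#else
    return false;
#endif
}
```

Calling these from a small program built once with the default flags and once with -mno-sse (or with -m32 on a toolchain that doesn't default to SSE2) should show the difference.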

__m128d is defined as a GNU C native vector with two double elements, and you didn't use any intrinsics. If you'd used _mm_set_pd or _mm_add_pd instead of the GNU-extension syntax (which treats the operands as native vectors, with {} braced init lists and the + operator), then with SSE disabled you'd get:

<source>:5:13: error: SSE register return with SSE disabled
     __m128d c = _mm_add_pd(a, b);
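
For comparison, here is roughly what the question's code would look like with actual intrinsics rather than native-vector syntax. This is a sketch only: the function name add_low and its parameters are invented for illustration. Per the above, building it with -mno-sse should be rejected with the error just quoted instead of falling back to x87.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics: __m128d, _mm_set_pd, _mm_add_pd

// Adds the vectors {a0, a1} and {b0, b1} with the SSE2 intrinsic
// and returns element 0 of the sum. add_low is a hypothetical name.
double add_low(double a0, double a1, double b0, double b1) {
    __m128d a = _mm_set_pd(a1, a0);  // _mm_set_pd takes the high element first
    __m128d b = _mm_set_pd(b1, b0);
    __m128d c = _mm_add_pd(a, b);    // packed double add
    return _mm_cvtsd_f64(c);         // extract the low element
}
```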

The interesting thing is that even with SSE2 disabled, it will still parse xmmintrin.h without error, but only at -O0. With optimization enabled it notices there are all these (inline) functions that return in an SSE register with SSE disabled even if you don't call them.

You could work around that by defining a vector type yourself, like
typedef double v2d __attribute__((vector_size(16)));
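
A minimal sketch of that workaround, assuming GCC or Clang (the vector_size attribute is a GNU extension); the function name add_low_native is made up for illustration. No intrinsics header is included, so none of the inline functions that return in an SSE register get parsed.

```cpp
// GNU C native vector of two doubles; no <xmmintrin.h>/<emmintrin.h> needed.
typedef double v2d __attribute__((vector_size(16)));

// Adds {a0, a1} and {b0, b1} element-wise and returns element 0 of the sum.
// The compiler picks the instructions: typically addpd when SSE2 is enabled,
// scalar code otherwise.
double add_low_native(double a0, double a1, double b0, double b1) {
    v2d a = {a0, a1};
    v2d b = {b0, b1};
    v2d c = a + b;   // native-vector + operator
    return c[0];     // element access via GNU vector subscripting
}
```

Inspecting the output of g++ -S for this function with and without -mno-sse is one way to compare the generated code.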


On the Godbolt compiler explorer, gcc8.2 -m32 is configured with SSE2 enabled by default (even though SSE2 is not baseline for 32-bit in general).

But gcc6.3 -m32 doesn't enable SSE2 by default, as you can see in the -Q --help=target output.

No combination of anything I tried ever got gcc to emit addpd when SSE2 was disabled (either explicitly or simply not enabled with -m32). AFAIK, that would be a bug.

Peter Cordes