Weird SSE assembler instructions for double negation

Question

GCC and Clang seem to employ some dark magic. The C code just negates the value of a double, but the assembler instructions involve a bit-wise XOR and the instruction pointer. Can somebody explain what is happening and why it is an optimal solution? Thank you.

Contents of test.c:

void function(double *a, double *b) {
    *a = -(*b); // This line.
}

The resulting assembler instructions:

(gcc)
0000000000000000 <function>:
 0: f2 0f 10 06             movsd  xmm0,QWORD PTR [rsi]
 4: 66 0f 57 05 00 00 00    xorpd  xmm0,XMMWORD PTR [rip+0x0]        # c <function+0xc>
 b: 00 
 c: f2 0f 11 07             movsd  QWORD PTR [rdi],xmm0
10: c3                      ret

(clang)
0000000000000000 <function>:
 0: f2 0f 10 06             movsd  xmm0,QWORD PTR [rsi]
 4: 0f 57 05 00 00 00 00    xorps  xmm0,XMMWORD PTR [rip+0x0]        # b <function+0xb>
 b: 0f 13 07                movlps QWORD PTR [rdi],xmm0
 e: c3                      ret

The instruction at address 0x4 corresponds to "This line", but I can't understand how it works. The xorpd/xorps instructions are supposed to be bit-wise XOR, and PTR [rip] refers to the instruction pointer.

I suspect that at the moment of execution rip points somewhere near the 0f 57 05 00 00 00 0f strip of bytes, but I can't quite figure out how this works or why both compilers choose this approach.

P.S. I should point out that this was compiled using -O3.

I can't reproduce this. Both compilers XOR with a constant representing -0.0, which they load from memory. That makes sense for [RIP-relative offsetting](https://stackoverflow.com/questions/44967075/why-does-this-movss-instruction-use-rip-relative-addressing/44967386#44967386). What doesn't make sense is that your disassembly shows them loading the bytes of the next instruction as the floating-point constant. That cannot be right. Is it possible that you somehow stripped an offset from the disassembly? Or that your disassembler is confused? — Cody Gray - on strike, Jul 17 '17 at 09:39
I am definitely not changing anything manually. With the above code verbatim and the commands `gcc test.c -c -O3 -o test.o` and `objdump -S -M intel test.o` this is the output I get. I will reiterate that I know what `PTR [rip]` is and that you can invert the sign by changing the leading bit in a double. The reason I am asking is because the two ideas don't mix in my head. — RuRo, Jul 17 '17 at 09:52
The more you are sure that you didn't make a mistake, the more likely you did. Copied the wrong part from console. Sorry the assembler listing should be the right one now. The strange `xor` with `rip` is still there though. — RuRo, Jul 17 '17 at 10:02
This looks like unlinked code, where offsets are not fixed up yet. Hence the `[rip + 0x0]`part. This is RIP-relative addressing alright, but the offset was not put in place by a linker yet. See the accepted answer, which obviously uses properly linked code. — Rudy Velthuis, Jul 17 '17 at 11:06

Sajad Banooie · Accepted Answer · 2017-07-17T11:18:02.863

For me, the output of gcc with the -S -O3 options for the same code is:

    .file   "test.c"
    .text
    .p2align 4,,15
    .globl  function
    .type   function, @function
function:
.LFB0:
    .cfi_startproc
    movsd   (%rsi), %xmm0
    xorpd   .LC0(%rip), %xmm0
    movsd   %xmm0, (%rdi)
    ret
    .cfi_endproc
.LFE0:
    .size   function, .-function
    .section    .rodata.cst16,"aM",@progbits,16
    .align 16
.LC0:
    .long   0
    .long   -2147483648
    .long   0
    .long   0
    .ident  "GCC: (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406"
    .section    .note.GNU-stack,"",@progbits

Here the xorpd instruction uses instruction-pointer-relative addressing; the offset points to the .LC0 label, a 16-byte constant whose low 64 bits are 0x8000000000000000 (only bit 63 is set) and whose high 64 bits are zero.

.LC0:
    .long   0
    .long   -2147483648

If your target were big-endian, these two lines would be swapped.

XORing the double with 0x8000000000000000 flips the sign bit (bit 63), which negates the value: a positive input becomes negative and a negative input becomes positive.

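clang uses the xorps instruction for the same purpose. Both xorps and xorpd XOR all 128 bits of the register with the operand, so the result is identical; xorps simply has a shorter encoding (no 0x66 prefix), which is visible in the question's disassembly, where it takes 7 bytes instead of 8.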

If you run objdump with the -r option, it will show the relocations that must be applied to the object file before the code can run.

objdump -d test.o -r

test.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <function>:
   0:   f2 0f 10 06             movsd  (%rsi),%xmm0
   4:   66 0f 57 05 00 00 00    xorpd  0x0(%rip),%xmm0        # c <function+0xc>
   b:   00 
            8: R_X86_64_PC32    .LC0-0x4
   c:   f2 0f 11 07             movsd  %xmm0,(%rdi)
  10:   c3                      retq   

Disassembly of section .text.startup:

0000000000000000 <main>:
   0:   31 c0                   xor    %eax,%eax
   2:   c3                      retq

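Here, at offset 0x8 (the 4-byte displacement field of the xorpd instruction, which ends at <function+0xc>), we have a relocation of type R_X86_64_PC32. The linker fills that field with the distance to .LC0; the -0x4 addend compensates for RIP-relative addresses being measured from the end of the instruction rather than from the field itself.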

PS: I'm using gcc 6.3.0

Correct, but not the 0th bit is set to one, the 63rd bit (the sign bit) is set to one, hence the hex value 0x8000000000000000. — Rudy Velthuis, Jul 17 '17 at 11:05

score 4 · Answer 2 · answered Jul 17 '17 at 09:38


xorps xmm0,XMMWORD PTR [rip+0x0]

Any part of an instruction surrounded by [] is an indirect reference to memory. In this case it is a reference to the memory at address RIP+0 (I doubt it is actually RIP+0; you might have edited the actual offset).

The X64 instruction set adds instruction pointer relative addressing. This means you can have (usually read-only) data in your program that you can address easily even if the program is moved around in memory.

XOR xmm0,Y inverts every bit in xmm0 that is set in Y.
Negation means inverting the sign bit, which is why XOR is used. xorpd and xorps are the SSE forms, nominally for packed doubles and packed singles respectively; both are plain bitwise XORs over the whole register, so either one works here.


Johan


I am quite sure that I am not changing anything manually. Maybe, you can post the output you are getting so I can try to compare. – RuRo Jul 17 '17 at 09:53

@RuRo, you probably did not run `objdump` with the `--reloc` option. Then you get the raw instructions without the relocations which will be used to rewrite some of the arguments prior to execution. – Florian Weimer Jul 17 '17 at 10:25
@FlorianWeimer Damn. You're right. I just assumed that there are no relocations, since there is nothing to link. Our professor would be mad if he knew. We totally learned that in class. – RuRo Jul 17 '17 at 10:47
@RuRo: I like `alias disas='objdump -drwC -Mintel'` – Peter Cordes Jul 20 '17 at 02:51
