
Here is a C function that takes a 64-bit unsigned integer and divides it by the largest value a 64-bit integer can hold, to get a double in [0, 1] (technically undefined behaviour, as conversions are implementation-defined and very large 64-bit integers cannot be held by a double). Compiling

#include <stdint.h>

double convert(uint64_t num)
{
    return (double) num / 0xFFFFFFFFFFFFFFFF;
}

with gcc 10.3.0 for the x86_64 architecture, with either the -O3 or the -O2 flag, gives the following output, which I don't understand. I would appreciate it if someone could walk me through it.

convert:
    test    rdi, rdi
    js      .L2
    pxor    xmm0, xmm0
    cvtsi2sd        xmm0, rdi
    mulsd   xmm0, QWORD PTR .LC0[rip]
    ret
.L2:
    mov     rax, rdi
    and     edi, 1
    pxor    xmm0, xmm0
    shr     rax
    or      rax, rdi
    cvtsi2sd        xmm0, rax
    addsd   xmm0, xmm0
    mulsd   xmm0, QWORD PTR .LC0[rip]
    ret
.LC0:
    .long   0
    .long   1005584384

Here are some specific things I do not understand, but now that I have written it out, it is basically everything except for the cvtsi2sd instructions and setting the xmm0 register to 0.

The first thing I find strange is the .L2 label. As the contents of rdi are equal to themselves, the first line will always set the sign flag to 1, meaning the js jump will always be taken, right? So why not remove the first two lines and put the contents of .L2 there?

Then I also don't understand why we put 1 in the edi register, and I can't find anywhere what the shr instruction does when there is no second operand.

Then we have mulsd xmm0, QWORD PTR .LC0[rip]. I don't understand why we use the instruction pointer, and 1005584384 looks like a magic number to me.

  • `test` does a bitwise and of the arguments, not a comparison. The extra code path is needed because `cvtsi2sd` does a signed conversion but you provide an unsigned argument. `shr` with only one argument just shifts by 1. – fuz Oct 09 '21 at 08:32
  • You know `0xFFFFFFFFFFFFFFFF` isn't exactly representable as a `double`, right? The implicit conversion to `double` to match the other side of the `/` operator will round it up by 1 to `18446744073709551616.0`. (And after that, the reciprocal of that nice round number is exact, so the compiler can replace division with multiplication per the as-if rule, because it doesn't change the numerical result.) – Peter Cordes Oct 09 '21 at 08:36
  • @fuz Thanks. So test sets the sign flag to 1 if the sign-bit of rdi (the input) is one, in which case the jump occurs and shr together with the or rax, rdi kills the sign-bit? Also, I think this only works when the sign bit is on the right, does x86_64 always use big-endian? – asdfldsfdfjjfddjf Oct 09 '21 at 08:39
  • So that's why it's a multiply by a `double` whose mantissa is all zero (the first `.long` and part of the next: [Understanding GCC's floating point constants in assembly listing output](https://stackoverflow.com/q/51883374)). It's RIP-relative because that's how x86-64 accesses static data most efficiently; [Why does this MOVSS instruction use RIP-relative addressing?](https://stackoverflow.com/q/44967075). That mulsd is totally independent of the unsigned->double conversion. So that's really a separate question. – Peter Cordes Oct 09 '21 at 08:42
  • @asdfldsfdfjjfddjf amd64 always uses little endian, but that doesn't matter for where the sign bit is (floating point number layout is independent of endianness). The code does some tricks to convert the number while not losing any precision. It does not just kill the sign bit. – fuz Oct 09 '21 at 08:49
  • *technically undefined behaviour* - nope. The range of `double` is wider than the range of `uint64_t`. Maybe you're thinking of the limit of range over which `double` can exactly represent every integer; about equal to the mantissa size. But inexact conversion (for values too large to be exactly representable) is well-defined to round according to the current rounding mode (default = nearest if you haven't used fenv.h to change it). – Peter Cordes Oct 09 '21 at 09:14
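
The conversion path described in the comments above can be sketched in C. This is an illustration, not gcc's actual implementation: the function names `u64_to_double_sketch` and `convert_sketch` are made up here, and the sketch assumes `double` is IEEE 754 binary64 with the default round-to-nearest mode. The low bit is OR-ed back in after the shift so that halving the value does not change which way the final rounding goes.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the two code paths of the listing. */
double u64_to_double_sketch(uint64_t num)
{
    if ((num >> 63) == 0) {
        /* test rdi, rdi / js not taken: the top bit is clear, so the
           signed conversion (cvtsi2sd) already gives the right value. */
        return (double)(int64_t)num;
    }

    /* .L2: the top bit is set, so a signed conversion would see a
       negative number. Halve the value so it fits in a signed 64-bit
       integer, and keep the shifted-out bit as a "sticky" low bit. */
    uint64_t halved = (num >> 1) | (num & 1);  /* shr rax; and edi, 1; or rax, rdi */
    double d = (double)(int64_t)halved;        /* cvtsi2sd xmm0, rax */
    return d + d;                              /* addsd xmm0, xmm0 (exact doubling) */
}

/* Hypothetical stand-in for convert(): multiply by 2^-64 instead of dividing. */
double convert_sketch(uint64_t num)
{
    return u64_to_double_sketch(num) * 0x1p-64;  /* mulsd xmm0, .LC0 */
}
```

Under these assumptions, `u64_to_double_sketch(x)` should agree with a plain `(double)x` for every `x`, and `convert_sketch(UINT64_MAX)` comes out as exactly 1.0.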

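
The constant at `.LC0` discussed in the comments can be decoded with a few lines of C. The sketch below assumes IEEE 754 binary64 doubles; it rebuilds the 64-bit pattern from the two `.long` values (the first is the low half on little-endian x86-64) and reinterprets it as a `double`. The function names `lc0_value` and `divisor_as_double` are hypothetical, chosen for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rebuild the double stored at .LC0 from its two 32-bit halves. */
double lc0_value(void)
{
    uint32_t low  = 0;            /* .long 0 */
    uint32_t high = 1005584384;   /* .long 1005584384, i.e. 0x3BF00000 */
    uint64_t bits = ((uint64_t)high << 32) | low;

    double value;
    assert(sizeof value == sizeof bits);   /* assumes a 64-bit double */
    memcpy(&value, &bits, sizeof value);   /* reinterpret the bit pattern */
    return value;
}

/* The divisor from the question after its implicit conversion to double. */
double divisor_as_double(void)
{
    return (double)0xFFFFFFFFFFFFFFFF;
}
```

Under those assumptions `lc0_value()` is 2^-64 (the exponent field is 0x3BF = 959, and 959 - 1023 = -64, with an all-zero mantissa), and `divisor_as_double()` is 2^64, so multiplying by the first gives the same result as dividing by the second.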