I noticed that calculating the integer part of the square root of a uint64_t is much more complicated than for an int64_t. Does anybody have an explanation for this? Why is it seemingly so much more difficult to deal with one extra bit?
The following:
int64_t sqrt_int(int64_t a) {
    return sqrt(a);
}
compiles with clang 5.0 and the flags -mfpmath=sse -msse3 -Wall -O3 to:
sqrt_int(long): # @sqrt_int(long)
    cvtsi2sd xmm0, rdi
    sqrtsd xmm0, xmm0
    cvttsd2si rax, xmm0
    ret
But the following:
uint64_t sqrt_int(uint64_t a) {
    return sqrt(a);
}
compiles to:
.LCPI0_0:
    .long 1127219200 # 0x43300000
    .long 1160773632 # 0x45300000
    .long 0 # 0x0
    .long 0 # 0x0
.LCPI0_1:
    .quad 4841369599423283200 # double 4503599627370496
    .quad 4985484787499139072 # double 1.9342813113834067E+25
.LCPI0_2:
    .quad 4890909195324358656 # double 9.2233720368547758E+18
sqrt_int(unsigned long): # @sqrt_int(unsigned long)
    movq xmm0, rdi
    punpckldq xmm0, xmmword ptr [rip + .LCPI0_0] # xmm0 = xmm0[0],mem[0],xmm0[1],mem[1]
    subpd xmm0, xmmword ptr [rip + .LCPI0_1]
    haddpd xmm0, xmm0
    sqrtsd xmm0, xmm0
    movsd xmm1, qword ptr [rip + .LCPI0_2] # xmm1 = mem[0],zero
    movapd xmm2, xmm0
    subsd xmm2, xmm1
    cvttsd2si rax, xmm2
    movabs rcx, -9223372036854775808
    xor rcx, rax
    cvttsd2si rax, xmm0
    ucomisd xmm0, xmm1
    cmovae rax, rcx
    ret
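To make the question more concrete, here is my reading of the unsigned listing written back as C++. It is only a sketch: the helper names u64_to_double, double_to_u64 and sqrt_u64 are made up for illustration, and I am assuming the constants are 2^52, 2^84 and 2^63 based on the values in the listing. The part before sqrtsd appears to build the double from the two 32-bit halves of the input, and the part after it appears to handle results of 2^63 or more separately, because cvttsd2si only produces signed 64-bit integers.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper mirroring punpckldq / subpd / haddpd:
// convert an unsigned 64-bit integer to double without a native instruction.
static double u64_to_double(uint64_t a) {
    // Put each 32-bit half into the low mantissa bits of a double whose
    // upper word is 0x43300000 (2^52) or 0x45300000 (2^84).
    uint64_t lo_bits = 0x4330000000000000ULL | (a & 0xFFFFFFFFULL);
    uint64_t hi_bits = 0x4530000000000000ULL | (a >> 32);
    double lo, hi;
    std::memcpy(&lo, &lo_bits, sizeof lo);  // lo == 2^52 + low half
    std::memcpy(&hi, &hi_bits, sizeof hi);  // hi == 2^84 + high half * 2^32
    lo -= std::ldexp(1.0, 52);              // subpd, first lane
    hi -= std::ldexp(1.0, 84);              // subpd, second lane
    return hi + lo;                         // haddpd
}

// Hypothetical helper mirroring subsd / cvttsd2si / xor / ucomisd / cmovae:
// truncate a non-negative double below 2^64 to an unsigned 64-bit integer.
static uint64_t double_to_u64(double x) {
    const double two63 = std::ldexp(1.0, 63);
    if (x >= two63) {
        // Too large for a signed conversion: shift down by 2^63,
        // convert, then set the top bit again with xor.
        return static_cast<uint64_t>(static_cast<int64_t>(x - two63))
               ^ 0x8000000000000000ULL;
    }
    return static_cast<uint64_t>(static_cast<int64_t>(x));
}

// The unsigned sqrt_int from the question, spelled out step by step.
static uint64_t sqrt_u64(uint64_t a) {
    return double_to_u64(std::sqrt(u64_to_double(a)));
}
```

If this reading is right, the signed version needs none of this because cvtsi2sd and cvttsd2si already convert between double and signed 64-bit integers directly.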