I have a few utilities that perform 128-bit operations, such as 64x64-bit multiplications, which I have successfully translated from inline assembly to the equivalent __uint128_t arithmetic with carry. I checked, and gcc generates exactly the same assembly instructions.
However, I am failing to find the equivalent C/C++ code for the case of 128-bit division. The original asm statement is this:
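For context, here is a minimal sketch of what such a translation can look like for the multiplication case. The name `mul64` and its signature are only illustrative, not the actual utility, and the block assumes a compiler that provides `__uint128_t` (gcc or clang on a 64-bit target).

```cpp
#include <stdint.h>

// Hypothetical helper: 64x64 -> 128-bit multiply, returned as two 64-bit halves.
void mul64(uint64_t a, uint64_t b, uint64_t& low, uint64_t& high) {
    __uint128_t product = __uint128_t(a) * b;  // widen first so the multiply is 128-bit
    low = uint64_t(product);                   // low 64 bits of the product
    high = uint64_t(product >> 64);            // high 64 bits of the product
}
```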
#include <stdint.h>

void div64(uint64_t low, uint64_t high, uint64_t divisor,
           uint64_t& quotient, uint64_t& remainder) {
    asm("divq %2" : "+a"(low), "+d"(high) : "rm"(divisor));
    quotient = low;
    remainder = high;
}
However, when I try to replace it with this:

void div64(uint64_t low, uint64_t high, uint64_t divisor,
           uint64_t& quotient, uint64_t& remainder) {
    __uint128_t val = __uint128_t(low) | (__uint128_t(high) << 64);
    quotient = val / divisor;
    remainder = val % divisor;
}
the compiler fails to match these operations to a single divq instruction. Instead, it produces a much longer version with an additional library call to __udivti3, even at the -O3 optimization level:
div64b(unsigned long, unsigned long, unsigned long, unsigned long&, unsigned long&): # @div64b(unsigned long, unsigned long, unsigned long, unsigned long&, unsigned long&)
pushq %r15
pushq %r14
pushq %r12
pushq %rbx
pushq %rax
movq %r8, %r14
movq %rcx, %r15
movq %rdx, %r12
movq %rdi, %rbx
xorl %ecx, %ecx
callq __udivti3@PLT
movq %rax, (%r15)
imulq %r12, %rax
subq %rax, %rbx
movq %rbx, (%r14)
addq $8, %rsp
popq %rbx
popq %r12
popq %r14
popq %r15
retq
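One difference between the two versions that may be relevant: divq raises a divide error when the quotient does not fit in 64 bits, while the __uint128_t expression is defined for every nonzero divisor and simply gets truncated on assignment. The sketch below makes that range explicit. The name `div64_portable` and the boolean return value are hypothetical additions for illustration, it assumes `divisor != 0`, and it is not claimed to make the compiler emit a single divq.

```cpp
#include <stdint.h>

// Hypothetical helper: portable 128/64 division that also reports whether
// the full quotient fits in 64 bits. Assumes divisor != 0.
bool div64_portable(uint64_t low, uint64_t high, uint64_t divisor,
                    uint64_t& quotient, uint64_t& remainder) {
    __uint128_t val = (__uint128_t(high) << 64) | low;
    __uint128_t q = val / divisor;        // full 128-bit quotient
    quotient = uint64_t(q);               // keeps only the low 64 bits
    remainder = uint64_t(val % divisor);  // remainder is always < divisor
    return high < divisor;                // true exactly when q fits in 64 bits
}
```

The two versions of div64 can only agree on inputs for which this returns true, since the asm version faults on the others.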
Am I missing anything?
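Note: I tagged this question with both C and C++ because the same problem arises in both languages. The code above uses C++ references, but the equivalent C version with pointers behaves the same way.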