You haven't specified the platform, which is crucial for the answer.
X / Y will always be representable using the same type as Y, i.e. the operation can be assumed not to overflow. (Because X is incidentally the product of two values of the same type
as Y, one of which is less than or equal to Y.)
On the x86-64 architecture, you could take advantage of that fact, by dividing RDX:RAX
pair, so it's actually the same as you would have one "glued" 128 bit register for the dividend. Beware, though, that if above invariant doesn't always hold, then you will get division exception from CPU.
That said, one implementation is to use inline assembly, e.g.:
/* divides x1:x0 pair by y, assumes that quotient <= UINT64_MAX */
uint64_t udiv128_64_unsafe(uint64_t x0, uint64_t x1, uint64_t y)
{
__asm__ (
"divq\t%3"
: "=a" (x0)
: "0" (x0), "d" (x1), "rm" (y)
);
return x0;
}
which GCC 6.3.0 translates nicely (at -O1
):
udiv128_64_unsafe:
mov rcx, rdx ; place the y (divisor) in RCX
mov rax, rdi ; low part of the dividend (x0)
mov rdx, rsi ; high part of the divided (x1)
divq rcx ; RAX = RDX:RAX / RCX
ret ; RAX is return value
For instance, for X = 65454567423355465643444545
, Y = 86439334393432232
:
#include <stdio.h>
#include <inttypes.h>
uint64_t udiv128_64_unsafe(uint64_t x0, uint64_t x1, uint64_t y) { ... }
int main(void)
{
printf("%" PRIu64 "\n", udiv128_64_unsafe(0x35c0ecb3fea1c941ULL, 0x36248bULL,
86439334393432232ULL));
return 0;
}
the given test driver program yields:
757231275