1

I have two numbers, X and Y.

Y is a single unsigned integer primitive, e.g. long unsigned int. (In this case, there is no larger primitive to upcast to before performing the operation.)

X is represented by two primitives: X0 is the same type as Y and represents the low bits of X, and X1 is the same type and represents the high bits of X.

X / Y will always be representable using the same type as Y, i.e. the operation can be assumed not to overflow. (Because X is incidentally the product of two values of the same type as Y, one of which is less than or equal to Y.)

What is an efficient way to determine the result of this division?

btd
  • 684
  • 6
  • 15
  • just a thought. You may run with it: `c = hi_x / y * 2 ^ 64 + lo_x / y;` But you need to think if `hi_x / y` and `lo_x / y` are without rest. Can't think right now. It is too late. – bolov Feb 20 '17 at 17:25
  • I gave link to similar question with solution (as duplicate) in your other theme – MBo Feb 20 '17 at 17:45
  • 1
    "efficient way" efficient in speed? efficient in code? What platform? does platform contain a efficient `unsigned` divide, an efficient `unsigned` multiply. Is `unsigned long` twice `unsigned` width? Post what you have tried, even if is simply `x/y` and how does that compare in "efficiency" to alternate code. – chux - Reinstate Monica Feb 20 '17 at 18:08
  • You should post the inefficient methods that you have tried. – ad absurdum Feb 20 '17 at 18:18
  • May be this link? http://stackoverflow.com/questions/1870158/unsigned-128-bit-division-on-64-bit-machine – Jean-Baptiste Yunès Feb 20 '17 at 18:21

3 Answers3

3

You haven't specified the platform, which is crucial for the answer.

X / Y will always be representable using the same type as Y, i.e. the operation can be assumed not to overflow. (Because X is incidentally the product of two values of the same type as Y, one of which is less than or equal to Y.)

On the x86-64 architecture, you could take advantage of that fact, by dividing RDX:RAX pair, so it's actually the same as you would have one "glued" 128 bit register for the dividend. Beware, though, that if above invariant doesn't always hold, then you will get division exception from CPU.

That said, one implementation is to use inline assembly, e.g.:

/* divides x1:x0 pair by y, assumes that quotient <= UINT64_MAX  */
uint64_t udiv128_64_unsafe(uint64_t x0, uint64_t x1, uint64_t y)
{
    __asm__ (
        "divq\t%3"
        : "=a" (x0)
        : "0" (x0), "d" (x1), "rm" (y)
    );
    return x0;
}

which GCC 6.3.0 translates nicely (at -O1):

udiv128_64_unsafe:
        mov     rcx, rdx            ; place the y (divisor) in RCX 
        mov     rax, rdi            ; low part of the dividend (x0)
        mov     rdx, rsi            ; high part of the divided (x1)
        divq    rcx                 ; RAX = RDX:RAX / RCX
        ret                         ; RAX is return value

For instance, for X = 65454567423355465643444545, Y = 86439334393432232:

#include <stdio.h>
#include <inttypes.h>

uint64_t udiv128_64_unsafe(uint64_t x0, uint64_t x1, uint64_t y) { ... }

int main(void)
{
    printf("%" PRIu64 "\n", udiv128_64_unsafe(0x35c0ecb3fea1c941ULL, 0x36248bULL,
        86439334393432232ULL));
    return 0;
}

the given test driver program yields:

757231275
Grzegorz Szpetkowski
  • 36,988
  • 6
  • 90
  • 137
0

gcc has __int128 and unsigned __int128 for x86 architectures. I have successfully use it in the past to perform this kind of operations you describe. I am sure all major compilers have equivalents.

bolov
  • 72,283
  • 15
  • 145
  • 224
0

The "divide a 2 digit number by 1 digit, giving 1 digit quotient and remainder" is the basic primitive you need to synthesize larger divisions. If you don't have it (with digit == unsigned long int) available in your hardware, you need to use smaller digits.

In your case, split Y into 2 half-sized integers and X into 4 half-sized integers, and do the division that way.

Chris Dodd
  • 119,907
  • 13
  • 134
  • 226
  • This is the approach I've ended up pursuing - I'll post the solution when I've had a chance to finish the code. – btd Feb 20 '17 at 18:33
  • @Pineapple: Do you require a correctly rounded quotient? If not then Newton's method may be faster and easier than full multi-word division. Admittedly Knuth's algorithm D does simplify somewhat for this limited case but it is still a bit involved for my taste. – doynax Feb 20 '17 at 19:29