7

I have a 128-bit number stored as 2 64-bit numbers ("Hi" and "Lo"). I need only to divide it by a 32-bit number. How could I do it, using the native 64-bit operations from CPU?

(Please, note that I DO NOT need an arbitrary precision library. Just need to know how to make this simple division using native operations. Thank you).

phuclv
  • 37,963
  • 15
  • 156
  • 475
rookie
  • 205
  • 1
  • 3
  • 6

4 Answers4

4

The subtitle to volume two of The Art of Computer Programming is Seminumerical Algorithms. It's appropriate, as the solution is fairly straight-forward when you think of the number as an equation instead of as a number.

Think of the number as Hx + L, where x is 264. If we are dividing by, call it Y, it is then trivially true that Hx = (N + M)x where N is divisible by Y and M is less than Y. Why would I do this? (Hx + L) / Y can now be expressed as (N / Y)x + (Mx + L) / Y. The values N, N / Y, and M are integers: N is just H / Y and M is H % Y However, as x is 264, this still brings us to a 128 by something divide, which will raise a hardware fault (as people have noted) should Y be 1.

So, what you can do is reformulate the problem as (Ax3 + Bx2 + Cx + D) / Y, with x being 232. You can now go down: (A / Y)x3 + (((A % Y)x + B) / Y)x2 + (((((A % Y)x + B) % Y)x + C) / Y)x + ((((((A % Y)x + B) % Y)x + C) / Y)x + D) / Y. If you only have 64 bit divides: you do four divides, and in the first three, you take the remainder and shift it up 32 bits and or in the next coefficient for the next division.

This is the math behind the solution that has already been given twice.

phuclv
  • 37,963
  • 15
  • 156
  • 475
Thomas
  • 41
  • 1
3

If you are storing the value (128-bits) using the largest possible native representation your architecture can handle (64-bits) you will have problems handling the intermediate results of the division (as you already found :) ).

But you always can use a SMALLER representation. What about FOUR numbers of 32-bits? This way you could use the native 64-bits operations without overflow problems.

A simple implementation (in Delphi) can be found here.

F.D.Castel
  • 2,284
  • 1
  • 16
  • 15
  • First, link to external code is invalid. And you don't need to use handle it as four 32-bit numbers on a 64-bit machine. If they have native division instructions they'll have a way to do division of a 128-bit number by a 64-bit number. Your delphi code also won't compile to native division instructions like the OP wants – phuclv Feb 01 '20 at 03:13
3

I have a DECIMAL structure which consists of three 32-bit values: Lo32, Mid32 and Hi32 = 96 bit totally.

You can easily expand my C code for 128-bit, 256-bit, 512-bit or even 1024-bit division.

// in-place divide Dividend / Divisor including previous rest and returning new rest
static void Divide32(DWORD* pu32_Dividend, DWORD u32_Divisor, DWORD* pu32_Rest)
{
    ULONGLONG u64_Dividend = *pu32_Rest;
    u64_Dividend <<= 32;
    u64_Dividend |= *pu32_Dividend;

    *pu32_Dividend = (DWORD)(u64_Dividend / u32_Divisor);
    *pu32_Rest     = (DWORD)(u64_Dividend % u32_Divisor);
}

// in-place divide 96 bit DECIMAL structure
static bool DivideByDword(DECIMAL* pk_Decimal, DWORD u32_Divisor)
{
    if (u32_Divisor == 0)
        return false;

    if (u32_Divisor > 1)
    {
        DWORD u32_Rest = 0;
        Divide32(&pk_Decimal->Hi32,  u32_Divisor, &u32_Rest); // Hi FIRST!
        Divide32(&pk_Decimal->Mid32, u32_Divisor, &u32_Rest);
        Divide32(&pk_Decimal->Lo32,  u32_Divisor, &u32_Rest);
    }
    return true;
}
phuclv
  • 37,963
  • 15
  • 156
  • 475
Elmue
  • 7,602
  • 3
  • 47
  • 57
  • this won't compile to native CPU division instructions like the OP wants – phuclv Feb 01 '20 at 03:10
  • He does not talk about any specific compiler. Division and modulo with ULONGLONG == unsigned __int64 is calculated directly on a 64 bit CPU in one instruction. Study the assembler output and you will see. – Elmue Feb 01 '20 at 13:28
  • yes and your code compiles to a long series of instructions instead of just a single `div` instruction in the corresponding 64-bit architecture. Your code is also far from being portable, since there's no DWORD or ULONGLONG in the standard – phuclv Feb 01 '20 at 13:38
  • see the difference yourself https://godbolt.org/z/k2cHmC of course 64-bit division and modulo is done in a single instruction, but you're doing **a lot** of those divisions instead of only one (as you can already directly divide a 128-bit number by a 64-bit number in 64-bit architectures) – phuclv Feb 01 '20 at 14:02
  • 2
    OK. You are right. By the way: C or C++ is never portable. I tried to port Linux projects to Visual Studio. It is always a nightmare. My code was not thought to be copied and pasted. I just showed a simple way how to divide numbers of ANY size. Even much larger than 128 bit. May be my answer is useful for others who come here by googling "128 bit division" and do not care if the code is instrinsic or not. By the way: Your solution with _div128() requies VS 2019. My code works everywhere. – Elmue Feb 01 '20 at 18:30
  • C and C++ are both portable, provided that the code was written with portability right from the beginning. Bad code that assumes Unix or Windows platforms only will obviously be a pain later to port. In fact C++ is more portable nowadays since almost everything is available in the std library, and if it's not there it'll be available in boost. For example there's std::filesystem to operate on files and directories so there's no reason to use platform APIs to delete/move/truncate... a file – phuclv Feb 02 '20 at 00:36
  • This answer helped me bigtime!! Especially because it is fully scaleable. – poby Jun 28 '20 at 03:46
3

How could I do it, using the native 64-bit operations from CPU?

Since you want native operations, you'll have to use some built-in types or intrinsic functions. All the above answers will only give you general C solutions which won't be compiled to the division instruction

Most modern 64-bit compilers have some ways to do a 128-by-64 division. In MSVC use _div128() and _udiv128() so you'll just need to call _udiv128(hi, lo, divisor, &remainder)

The _div128 intrinsic divides a 128-bit integer by a 64-bit integer. The return value holds the quotient, and the intrinsic returns the remainder through a pointer parameter. _div128 is Microsoft specific.

In Clang, GCC and ICC there's an __int128 type and you can use that directly

unsigned __int128 div128by32(unsigned __int128 x, uint64_t y)
{
    return x/y;
}
phuclv
  • 37,963
  • 15
  • 156
  • 475
  • 3
    Note that `_udiv128` will cause integer overflow error if the result won't fit into 64 bits. – poby Jun 28 '20 at 03:51