Signed integer division of 1024-bit numbers using 32-bit unsigned segments

Question

I recently asked a question about 1024-bit signed multiplication. Now I want to implement integer division using the same number format.

typedef struct int1024
{
    uint32_t num[32];
} int1024;

For multiplication I used x86 inline asm using the mulx, add, mov, and adc instructions without any branching. Is there any integer division algorithm that performs integer division using a number that is segmented up in 32-bit blocks without branching? (as an idea, maybe one that uses the division instruction itself)? (If possible a 128-bit code example would be great if it exists, in order to avoid the sheer amount of instructions for 1024-bit numbers, thanks for your time)

Every package that does arbitrary-precision math will contain such an algorithm, and there are lots of open-source ones out there. — Nate Eldredge, Mar 28 '21 at 05:07
As a start, you can think about the long division algorithm you hopefully learned in grade school (replacing powers of 10 with powers of 2^32), or the polynomial division algorithm you hopefully learned in high school (taking x=2^32). — Nate Eldredge, Mar 28 '21 at 05:11
Reduce signed division to unsigned division. Taking advantage of existing code for fast multiplication (using Karatsuba for example), use a starting approximation and repeated Halley iteration with cubic convergence to compute the reciprocal of the divisor. Multiply this with the dividend to get a preliminary quotient. Compute the remainder to determine any adjustments to the quotient that may be needed. See [this question](https://stackoverflow.com/q/36853827/780717) for a worked example of this flow for 64-bit division. — njuffa, Mar 28 '21 at 05:15
Consider using, if so allowed, the [GNU GMPlib](https://gmplib.org/) library, but take time to read its documentation. It is [free software](https://www.gnu.org/philosophy/free-sw.en.html), and you are allowed to study and improve its source code — Basile Starynkevitch, Mar 28 '21 at 06:30
@BasileStarynkevitch I am doing this as a self exercise in learning and exploring extended arithmetic assembly/C. I would love to try to implement this myself if possible. But thanks for the suggestion, I will take a look! — yosmo78, Mar 28 '21 at 06:32
Then borrow books in your university library about [arbitrary precision arithmetic](https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic). Read ACM conference papers about them. It is a difficult research topic, and you can make your PhD in designing and implementing a good bignum library. So choose a good PhD advisor. A possible approach could be metaprogramming. Email me to `basile@starynkevitch.net` for more, and mention the URL of your question — Basile Starynkevitch, Mar 28 '21 at 06:33
@BasileStarynkevitch I didn't know it was so difficult, but then again I am already asking for help lol. Thanks for the advice on looking for ACM conference papers as well. I will try to see if any good ones are out there. The difficulty then will make this more fun to try :) — yosmo78, Mar 28 '21 at 06:37
For a large dividend but a single-register divisor, x86's `div` instruction is designed for exactly that kind of use case, where you use the remainder from a high "limb" as the high half of the dividend for the next least significant limb. [Why should EDX be 0 before using the DIV instruction?](https://stackoverflow.com/a/38416896) shows an extended-precision DIV loop as an example use-case for DIV with EDX != 0. But unlike multiply (where you can build a 2N by 2N multiply out of four N-bit multiplies), you can't easily do a larger divisor in pieces. — Peter Cordes, Mar 28 '21 at 15:26
For an example of 128-bit division, see gcc's libgcc helper function for `unsigned __int128`. (e.g. LLVM's version here in C: https://code.woboq.org/llvm/compiler-rt/lib/builtins/udivmodti4.c.html#__udivmodti4) Also related: https://danlark.org/2020/06/14/128-bit-division/ but I think they're optimizing for the 128 / 64 => 64 case where it fits, avoiding calling `__udivti3`. — Peter Cordes, Mar 28 '21 at 15:32
IIRC, in the general case where the divisor isn't a single register-width, it's more typical to manually do division by shift and subtract, not using the `div` instruction at all. — Peter Cordes, May 03 '22 at 03:38

Signed integer division of 1024-bit numbers using 32-bit unsigned segments

0 Answers0