
On Intel machines, integer division requires a sequence of two instructions:

  1. After you store the dividend in the a register (e.g., %eax), you extend the sign of the dividend into the corresponding d register (%edx).
  2. The idiv or div instruction itself then performs the division and places the quotient in the a register and the remainder in the d register.

But why does the operation require the sign extension? What does the algorithm actually do with that sign-extension value? How does the remainder wind up in that same space?

I am not asking how to do integer division on Intel or what happens if you do not follow the rules. I want to know why these are the rules. I want to know how the algorithm works, such that it must use the sign-extended register space somehow.
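
For concreteness, the sequence I mean looks roughly like this in AT&T syntax (the choice of %ecx for the divisor and the literal operands are just mine for illustration):

        movl  $-65, %eax    # dividend in the a register
        movl  $3,   %ecx    # divisor can be any register or memory operand
        cltd                # step 1: sign-extend %eax into %edx (GAS's name for cdq)
        idivl %ecx          # step 2: divide %edx:%eax by %ecx;
                            # quotient (-21) lands in %eax, remainder (-2) in %edx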

Let's take a manageable example with 8-bit operands—decimal -65 / 3.

         -------------------
00000011 ) 11111111 10111111

If it were 65/3 (with a positive dividend), I could see how the left padding would provide room for the subtraction—

                    00010101
         -----------------------
00000011 ) 00000000 01000001
               0000 0011
               ---------
               0000 000100
                 00 000011
                 ---------
                 00 00000101
                    00000011
                    --------
                    00000010  (R)

(I abbreviated in the above by not showing instances of subtracting 0. Also, I showed the subtractions as such rather than as additions of the two's-complement negations of the divisor, but the basic point would remain the same.) Here, the 0 padding makes room to subtract each bit. However, I do not see how this would work when the dividend is a two's-complement negative integer. And, even in this positive case, it's not obvious why the remainder should wind up in the same space that had held the padding, except for the mere convenience of its already being available.
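
Just to make my mental model concrete, here is the unsigned shift-and-subtract (restoring) procedure I traced above, spelled out as x86 code for the 8-bit case. This is only how I picture the paper algorithm, not a claim about what the hardware actually does, and the register assignments are my own:

    # the 16-bit "dividend space" is %ax: %al holds the dividend, %ah the padding
        movb  $65, %al      # dividend
        movb  $0,  %ah      # padding, which becomes the running remainder
        movb  $3,  %bl      # divisor
        movb  $8,  %cl      # produce 8 quotient bits
    div_loop:
        shlw  $1,  %ax      # shift the whole 16-bit space left one bit
        cmpb  %bl, %ah      # does the divisor fit into the running remainder?
        jb    no_subtract
        subb  %bl, %ah      # if so, subtract it ...
        orb   $1,  %al      # ... and record a 1 as the quotient bit just shifted in
    no_subtract:
        decb  %cl
        jnz   div_loop
    # afterwards %al = 00010101 (quotient 21) and %ah = 00000010 (remainder 2)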

Thank you!

Amittai Aviram
  • You don't *have to* sign extend, but just be aware that `idiv` treats `edx:eax` as a 64-bit signed number. If you zero-extended, then calculating `-1 / 2` would give you a quotient of `0x7fffffff` because `0x00000000ffffffff / 2 = 0x7fffffff` with a remainder of 1. – Raymond Chen Sep 26 '21 at 18:25

1 Answer


Those two steps are for dividing a 32-bit signed value by another 32-bit signed value. The idiv instruction divides a 64-bit signed integer by a 32-bit signed integer. Therefore, if all you have is a 32-bit integer, you need to convert it to a 64-bit integer first.

Why was idiv designed to do a 64-bit by 32-bit division? For the same reason mul was designed to produce a 64-bit result from a 32-bit by 32-bit multiplication: so that you can implement arbitrary-precision arithmetic, working in 32-bit groups, using those primitives.
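
For example, a sketch of the widening multiply (AT&T syntax; the operand values are arbitrary):

        movl  $0x12345678, %eax
        movl  $0x9abcdef0, %ecx
        mull  %ecx          # unsigned multiply: the full 64-bit product is in %edx:%eax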


EDIT: Why is it needed for arbitrary-precision arithmetic? Let's walk through a decimal long-division problem:

512 / 7 = ?

For the first digit of the result we need to figure out how many times 7 fits in 51; we calculate truncate(51/7) = 7, so the next digit will be 7, which we multiply, shift, and subtract to get:

512 / 7 = 7?
490
---
 22

We repeat that for truncate(22/7) = 3, giving us:

512 / 7 = 73
490
---
 22
 21
---
  1   (remainder)

Notice that we were using an 'oracle' to divide a two-digit number by a one-digit number, and used it repeatedly on different two-digit groups to produce the digits of the result one by one. Similarly, if your 'digits' are 32-bit words, you need an oracle that divides a 64-bit number by a 32-bit number.
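
In x86 terms, that chained use of the oracle looks something like this sketch (AT&T syntax, unsigned for simplicity; which registers hold the two 32-bit 'digits' is my own choice here):

    # dividend digits: high in %esi, low in %edi; divisor in %ecx
        xorl  %edx, %edx    # extend with zeros only for the topmost digit
        movl  %esi, %eax
        divl  %ecx          # high digit of the quotient -> %eax, remainder -> %edx
        movl  %eax, %esi    # save the high quotient digit
        movl  %edi, %eax    # now %edx:%eax = remainder:low, the next two-digit group
        divl  %ecx          # low digit of the quotient -> %eax, final remainder -> %edx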

Yakov Galka
  • related: [Why should EDX be 0 before using the DIV instruction?](https://stackoverflow.com/q/38416593) explains how to actually do extended-precision unsigned division (64 bits or more) by a 32-bit divisor, using the EDX remainder of a high chunk as the high half of the dividend for the next chunk down. – Peter Cordes Sep 26 '21 at 18:36
  • @PeterCordes I'd also point to D. Knuth, TAOCP Vol. 2: Seminumerical Algorithms, for an in-depth analysis of arbitrary-precision arithmetic. – Yakov Galka Sep 26 '21 at 18:49
  • OK, thanks. So the remainder winds up in the d register just as a matter of convenience, reusing space no longer needed. – Amittai Aviram Sep 26 '21 at 23:33
  • But thank you for the Knuth reference. – Amittai Aviram Sep 26 '21 at 23:41
  • @AmittaiAviram: No, I mean where each chunk (aka limb) represents a group of 32 bits of a single integer, too large for x86 to store in a single integer register. So you handle it just like you do on paper for multiple base-10 digits. When you're doing that, you *don't* zero or sign extend EAX into EDX, only for the top limb. – Peter Cordes Sep 26 '21 at 23:54
  • @AmittaiAviram: The term "arbitrary precision" is maybe kind of a misnomer for integer arithmetic, where the "precision" is always 1. "Arbitrary size integer arithmetic" might be a better term. – Nate Eldredge Sep 27 '21 at 01:09
  • @PeterCordes — OK, yes, I understand. Thanks a lot! – Amittai Aviram Sep 27 '21 at 01:29
  • @NateEldredge Yes! Saying that it is a bit of a misnomer is quite helpful. As the example to which Peter Cordes linked shows, the idea is rather to allow for an arbitrarily large (or wide) dividend, by breaking the division down into iterative steps, at each step of which, after the first, the previous remainder becomes the higher-order bits of the current dividend. [Corrected: "divisor" -> dividend.] – Amittai Aviram Sep 27 '21 at 01:50