My processor, a small 16-bit microcontroller with no FPU and integer math only has 16/16 division and 32/16 division which both take 18 cycles. At the moment I'm using a very slow software routine (~7,500 cycles) to do 64/32 division. Is there any way to use these division engines to calculate a 64/32 division? Similar to how I'm already using the 16x16 multiplier and adder to calculate 32x32 multiplies? I'm using C but can work with any general explanation on how it can be done... I'm hoping to target <200 cycles (if it's at all possible.)
-
what language is that? In most (if not all) languages, double/single works with the FPU and it's quite fast... unless I'm missing something here – Yanick Rochon Jan 23 '11 at 02:12
-
methinks he is talking about integer division, not floating point division – Jason S Jan 23 '11 at 02:13
-
Are we talking about some specific language (C, asm)? Does the machine have FPU or does it only operate on integer registers? – mike.dld Jan 23 '11 at 02:13
-
@mike.did It's a microcontroller, and we're dealing with integer. No FPU. – Thomas O Jan 23 '11 at 02:15
-
@Yanick I'm using integer division. – Thomas O Jan 23 '11 at 02:15
-
@Yanick Signed preferred. I can work with an unsigned routine by comparing and then clearing the signs before and setting the correct sign afterwards, though. – Thomas O Jan 23 '11 at 02:20
-
Please post your algorithm. We may be able to improve it. I should say 7,500 is too large. Maybe more like 64 * 32. – Joshua Jan 23 '11 at 02:22
-
@Joshua It is the standard GCC algorithm. _udivsi3 I think. – Thomas O Jan 23 '11 at 02:28
-
1Have you tried googling for *computer division algorithm*? There are plenty of them. – ruslik Jan 23 '11 at 02:39
-
@ruslik I think they all depend on the basic algorithm GCC uses which is about as fast as you can get with that kind of algorithm. I'd prefer one which takes advantage of hardware already in place, and that's not very common, if it even exists. – Thomas O Jan 23 '11 at 02:43
-
1The quotient of a 64/32 division won't always fit in 32 bits. – dan04 Jan 23 '11 at 02:58
-
@dan04 The result is 64 bits I think but I only care about the first 32 bits. – Thomas O Jan 23 '11 at 03:00
-
1@Thomas O: Are you sure the result is going to be 64 bit? Just been reading the [reference manual](http://ww1.microchip.com/downloads/en/DeviceDoc/70157D.pdf) and it says a 32/16 div with >16 bits in the result will cause unexpected results. I would assume any simple implementation of 64/32 div that uses that instruction will encounter similar problems with a >32 bit result. – Nemo157 Jan 23 '11 at 03:43
-
I recall getting the lower number using processor specific assembly. It really depends in all cases on what obscure primitives allow us to do much better than the compiler can. A pure MIPS can't do better than the C code. – Joshua Jan 23 '11 at 17:36
-
FYI, if you're actually after 64/16 there is a really fast answer. – Joshua Jan 23 '11 at 17:37
-
@Joshua I'm interested, with 64/16 is it possible to implement 64/32? – Thomas O Jan 23 '11 at 17:53
-
[How can I perform 64-bit division with a 32-bit divide instruction?](https://stackoverflow.com/q/3572409/995714) – phuclv May 23 '17 at 08:58
4 Answers
See "Hacker's Delight", multiword division (pages 140-145).
The basic concept (going back to Knuth) is to think of your problem in base-65536 terms. Then you have a 4 digit by 2 digit division problem, with 2/1 digit division as a primitive.
The C code is here: https://github.com/hcs0/Hackers-Delight/blob/master/divmnu.c.txt

- 13,833
- 5
- 42
- 49
My copy of Knuth (The Art of Computer Programming) is at work, so I can't check it until Monday, but that would be my first source. It has a whole section on arithmetic.
edit: your post about "16/16 division and 32/16 division which both take 18 cycles." -- dsPICs have a conditional subtract operation in assembly. Consider using this as your computational primitive.
Also note that if X = XH * 232 + XL and D = DH * 216 + DL, then if you are looking for
(Q,R) = X/D where X = Q * D + R
where Q = QH * 216 + QL, R = RH * 216 + RL, then
XH * 232 + XL = DH * QH * 232 + (DL * QH + DH * QL) * 216 + (DL * QL) + RH * 216 + RL
This suggests (by looking at terms that are the high 32 bits) to use the following procedure, akin to long division:
- (QH, R0) = XH / (DH+1) -> XH = QH * (DH+1) + R0 [32/16 divide]
- R1 = X - (QH * 216) * D [requires a 16*32 multiply, a shift-left by 16, and a 64-bit subtract]
- calculate R1' = R1 - D * 216
- while R1' >= 0, adjust QH upwards by 1, set R1 = R1', and goto step 3
- (QL, R2) = (R1 >> 16) / (DH+1) -> R1 = QL * (DH+1) + R2 [32/16 divide]
- R3 = R1 - (QL * D) [requires a 16*32 multiply and a 48-bit subtract]
- calculate R3' = R3 - D
- while R3' >= 0, adjust QL upwards by 1, set R3 = R3', and goto step 7
Your 32-bit quotient is the pair (QH,QL), and 32-bit remainder is R3.
(This assumes that the quotient is not larger than 32-bit, which you need to know ahead of time, and can easily check ahead of time.)

- 184,598
- 164
- 608
- 970
-
Thanks for this algorithm. I will have to think about how to implement it in C. – Thomas O Jan 23 '11 at 20:28
-
This works, with some caveats. First, it's 64/32=32 bit division, not 64/32=64, therefore the case `XH>=D` should be rejected as overflow (which catches also divisions by zero). Second, the case `DH=0xFFFF` produces an overflow on `DH+1` (steps 1 and 5), which is no longer 16 bits long and the built-in division can't be used (but a shift can be used instead if treated as a special case). Third, the number of iterations of the loop can sometimes be in the thousands, which is not practical. I believe the latter is solved with the normalization step in Knuth's algorithm. – Pedro Gimeno Aug 20 '21 at 20:17
Starting point would be: D. Knuth, The Art of Computer Programming Vol.2, Section 4.3.1, Algorithm D
But I suppose you may need to optimize the algorithm.

- 11,796
- 3
- 53
- 63
You may want to look at Booth's Algorithm
(http://www.scribd.com/doc/3132888/Booths-Algorithm-Multiplication-Division).
The part you want is about 1/2 way down the page.
I haven't looked at this since my VLSI class, but, this may be your best bet, if possible you may want to do this in assembly, to optimize it as much as possible, if you will be calling this often.
Basically involves shifting and adding or subtracting.

- 41,583
- 10
- 86
- 166
-
@Jason S - If you look at the article, it can also be used for division. – James Black Jan 23 '11 at 15:37