How to use fast division operation without support of division instruction, for a constant divisor?

Question

How do we divide a/b without using division operation? As my hardware IP does not support division instruction, we use library function to do the division and hence takes a lot of cycles. If I want this division in a fast data path, is there a way to do this using some scaled multiplication operation.

Note both and a and b are unsigned 32-bit integers. And we are not interested in the floating-point operation, hence we can ignore the decimal part of the division.

Note b here have some fixed values which are pre-known to us.

Repeatedly subtract `b` from `a` until the result is negative. Count how many times you subtracted before going negative. Example: `22 / 5` would go `22, 17, 12, 7, 2, -3` so that's 4 subtractions before the negative result. 22/5 = 4 (and a bit). — Niet the Dark Absol, Mar 22 '20 at 19:33
Yes, if you have efficient multiplication, see Montgomery & Granlund's paper: http://gmplib.org/~tege/divcnst-pldi94.pdf. Details on how it works can be found in that GCC Q&A. (And you can compile for x86 or ARM to get GCC to calculate the fixed-point multiplicative inverse and required shifts for the high half for you.) Or if you don't need it to be exact for *every* possible `a`, you can use a simpler inverse that just takes the high half of the multiply. — Peter Cordes, Mar 22 '20 at 23:48
@NiettheDarkAbsol: That would *eventually* give the right answer but is presumably even slower than the OP's library function for divisions that produce large quotients. It's O(quotient) time, instead of O(log2(quotient)) for an implementation that shifts to do binary long-division. — Peter Cordes, Mar 22 '20 at 23:49
On 2nd look, did you really mean that each callsite has a fixed `b`, or just that for each division `b` could be one of a few values? So you might need to use libdivide style handling of looking up multipliers and shift counts from a table. Please re-edit if my last edit was wrong. — Peter Cordes, Mar 23 '20 at 00:39
@PeterCordes: Yes we have fixed values of b. In other words 'b' in my platform refers to the operating frequency(MHz) of the core which is pre-known. Thanks for the suggestion for lookup of multipliers and doing shift on the res. What I am doing is (((1/b)*( 1 << 31) * a)>> 31. Note as mentioned I am forming the lookup table for the multiplier which is equal to (1/b) *(1 << 31). — Aniruddha Paul, Mar 25 '20 at 16:37
You don't need a lookup table if you can rebuild the source for a known value of `b`. But anyway, your formula is probably not exact for every value of `a`. That might be ok for you, going even faster than with an exact method that might require more shifting and adding for more complicated `b` values. — Peter Cordes, Mar 26 '20 at 02:20

How to use fast division operation without support of division instruction, for a constant divisor?

0 Answers0