61

Is it possible to divide an unsigned integer by 10 by using pure bit shifts, addition, subtraction and maybe multiply? Using a processor with very limited resources and slow divide.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Thomas O
  • 6,026
  • 12
  • 42
  • 60
  • It's possible (repeated subtraction is division), but the question is whether it's any faster than the slow division. – David Thornley Apr 05 '11 at 21:07
  • 1
    @esnyder. Sorry, I can't understand you. Are you talking in base 17 or base 22? – Thomas O Apr 05 '11 at 21:39
  • Base large two. Shifting right divides by 2^n which would solve your question if by "10" you mean 16 decimal or 10h. – tamarintech Apr 05 '11 at 21:42
  • Are you arguing with me? I'm actually trying to admit that *I* failed to mention my answer was not for decimal.... Might be a bit obscure, but that was my intention. – tamarintech Apr 05 '11 at 21:57
  • @Thomas O - see my comment. I don't notice an upvote.... – KevinDTimm Apr 05 '11 at 22:26
  • 1
    @esynder, Yes, I guess I was arguing with you, over the interpretation of 10(base 10) as 10(base 16). I think such an interpretation by default is unusual, at best. – Thomas O Apr 05 '11 at 23:01
  • Related: [Why does GCC use multiplication by a strange number in implementing integer division?](https://stackoverflow.com/questions/41183935/why-does-gcc-use-multiplication-by-a-strange-number-in-implementing-integer-divi): If you have a fast multiply, you can divide by compile-time constants with just a multiply and a shift of the high half, getting the correct result for every dividend (unlike the current accepted answer). – Peter Cordes Oct 18 '17 at 21:24

9 Answers9

71

Editor's note: this is not actually what compilers do, and gives the wrong answer for large positive integers ending with 9, starting with div10(1073741829) = 107374183 not 107374182 (Godbolt). It is exact for inputs smaller than 0x40000005, though, which may be sufficient for some uses.

Compilers (including MSVC) do use fixed-point multiplicative inverses for constant divisors, but they use a different magic constant and shift on the high-half result to get an exact result for all possible inputs, matching what the C abstract machine requires. See Granlund & Montgomery's paper on the algorithm.

See Why does GCC use multiplication by a strange number in implementing integer division? for examples of the actual x86 asm gcc, clang, MSVC, ICC, and other modern compilers make.


This is a fast approximation that's inexact for large inputs

It's even faster than the exact division via multiply + right-shift that compilers use.

You can use the high half of a multiply result for divisions by small integral constants. Assume a 32-bit machine (code can be adjusted accordingly):

int32_t div10(int32_t dividend)
{
    int64_t invDivisor = 0x1999999A;
    return (int32_t) ((invDivisor * dividend) >> 32);
}

What's going here is that we're multiplying by a close approximation of 1/10 * 2^32 and then removing the 2^32. This approach can be adapted to different divisors and different bit widths.

This works great for the ia32 architecture, since its IMUL instruction will put the 64-bit product into edx:eax, and the edx value will be the wanted value. Viz (assuming dividend is passed in ecx (fastcall) and quotient returned in eax)

div10 proc 
    mov    eax,1999999Ah    ; 1/10 * 2^32
    imul   ecx              ; edx:eax = dividend / 10 * 2 ^32
    mov    eax,edx          ; eax = dividend / 10
    ret
endp

Even on a machine with a slow multiply instruction, this will be faster than a software or even hardware divide.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
John Källén
  • 7,551
  • 31
  • 64
  • 20
    +1, and I'd like to emphasise that the compiler will do this for you automatically when you write "x/10" – Theran Apr 05 '11 at 21:25
  • 1
    hmm, isn't there some numerical inaccuracy here? – Jason S Apr 06 '11 at 12:49
  • 3
    You're always going to have numerical inaccuracy when doing integer divides: What do you you get when you divide 28 by 10 using integers? Answer: 2. – John Källén Apr 06 '11 at 12:52
  • 5
    There is no numerical inaccuracy in integer division, the result is exactly specified. However, the formula above is only exact for certain divisors. Even 10 is inaccurate if you want to do unsigned arithmetic: `4294967219 / 10 = 429496721`, but `4294967219 * div >> 32 = 429496722` For larger divisors, the signed version will be inaccurate as well. – Evan Jul 03 '17 at 22:12
  • 1
    @Theran: No, compilers including MSVC will compile `x/10` to [a fixed-point multiplicative inverse](https://stackoverflow.com/questions/41183935/why-does-gcc-use-multiplication-by-a-strange-number-in-implementing-integer-divi) (and make extra code to handle negative inputs for signed division) to give the correct answer for all possible 32-bit inputs. For unsigned division by 10, MSVC (and other compilers) (https://godbolt.org/g/aAq7jx) will multiply by `0xcccccccd` and right-shift the high half by 3. – Peter Cordes Oct 18 '17 at 21:07
  • I wrote [a test program for this](https://godbolt.org/g/Kd9Q4A), comparing the results against `i/10`. It's wrong for large positive integers ending with 9, starting with `div10(1073741829) = 107374183. Correct = 107374182`. It's also wrong for most (all?) negative integers, e.g. `div10(-1) = -1. Correct = 0`. @JasonS is correct that this doesn't implement the C semantics of `x / 10`. – Peter Cordes Oct 18 '17 at 21:41
  • @PeterCordes there's a neat video by Matt Godbolt that touches upon what the compiler does for division; sometimes it uses multiplication. see https://www.youtube.com/watch?v=bSkpMdDe4g4 – Jason S Oct 18 '17 at 22:55
46

Though the answers given so far match the actual question, they do not match the title. So here's a solution heavily inspired by Hacker's Delight that really uses only bit shifts.

unsigned divu10(unsigned n) {
    unsigned q, r;
    q = (n >> 1) + (n >> 2);
    q = q + (q >> 4);
    q = q + (q >> 8);
    q = q + (q >> 16);
    q = q >> 3;
    r = n - (((q << 2) + q) << 1);
    return q + (r > 9);
}

I think that this is the best solution for architectures that lack a multiply instruction.

realtime
  • 700
  • 5
  • 9
  • pdf not available anymore – Alexis May 27 '21 at 00:33
  • how can we adapt it for 10^N? – Alexis May 27 '21 at 00:46
  • 1
    The original site is dead, the link points now to the archived version in the Wayback Machine. In the linked PDF you will find code for division by 100 and 1000. Please be aware that these still contain a multiply operation which would need to replaced with shifts and adds. Also, the divu100 and divu1000 code contains many shifts which are not a multiple of 8, so if you are on an architecture which has neither a barrel shifter nor a muliply instruction, you might be better off by applying divu10 repeatedly. – realtime May 28 '21 at 06:04
  • Thank you! It's for FPGA/RTL, I will adapt depending on the timing I can get. I just found the link to this pdf literally everywhere such a question is asked. Without being able to find the actual file. Thanks again! – Alexis May 28 '21 at 06:24
  • Often architectures that lack MUL also lack support for bit shifting more than one bit at a time, like AVR 8 bit, where this results in a mountain of loops for the various bit shifts – thomasrutter Aug 23 '21 at 13:28
  • You could save an op with `q = n - (n >> 2);` – David Eisenstat Dec 18 '21 at 13:56
21

Of course you can if you can live with some loss in precision. If you know the value range of your input values you can come up with a bit shift and a multiplication which is exact. Some examples how you can divide by 10, 60, ... like it is described in this blog to format time the fastest way possible.

temp = (ms * 205) >> 11;  // 205/2048 is nearly the same as /10
Caerulius
  • 425
  • 4
  • 21
Alois Kraus
  • 13,229
  • 1
  • 38
  • 64
7

to expand Alois's answer a bit, we can expand the suggested y = (x * 205) >> 11 for a few more multiples/shifts:

y = (ms *        1) >>  3 // first error 8
y = (ms *        2) >>  4 // 8
y = (ms *        4) >>  5 // 8
y = (ms *        7) >>  6 // 19
y = (ms *       13) >>  7 // 69
y = (ms *       26) >>  8 // 69
y = (ms *       52) >>  9 // 69
y = (ms *      103) >> 10 // 179
y = (ms *      205) >> 11 // 1029
y = (ms *      410) >> 12 // 1029
y = (ms *      820) >> 13 // 1029
y = (ms *     1639) >> 14 // 2739
y = (ms *     3277) >> 15 // 16389
y = (ms *     6554) >> 16 // 16389
y = (ms *    13108) >> 17 // 16389
y = (ms *    26215) >> 18 // 43699
y = (ms *    52429) >> 19 // 262149
y = (ms *   104858) >> 20 // 262149
y = (ms *   209716) >> 21 // 262149
y = (ms *   419431) >> 22 // 699059
y = (ms *   838861) >> 23 // 4194309
y = (ms *  1677722) >> 24 // 4194309
y = (ms *  3355444) >> 25 // 4194309
y = (ms *  6710887) >> 26 // 11184819
y = (ms * 13421773) >> 27 // 67108869

each line is a single, independent, calculation, and you'll see your first "error"/incorrect result at the value shown in the comment. you're generally better off taking the smallest shift for a given error value as this will minimise the extra bits needed to store the intermediate value in the calculation, e.g. (x * 13) >> 7 is "better" than (x * 52) >> 9 as it needs two less bits of overhead, while both start to give wrong answers above 68.

if you want to calculate more of these, the following (Python) code can be used:

def mul_from_shift(shift):
    mid = 2**shift + 5.
    return int(round(mid / 10.))

and I did the obvious thing for calculating when this approximation starts to go wrong with:

def first_err(mul, shift):
    i = 1
    while True:
        y = (i * mul) >> shift
        if y != i // 10:
            return i
        i += 1

(note that // is used for "integer" division, i.e. it truncates/rounds towards zero)

the reason for the "3/1" pattern in errors (i.e. 8 repeats 3 times followed by 9) seems to be due to the change in bases, i.e. log2(10) is ~3.32. if we plot the errors we get the following:

errors

where the relative error is given by: mul_from_shift(shift) / (1<<shift) - 0.1

Sam Mason
  • 15,216
  • 1
  • 41
  • 60
4

Considering Kuba Ober’s response, there is another one in the same vein.
It uses iterative approximation of the result, but I wouldn’t expect any surprising performances.

Let say we have to find x where x = v / 10.

We’ll use the inverse operation v = x * 10 because it has the nice property that when x = a + b, then x * 10 = a * 10 + b * 10.

Let use x as variable holding the best approximation of result so far. When the search ends, x Will hold the result. We’ll set each bit b of x from the most significant to the less significant, one by one, end compare (x + b) * 10 with v. If its smaller or equal to v, then the bit b is set in x. To test the next bit, we simply shift b one position to the right (divide by two).

We can avoid the multiplication by 10 by holding x * 10 and b * 10 in other variables.

This yields the following algorithm to divide v by 10.

uin16_t x = 0, x10 = 0, b = 0x1000, b10 = 0xA000;
while (b != 0) {
    uint16_t t = x10 + b10;
    if (t <= v) {
        x10 = t;
        x |= b;
    }
    b10 >>= 1;
    b >>= 1;
}
// x = v / 10

Edit: to get the algorithm of Kuba Ober which avoids the need of variable x10 , we can subtract b10 from v and v10 instead. In this case x10 isn’t needed anymore. The algorithm becomes

uin16_t x = 0, b = 0x1000, b10 = 0xA000;
while (b != 0) {
    if (b10 <= v) {
        v -= b10;
        x |= b;
    }
    b10 >>= 1;
    b >>= 1;
}
// x = v / 10

The loop may be unrolled and the different values of b and b10 may be precomputed as constants.

greybeard
  • 2,249
  • 8
  • 30
  • 66
chmike
  • 20,922
  • 21
  • 83
  • 106
  • 1
    Er… this is just long division (yeah, that thing you learnt at primary school) for binary rather than decimal. – liyang Mar 05 '21 at 02:11
  • I don't know what you call long division. What I'm sure is that I didn't learn that at school. What I learn at school is a different method. – chmike Mar 05 '21 at 05:28
  • I mean https://en.wikipedia.org/wiki/Long_division#Method , but where the method asks you to “obtain the largest whole number that is a multiple of the divisor”, just keep in mind that the multiple can only be 1 or 0 when working in base-2. Your test for `b10 <= v` is just checking if said multiple is 1. In any case, this is how I taught long division for a Computer Systems Architecture course some years ago. What method of decimal long division did you learn at school? – liyang Mar 16 '21 at 12:28
  • 1
    As a side note, it's objectively _easier_ than decimal long division, as you would never ask yourself e.g. “how many times does 3 divide 8?”—in base-2, it either does exactly once with no remainder, or it doesn't at all. The only thing making this less intuitive is our relative familiarity with base-10, in contrast to working in base-2. – liyang Mar 16 '21 at 12:38
3

On architectures that can only shift one place at a time, a series of explicit comparisons against decreasing powers of two multiplied by 10 might work better than the solution form hacker's delight. Assuming a 16 bit dividend:

uint16_t div10(uint16_t dividend) {
  uint16_t quotient = 0;
  #define div10_step(n) \
    do { if (dividend >= (n*10)) { quotient += n; dividend -= n*10; } } while (0)
  div10_step(0x1000);
  div10_step(0x0800);
  div10_step(0x0400);
  div10_step(0x0200);
  div10_step(0x0100);
  div10_step(0x0080);
  div10_step(0x0040);
  div10_step(0x0020);
  div10_step(0x0010);
  div10_step(0x0008);
  div10_step(0x0004);
  div10_step(0x0002);
  div10_step(0x0001);
  #undef div10_step
  if (dividend >= 5) ++quotient; // round the result (optional)
  return quotient;
}
phuclv
  • 37,963
  • 15
  • 156
  • 475
Kuba hasn't forgotten Monica
  • 95,931
  • 16
  • 151
  • 313
  • Your code performs 16 multiplication by 10. Why do you think your code is faster than hacker’s delight ? – chmike Oct 07 '17 at 20:11
  • It doesn't matter what I think. What matters is whether at the applicable platform it is faster. Try yourself! There's no universally fastest solution here at all. Every solution has some platform in mind, and will work best on that platform, possibly better than any other solution. – Kuba hasn't forgotten Monica Oct 18 '17 at 16:09
  • I didn’t notice that n*10 is constant. It will thus be precomputed by the compiler. I provided an alternate algorithm in an answer. Our algorithm are equivalent except for one difference. You subtract b*10 from v and I add it to x*10. Your algorithm doesn’t need to keep track of x*10 which saves a variable. The code you show unrolls the my while loop. – chmike Oct 18 '17 at 19:10
  • 1
    @chmike: On a machine without hardware multiply, `n*10` is still cheap: `(n<<3) + (n<<1)`. These small-shift answers could maybe be useful on machines with slow or non-existent HW multiply, and only a shift by 1. Otherwise a fixed-point inverse is much better for compile-time constant divisors (like modern compilers do for `x/10`). – Peter Cordes Oct 18 '17 at 21:17
  • 3
    This is an awesome solution, especially useful for processors that do not have right shift (e.g. LC-3). – Erik Eidt Feb 28 '20 at 02:00
  • @ErikEidt: njuffa's answer on [How can I multiply and divide using only bit shifting and adding?](https://stackoverflow.com/a/32443307) doesn't need a right-shift either, only an `adc` or unsigned-compare for manual carry detection, and for the actual conditional. – Peter Cordes Dec 02 '20 at 11:44
  • This is an awesome solution, thank you so much. And I have found if you turn all or some of those ifs into whiles, you can take out some of the intermediate steps and it can still be quite efficient leaving every 4th step or so. Saves a lot of code space for not much increase in cycles. Edit: this is on AVR, where bit shifts by more than one are expensive – thomasrutter Aug 23 '21 at 13:32
2

Well division is subtraction, so yes. Shift right by 1 (divide by 2). Now subtract 5 from the result, counting the number of times you do the subtraction until the value is less than 5. The result is number of subtractions you did. Oh, and dividing is probably going to be faster.

A hybrid strategy of shift right then divide by 5 using the normal division might get you a performance improvement if the logic in the divider doesn't already do this for you.

tvanfosson
  • 524,688
  • 99
  • 697
  • 795
0

I've designed a new method in AVR assembly, with lsr/ror and sub/sbc only. It divides by 8, then sutracts the number divided by 64 and 128, then subtracts the 1,024th and the 2,048th, and so on and so on. Works very reliable (includes exact rounding) and quick (370 microseconds at 1 MHz). The source code is here for 16-bit-numbers: http://www.avr-asm-tutorial.net/avr_en/beginner/DIV10/div10_16rd.asm The page that comments this source code is here: http://www.avr-asm-tutorial.net/avr_en/beginner/DIV10/DIV10.html I hope that it helps, even though the question is ten years old. brgs, gsc

-1

elemakil's comments' code can be found here: https://doc.lagout.org/security/Hackers%20Delight.pdf page 233. "Unsigned divide by 10 [and 11.]"

milkpirate
  • 267
  • 1
  • 18
  • Link-only answers are not what Stack Overflow is about. If that covers the method described in some other answer, you could leave a comment or make a suggested adit. But this isn't enough to be an answer on its own. Alternatively you could quote or summarize some of what it says and highlight the key parts, if that would make a minimal answer even if the link breaks. – Peter Cordes Jun 12 '20 at 23:14