getting the result of modulo 10 in theory

Question

I am currently trying to simulate modulo 10 of a number in a simple version of assembly with limited operations. However I can't really think of an efficient way to get modulo 10 of a high number. What my current and obvious "pseudo algorithm" for calculating modulo is:

e.g 117mod10 = 7

Calculate 117 minus 10 as long as it is positive, then add 10.

So with 117 at some point I would get -3 and then I would stop subtracting with 10 and once add 10 instead. (I know in this example will be a bug with numbers who are actually divisible by 10). So my problem is that with bigger numbers it would take way too long. Therefore I wanted to ask if there is any way to realize something like modulo 10 with these available operations: add, subtract, bitwise and, bitwise xor, bitwise shift left / shift right ?

you can have table with powers of 10 to subtract whichever is smaller+equal than current value, i.e. table with 10000,1000,100,10 (start with reasonable max value depending on the bit size of your input (10000 is enough for 16 bit values), end with 10) (and the smaller+equal test will fix also last subtraction, because `((10 <= 7) == false)`. - that said, you need reasonable instructions + spare memory to work with look-up-array like that, you listed only arithmetic ones. — Ped7g, Apr 03 '18 at 20:47
On a CPU with efficient HW multiply (like modern x86), compilers will use a fixed-point multiplicative inverse to evaluate `x % 10` as `x - (x/10) * 10`. https://stackoverflow.com/questions/41183935/why-does-gcc-use-multiplication-by-a-strange-number-in-implementing-integer-divi. You *can* implement multiply as shift/add, but the "magic constant" in this case has a *lot* of set bits, and thus would take a lot of shift/add instructions to hard-code that multiplier. (And you need the high half of the full result, e.g. of a 32x32 => 64-bit multiply). — Peter Cordes, Apr 03 '18 at 23:12
But that might still be faster than an iterative division algorithm, and wouldn't require any branching. (Division can be *much* faster than repeated subtraction, like O(log n) instead of O(n). See https://stackoverflow.com/a/32443307/224132 for x86 and C implementations of binary long division 1 bit at a time. possible duplicate. — Peter Cordes, Apr 03 '18 at 23:16
[Current](https://gmplib.org/~tege/division-paper.pdf) iteration of [earlier](https://gmplib.org/~tege/divcnst-pldi94.pdf) results. You should be able to find the 'inverse' and shift constants for `(10)`, depending on the micro's word size. — Brett Hale, Apr 06 '18 at 10:18

user5329483 · Answer 1 · 2018-04-05T17:39:15.307

Here an approach by transferring the number bitwise to the result. While doing this, we systematically 'forget' to transfer the high order digits.

int Modulu10(unsigned Num)
{
   while (Num > 9)
   {
     int Result = Num & 1;

     //The algorithm is quite inefficient with lower numbers, 
     //so here an isolated optimization.
     if (Num>=20) Num -= 20;  
     if (Num>=10) Num -= 10; //mandatory: No infinite loop for Num=10..15!!

     while (Num>1) //ignoring the LSB
     {
        if (Num &  2) Result +=  2;
        if (Num &  4) Result +=  4;
        if (Num &  8) Result +=  8;
        if (Num & 16) Result += 16 % 10; //becomes six
        /* Instead of the next tests
           if (Num & 32) Result +=  32 % 10; =two
           if (Num & 64) Result +=  64 % 10; =four
          we can map it in a loop: 
           The test against  32 gives the same result as the test against 2  
           The test against  64 gives the same result as the test against 4
           The test against 128 gives the same result as the test against 8  
        */
        Num >>= 4;
     }
     Num = Result; //may need again a modulu-run
   }
   return Num;
}

Interesting approach. You could do `result += 2 * popcnt(Num & (2|32|512|...))` to do all the powers of 2 that are `%10 == 2` in one step. You probably don't have hardware popcnt if you don't have HW multiply, but a bit-hack or table-lookup approach could be useful. A bit-hack approach could be more efficient than in the general case because you already know a lot of bits are zero and don't need to get added, and are safe for add to carry into when producing multiple partial sums within one integer. It could also produce the result already multiplied by 2, 4, or 8 by shifting less. — Peter Cordes, Apr 05 '18 at 05:38

getting the result of modulo 10 in theory

1 Answers1