
I'm writing a multi-precision library in C++, using a base of 2^64, and currently I'm working on the mod operation. I'm using Algorithm D described in Donald E. Knuth's 1998 edition of "The Art Of Computer Programming" Vol. 2, Section 4.3.1, for division, which yields a quotient and a remainder. For the mod operation, I'm performing a division, throwing away the quotient in the end. Although Knuth's Algorithm D is very fast if implemented in C++ with some ASM enhancements for the partial division and the concurrent multi-precision multiplication/subtraction in each step, I'm not sure if there is a better way, since throwing away a painstakingly computed result doesn't seem efficient to me.

Unfortunately, it's not possible to get rid of the partial division in Algorithm D, because the partial quotient is required to compute the remainder, by subtracting the product of the partial quotient and the divisor from the dividend iteratively.

I've searched the Internet for alternative solutions, and found the influential papers written by Paul Barrett and Peter L. Montgomery. However, the fancy tricks they use seem to pay off only if lots of mod operations are performed in a row with the same modulus, since they involve heavy precomputations. This is the case in complex operations like modular exponentiation, where the mod of several squares and products is required for a single modulus. Barrett starts with the basic definition of the remainder, r = a - b * (a / b), and changes the division to a multiplication with the reciprocal of b. Then he presents an efficient way to compute this multiplication, which pays off if the reciprocal is computed once for several similar computations. Montgomery transforms the operands into a completely different residue system, where modular arithmetic is cheap, but for the price of transformations to and fro.

Additionally, both algorithms introduce restrictions which must be met for correct operation. Montgomery, for instance, requires the modulus to be odd, which is the case in RSA calculations with primes, but which cannot be assumed in the general case. Outside these restrictions, even more overhead for normalizations is required.

So what I need is an efficient one-shot mod function without overhead and special restrictions. Hence my question is: Is it possible to compute a remainder without computing the quotient in the first place, in a way that is more efficient than division?

SBS
  • This question might be a better fit on [CS.SE](https://cs.stackexchange.com/help/on-topic). If you decide to post there, you should delete your post here. That said, I'm not sure how you could compute a remainder without finding how many times the divisor goes into the dividend. – NathanOliver Apr 08 '19 at 12:57
  • I don't think there is a more efficient general remainder function, but there are fast ways to calculate some modulos. In particular, modulo with a power of 2 can be replaced by a bitwise operation. – eerorika Apr 08 '19 at 13:05
  • @FrançoisAndrieux: The compiler doesn't have a multi-precision `mod` or `%` operation, so no, the compiler will not automatically write his function efficiently (or at all). – Ben Voigt Apr 08 '19 at 14:19
  • @eerorika I'm aware that div and mod by a power of two are trivial and equivalent to >> and logical &. However, the probability for this pattern is extremely low in the general case, i.e. 2^-64 for a 64-bit divisor, and even lower for bigger numbers. Detecting special patterns is useful only if there's some chance that they appear in the first place, and in my case (general-purpose library), their occurrence is extremely rare, so any additional tests would just slow down the general case. Actually, if such a pattern appears, I usually know it beforehand, so I wouldn't use a div or mod anyway. – SBS Apr 08 '19 at 16:24
  • @SBS hence my suspicion that a more efficient general remainder function doesn't exist. – eerorika Apr 08 '19 at 16:27
  • @NathanOliver Well, Barrett and Montgomery show that it's actually possible, but their approaches are optimized for RSA stuff, and not for general-purpose arithmetic. So there's definitely some hope. Interestingly, Barrett uses an idea similar to Knuth's Algorithm D, in that he computes a good estimate first, which needs to be refined with acceptable probability in turn. – SBS Apr 08 '19 at 16:31
  • @SBS I think your understanding is solid. Special solutions exist for particular contexts. But best I know, there is no generally applicable general-purpose solution. However the cost of the back-multiply is usually small compared to the cost of the division itself, which requires significantly more multiplies, even if you use something like a Halley iteration for the reciprocal (with cubic convergence) as the basis. – njuffa Apr 08 '19 at 22:19
  • @njuffa Maybe **Algorithm D** is exactly what I'm searching for, since it trades off partial divisions with partial multiplications. Actually, if `a > b`, and both operands have the same number of "digits" `n` with respect to the base (i.e. a "digit" is a 64-bit number here), just a single partial division and `n` partial multiplications are performed. If `a` has `m` digits, and `b` has `n` digits, `m-n+1` divisions and `n` multiplications are needed. So maybe that's already the most efficient way to go in the general case...?! – SBS Apr 09 '19 at 12:42
  • @SBS First order of business is to build the fastest multiplication possible (with Karatsuba, Toom-Cook, FFT as appropriate depending on bit-length). Then you can use that to implement division via iterative computation of the reciprocal (Halley iteration with cubic convergence), as I showed for `udiv64` in [this question](https://stackoverflow.com/questions/36853827/efficient-computation-of-264-divisor-via-fast-floating-point-reciprocal). As I recall, algorithm D from Knuth is simply long-hand division using a high radix, which has linear convergence. – njuffa Apr 09 '19 at 14:11
  • @SBS The [Yacas Book of Algorithms](https://dcc.ufrj.br/~lula/SISTEMB/YacasAlg.htm) seems to suggest that a second-order iteration (Newton iteration) is optimal for computing the reciprocal, rather than the third-order iteration (Halley iteration) I suggested. Best I gather from a *quick* perusal they define optimal as requiring the minimal number of elementary multiply operations. – njuffa Apr 09 '19 at 14:37

1 Answer


One suggestion would be to write a simple function that calculates A % B = C and stores the A, B, and C values in an array, collecting all the results in a vector. Then print them out to see the relationships between the input and output values.

One thing that can be done to simplify some of this work is to know some of the properties of the mod function. These two identities will help:

 0 mod N = 0
 N mod 0 = undefined

Since 0 mod N = 0, we can test whether A is 0 and, if so, use that to populate our array directly. Likewise, if B = 0, we can populate our array's C value with -1 to represent "undefined", because A mod 0 cannot be evaluated: integer division by zero is undefined behavior at run time.

I wrote the following function to do just that, then ran it in a loop for both A and B over [0, 15].

#include <array>
#include <vector>
#include <iostream>

std::array<int, 3> calculateMod(int A, int B) {
    std::array<int, 3 > res;
    if (A == 0) {       
        res = std::array<int, 3>{ 0, B, 0 };
    }
    else if (B == 0) {
        res = std::array<int, 3>{ A, 0, -1 };
    }
    else {
        res = std::array<int, 3>{ A, B, A%B };
    }
    return res;
}

int main() {
    std::vector<std::array<int, 3>> results;

    int N = 15; 
    for (int A = 0; A <= N; A++) {
        for (int B = 0; B <= N; B++) {
            results.push_back(calculateMod(A, B));
        }
    }

    // Now print out the results in a table form:
    int i = 0; // Index for formatting output
    for (auto& res : results) {
        std::cout << res[0] << " % " << res[1] << " = " << res[2] << '\n';

        // just for formatting output data to make it easier to read.
        i++;
        if ( i > N ) {
            std::cout << '\n';
            i = 0;
        }
    }
    return 0;
}

Here is its output:

0 % 0 = 0
0 % 1 = 0
0 % 2 = 0
0 % 3 = 0
0 % 4 = 0
0 % 5 = 0
0 % 6 = 0
0 % 7 = 0
0 % 8 = 0
0 % 9 = 0
0 % 10 = 0
0 % 11 = 0
0 % 12 = 0
0 % 13 = 0
0 % 14 = 0
0 % 15 = 0

1 % 0 = -1
1 % 1 = 0
1 % 2 = 1
1 % 3 = 1
1 % 4 = 1
1 % 5 = 1
1 % 6 = 1
1 % 7 = 1
1 % 8 = 1
1 % 9 = 1
1 % 10 = 1
1 % 11 = 1
1 % 12 = 1
1 % 13 = 1
1 % 14 = 1
1 % 15 = 1

2 % 0 = -1
2 % 1 = 0
2 % 2 = 0
2 % 3 = 2
2 % 4 = 2
2 % 5 = 2
2 % 6 = 2
2 % 7 = 2
2 % 8 = 2
2 % 9 = 2
2 % 10 = 2
2 % 11 = 2
2 % 12 = 2
2 % 13 = 2
2 % 14 = 2
2 % 15 = 2

3 % 0 = -1
3 % 1 = 0
3 % 2 = 1
3 % 3 = 0
3 % 4 = 3
3 % 5 = 3
3 % 6 = 3
3 % 7 = 3
3 % 8 = 3
3 % 9 = 3
3 % 10 = 3
3 % 11 = 3
3 % 12 = 3
3 % 13 = 3
3 % 14 = 3
3 % 15 = 3

4 % 0 = -1
4 % 1 = 0
4 % 2 = 0
4 % 3 = 1
4 % 4 = 0
4 % 5 = 4
4 % 6 = 4
4 % 7 = 4
4 % 8 = 4
4 % 9 = 4
4 % 10 = 4
4 % 11 = 4
4 % 12 = 4
4 % 13 = 4
4 % 14 = 4
4 % 15 = 4

5 % 0 = -1
5 % 1 = 0
5 % 2 = 1
5 % 3 = 2
5 % 4 = 1
5 % 5 = 0
5 % 6 = 5
5 % 7 = 5
5 % 8 = 5
5 % 9 = 5
5 % 10 = 5
5 % 11 = 5
5 % 12 = 5
5 % 13 = 5
5 % 14 = 5
5 % 15 = 5

6 % 0 = -1
6 % 1 = 0
6 % 2 = 0
6 % 3 = 0
6 % 4 = 2
6 % 5 = 1
6 % 6 = 0
6 % 7 = 6
6 % 8 = 6
6 % 9 = 6
6 % 10 = 6
6 % 11 = 6
6 % 12 = 6
6 % 13 = 6
6 % 14 = 6
6 % 15 = 6

7 % 0 = -1
7 % 1 = 0
7 % 2 = 1
7 % 3 = 1
7 % 4 = 3
7 % 5 = 2
7 % 6 = 1
7 % 7 = 0
7 % 8 = 7
7 % 9 = 7
7 % 10 = 7
7 % 11 = 7
7 % 12 = 7
7 % 13 = 7
7 % 14 = 7
7 % 15 = 7

8 % 0 = -1
8 % 1 = 0
8 % 2 = 0
8 % 3 = 2
8 % 4 = 0
8 % 5 = 3
8 % 6 = 2
8 % 7 = 1
8 % 8 = 0
8 % 9 = 8
8 % 10 = 8
8 % 11 = 8
8 % 12 = 8
8 % 13 = 8
8 % 14 = 8
8 % 15 = 8

9 % 0 = -1
9 % 1 = 0
9 % 2 = 1
9 % 3 = 0
9 % 4 = 1
9 % 5 = 4
9 % 6 = 3
9 % 7 = 2
9 % 8 = 1
9 % 9 = 0
9 % 10 = 9
9 % 11 = 9
9 % 12 = 9
9 % 13 = 9
9 % 14 = 9
9 % 15 = 9

10 % 0 = -1
10 % 1 = 0
10 % 2 = 0
10 % 3 = 1
10 % 4 = 2
10 % 5 = 0
10 % 6 = 4
10 % 7 = 3
10 % 8 = 2
10 % 9 = 1
10 % 10 = 0
10 % 11 = 10
10 % 12 = 10
10 % 13 = 10
10 % 14 = 10
10 % 15 = 10

11 % 0 = -1
11 % 1 = 0
11 % 2 = 1
11 % 3 = 2
11 % 4 = 3
11 % 5 = 1
11 % 6 = 5
11 % 7 = 4
11 % 8 = 3
11 % 9 = 2
11 % 10 = 1
11 % 11 = 0
11 % 12 = 11
11 % 13 = 11
11 % 14 = 11
11 % 15 = 11

12 % 0 = -1
12 % 1 = 0
12 % 2 = 0
12 % 3 = 0
12 % 4 = 0
12 % 5 = 2
12 % 6 = 0
12 % 7 = 5
12 % 8 = 4
12 % 9 = 3
12 % 10 = 2
12 % 11 = 1
12 % 12 = 0
12 % 13 = 12
12 % 14 = 12
12 % 15 = 12

13 % 0 = -1
13 % 1 = 0
13 % 2 = 1
13 % 3 = 1
13 % 4 = 1
13 % 5 = 3
13 % 6 = 1
13 % 7 = 6
13 % 8 = 5
13 % 9 = 4
13 % 10 = 3
13 % 11 = 2
13 % 12 = 1
13 % 13 = 0
13 % 14 = 13
13 % 15 = 13

14 % 0 = -1
14 % 1 = 0
14 % 2 = 0
14 % 3 = 2
14 % 4 = 2
14 % 5 = 4
14 % 6 = 2
14 % 7 = 0
14 % 8 = 6
14 % 9 = 5
14 % 10 = 4
14 % 11 = 3
14 % 12 = 2
14 % 13 = 1
14 % 14 = 0
14 % 15 = 14

15 % 0 = -1
15 % 1 = 0
15 % 2 = 1
15 % 3 = 0
15 % 4 = 3
15 % 5 = 0
15 % 6 = 3
15 % 7 = 1
15 % 8 = 7
15 % 9 = 6
15 % 10 = 5
15 % 11 = 4
15 % 12 = 3
15 % 13 = 2
15 % 14 = 1
15 % 15 = 0

From the data above we can see that if A == B, the result is 0. We can also see that if B > A, then C == A, i.e. the result is simply A. Finally, we can see that there are patterns between odd and even values of A while B < A. If you can understand these patterns, then most of it becomes algebraic manipulation. From here, the next step would be to create an algorithm that takes all of this data and converts it to its binary equivalent.

I chose the value of N above as 15 for a reason: the values [0, 15] cover every possible combination of four binary digits before the patterns repeat. We know that a single byte of data is 8 bits, and that the values from [0, 15] fit into half of that; for example:

binary byte:  hex    decimal
0000 0000     0x00   0
...
0000 1111     0x0F   15

After these 16 different sequences of 0s and 1s, the patterns repeat. So, taking the table above, you can convert the entries into their binary representations. Once you examine the representations of the A and B inputs with their C outputs in binary, and understand the three properties of the results mentioned above, you should be able to design an algorithm that quickly computes the modulo of any A, B combination. One thing to remember is that there are other points to take into consideration. The first is what user eerorika stated:

"In particular, modulo with a power of 2 can be replaced by a bitwise operation."

Beyond that, consider whether the values are even or odd, as the even and odd cases present different patterns of A mod B when B < A.

I have given you some information to start out with, but I'll leave the rest up to you, including the task of converting the A, B, and C values into their binary representations.

Once you know the binary patterns of the A and B inputs according to their C outputs, and you understand the truth tables of the logical operators And (&), Or (|), Nand (!&), Nor (!|), Xor (^), Xnor (!^), and Not (!), as well as the complement (~), you should be able to design your algorithm efficiently.

Francis Cugler
  • I had explained to the OP an approach that they could take in order to design their algorithm and given them some tips on what to understand about the operation of modulo with respect to its inputs and output values. Then to consider the relationship of those values from a binary digit representation, from there they can use bit manipulations to make their algorithm as efficient as possible, and I even explained the different cases between odd and even values of `A`. And how the output behaves when `A` & `B` are equal and when `B` is greater than `A`. And I still got a down vote... – Francis Cugler Apr 08 '19 at 14:23
  • 2
    Please note that I didn't downvote you. Basically, what you propose is to exploit the trivial cases. Of course I'm doing this already in my division function before trying Knuth's Algorithm D, so it comes for free in the `mod` function: If `B==0` -> divide by zero. If `A==B` -> `q=1, r=0`. If `A `q=0, r=A`. If `B==1` -> `q=a, r=0`. If `B<2^64` -> fast division by 64-bit number. If `bitlength(A)==bitlength(B)` -> `q=1, r=A-B`. Otherwise `Algorithm D`. Especially `A – SBS Apr 08 '19 at 16:14