Fast multiplication modulo 2^16 + 1

Question

The IDEA cipher uses multiplication modulo 2^16 + 1. Is there an algorithm to perform this operation without general modulo operator (only modulo 2^16 (truncation))? In the context of IDEA, zero is interpreted as 2^16 (it means zero isn't an argument of our multiplication and it cannot be the result, so we can save one bit and store value 2^16 as bit pattern 0000000000000000). I am wondering how to implement it efficiently (or whether it is possible at all) without using the standard modulo operator.

I've added some clarification. – ciechowoj Mar 09 '15 at 10:30 — ciechowoj, Mar 09 '15 at 10:30

Aki Suihkonen · Accepted Answer · 2015-03-09T11:35:20.013

5

You can utilize the fact, that (N-1) % N == -1.

Thus, (65536 * a) % 65537 == -a % 65537.
Also, -a % 65537 == -a + 1 (mod 65536), when 0 is interpreted as 65536

uint16_t fastmod65537(uint16_t a, uint16_t b)
{
    uint32_t c;
    uint16_t hi, lo;
    if (a == 0)
        return -b + 1;
    if (b == 0)
        return -a + 1;

    c = (uint32_t)a * (uint32_t)b;
    hi = c >> 16;
    lo = c;
    if (lo > hi)
        return lo-hi;
    return lo-hi+1;
}

The only problem here is if hi == lo, the result would be 0. Luckily a test suite confirms, that it actually can't be...

int main()
{
    uint64_t a, b;
    for (a = 1; a <= 65536; a++)
       for (b = 1; b <= 65536; b++)
        { 
            uint64_t c = a*b;
            uint32_t d = (c % 65537) & 65535;
            uint32_t e = m(a & 65535, b & 65535);
            if (d != e)
                printf("a * b % 65537 != m(%d, %d) real=%d m()=%d\n",
                       (uint32_t)a, (uint32_t)b, d, e);
        }
    }

Output: none

edited Mar 09 '15 at 11:35

answered Mar 09 '15 at 11:29

Aki Suihkonen

19,144
1
36
57

ps. Running the suite in core i5 proved the % 65537 method faster. – Aki Suihkonen Mar 09 '15 at 11:42
I still do not understand how that 5 last lines work. I suppose it is slower because of that two ifs or compiler is doing some magic under the hood. – ciechowoj Mar 09 '15 at 12:37
Ok, I see that this shift is just div and truncation is mod. – ciechowoj Mar 09 '15 at 12:42
1

Combining the first tests to `(a*b == 0)` and wrapping the last expression to `c = lo-hi; c += c >> 16;` one gets on par with the % method on i5. A branchless version will of course outperform the div method when vectorized. – Aki Suihkonen Mar 09 '15 at 13:21
Can it be proven somehow from basic principles that `hi != lo`? Is it correct to do `c += c >> 16`? I think that when `lo > hi` then upper bis of `c` are all zeros, and when `lo < hi` then upper bits are ones. So `c += c >> 16` really means `c = c + 2^16 - 1` which when taken modulo `2^16` is `c - 1` and in the first version we were adding one instead of subtracting it (`return lo-hi+1;`). – ciechowoj Mar 10 '15 at 09:17
2

Yes. hi == lo would require that a * b = 0x10001 * x . 0x10001 is a prime (actually the same prime 65537 that we are taking the modulo) and both _a_ and _b_ are smaller than that. Thus a*b can't form such number. OTOH this is not true for M=2^32+1, which is a composite and prevents augmenting IDEA to 32-bit integers. – Aki Suihkonen Mar 10 '15 at 09:27
1

The final wrapping must equal to `lo-hi+1`, which is actually equivalent to `c -= c >> 16; // mod 65536`, not `c += xxx`, just as in the other answer. – Aki Suihkonen Mar 10 '15 at 09:33

Michael Foukarakis · Answer 2 · 2015-03-09T12:20:55.130

First, the case where either a or b is zero. In that case, it is interpreted as having the value 2^16, therefore elementary modulo arithmetic tells us that:

result = -a - b + 1;

, because (in the context of IDEA) the multiplicative inverse of 2^16 is still 2^16, and its lowest 16 bits are all zeroes.

The general case is much easier than it seems, now that we took care of the "0" special case (2^16+1 is 0x10001):

/* This operation can overflow: */
unsigned result = (product & 0xFFFF) - (product >> 16);
/* ..so account for cases of overflow: */
result -= result >> 16;

Putting it together:

/* All types must be sufficiently wide unsigned, e.g. uint32_t: */
unsigned long long product = a * b;
if (product == 0) {
    return -a - b + 1;
} else {
    result = (product & 0xFFFF) - (product >> 16);
    result -= result >> 16;
    return result & 0xFFFF;
}

Fast multiplication modulo 2^16 + 1

2 Answers2

Linked