Is there any way to write "mod 31" without modulus/division operators?

Question

Getting the modulus of a number can easily be done without the modulus operator or division if the divisor is a power of 2. In that case, the following formula holds for non-negative x: x % y = (x & (y − 1)). This is often more performant on many architectures. Can the same be done for mod 31?

int mod31(int a) { return a % 31; }
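For reference, the power-of-two case mentioned in the question can be sketched as follows. The helper names `is_pow2` and `mod_pow2` are made up for this illustration, and the identity is only assumed to hold for unsigned (non-negative) operands.

```c
#include <stdbool.h>

/* Hypothetical helper: true when y is a nonzero power of two. */
bool is_pow2(unsigned y) {
    return y != 0 && (y & (y - 1)) == 0;
}

/* x mod y using only a mask.
   Precondition (assumed, not checked here): is_pow2(y). */
unsigned mod_pow2(unsigned x, unsigned y) {
    return x & (y - 1);
}
```

For example, `mod_pow2(37, 8)` keeps the low three bits of 37 (binary 100101), which gives 5. The mask trick does not apply to 31 because `is_pow2(31)` is false.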

It can be done, but not easily - you're not going to like it. Are you still interested? — harold, Sep 25 '14 at 20:11
For the sake of the question, why not? I might edit it with the reason, though. — MaiaVictor, Sep 25 '14 at 20:11
Duplicate of this? : http://stackoverflow.com/questions/3072665/bitwise-and-in-place-of-modulus-operator — Chris, Sep 25 '14 at 20:11
@Viclib here you go: http://graphics.stanford.edu/~seander/bithacks.html#ModulusDivision — harold, Sep 25 '14 at 20:12
@harold did it take you 7.5 million years to come up with that? ;) — M.M, Sep 25 '14 at 23:43

score 9 · Answer 1 · edited May 23 '17 at 12:08

Here are two ways to approach this problem. The first one uses a common bit-twiddling technique and, if carefully optimized, can beat hardware division. The other one substitutes a multiply for the divide, similar to the optimization performed by gcc, and is far and away the fastest. The bottom line is that there's not much point trying to avoid the % operator if the second argument is constant, because gcc's got it covered. (And probably other compilers, too.)

The following function is based on the fact that x is the same (mod 31) as the sum of the base-32 digits of x. That's true because 32 is 1 mod 31, and consequently any power of 32 is 1 mod 31. So each "digit" position in a base-32 number contributes the digit * 1 to the mod 31 sum. And it's easy to get the base-32 representation: we just take the bits five at a time.

(Like the rest of the functions in this answer, it will only work for non-negative x).

unsigned mod31(unsigned x) {
  unsigned tmp;
  for (tmp = 0; x; x >>= 5) {
    tmp += x & 31;
  }
  // Here we assume that there are at most 160 bits in x
  tmp = (tmp >> 5) + (tmp & 31);
  return tmp >= 31 ? tmp - 31 : tmp;
}
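As a worked example of the digit-sum property: 1000 = 31·32 + 8, so its base-32 digits are 31 and 8, which sum to 39. In turn 39 = 1·32 + 7, whose digits sum to 8, and indeed 1000 − 992 = 8. The small helper below (the name `base32_digit_sum` is invented for this sketch) performs one such pass so the steps can be checked by hand.

```c
/* One pass of the digit-sum step: adds up the base-32 digits of x.
   The result is congruent to x modulo 31, but is not yet reduced. */
unsigned base32_digit_sum(unsigned x) {
    unsigned sum = 0;
    while (x != 0) {
        sum += x & 31;  /* lowest base-32 digit */
        x >>= 5;        /* drop that digit */
    }
    return sum;
}
```

Calling it on 1000 and then on the result reproduces the two steps above.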

For a specific integer size, you could unroll the loop and quite possibly beat division. (And see @chux's answer for a way to convert the loop into O(log bits) operations instead of O(bits).) It's more difficult to beat gcc, which avoids division when the divisor is a constant known at compile-time.

In a very quick benchmark using unsigned 32-bit integers, the naive unrolled loop took 19 seconds and a version based on @chux's answer took only 13 seconds, but gcc's x%31 took 9.7 seconds. Forcing gcc to use a hardware divide (by making the divisor non-constant) took 23.4 seconds, and the code as shown above took 25.6 seconds. Those figures should be taken with several grains of salt. The times are for computing i%31 for all possible values of i, on my laptop using -O3 -march=native.

gcc avoids 32-bit division by a constant by replacing it with what is essentially a 64-bit multiplication by the inverse of the constant followed by a right shift. (The actual algorithm does a bit more work to avoid overflows.) The procedure was implemented more than 20 years ago in gcc v2.6, and the paper which describes the algorithm is available on the gmp site. (GMP also uses this trick.)

Here's a simplified version: Say we want to compute n // 31 for some unsigned 32-bit integer n (using the pythonic // to indicate truncated integer division). We use the "magic constant" m = 2³² // 31, which is 138547332. Now it's clear that for any n:

m * n <= 2³² * n/31 < m * n + n ⇒ m * n // 2³² <= n//31 <= (m * n + n) // 2³²

(Here we make use of the fact that if a < b then floor(a) <= floor(b).)

Furthermore, since n < 2³², m * n // 2³² and (m * n + n) // 2³² are either the same integer or two consecutive integers. Consequently, one (or both) of those two is the actual value of n//31.
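To see the two candidate quotients concretely, here is a minimal sketch that computes both bounds from the inequality. The function names and the macro `MAGIC31` are made up for this illustration; the constant itself is the m given in the text.

```c
#include <stdint.h>

/* m = 2^32 // 31, the constant from the text. */
#define MAGIC31 138547332ULL

/* Lower candidate: (m * n) // 2^32 */
uint32_t quotient_lo(uint32_t n) {
    return (uint32_t)((MAGIC31 * n) >> 32);
}

/* Upper candidate: (m * n + n) // 2^32.
   (m + 1) * n stays below 2^64 for every 32-bit n, so nothing overflows. */
uint32_t quotient_hi(uint32_t n) {
    return (uint32_t)((MAGIC31 * n + n) >> 32);
}
```

For n = 62 the two candidates differ: 138547332 · 62 = 8589934584 falls just short of 2³³, so the lower candidate is 1 while the upper candidate is 2, the true quotient. For n = 61 both candidates are 1.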

Now, we really want to compute n%31. So we need to multiply the (presumed) quotient by 31, and subtract that from n. If we use the smaller of the two possible quotients, it may turn out that the computed modulo value is too big, but it can only be too big by 31.

Or, to put it in code:

static unsigned long long magic = 138547332;
unsigned mod31g(unsigned x) {
  unsigned q = (x * magic) >> 32;
  // To multiply by 31, we multiply by 32 and subtract
  unsigned mod = x - ((q << 5) - q);
  return mod < 31 ? mod : mod - 31;
}

The actual algorithm used by gcc avoids the test at the end by using a slightly more accurate computation based on multiplying by 2³⁷//31 + 1. That always produces the correct quotient, but at the cost of some extra shifts and adds to avoid integer overflow. As it turns out, the version above is slightly faster -- in the same benchmark as above, it took only 6.3 seconds.
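The more accurate computation can be sketched as below. This is an illustration of the general technique under stated assumptions, not gcc's literal output: the multiplier 2³⁷//31 + 1 = 4433514629 needs 33 bits, so only its low 32 bits (138547333) are multiplied and the implicit top bit is handled by adding x back in. The names `div31_exact`, `mod31_exact` and `MAGIC31_LOW` are invented here.

```c
#include <stdint.h>

/* 2^37 // 31 + 1 = 4433514629 = 2^32 + 138547333; this is the low part. */
#define MAGIC31_LOW 138547333ULL

uint32_t div31_exact(uint32_t x) {
    /* t = (x * low part of the multiplier) // 2^32, so t <= x */
    uint32_t t = (uint32_t)((x * MAGIC31_LOW) >> 32);
    /* (x + t) // 2 computed as t + (x - t) // 2 so the sum cannot overflow
       32 bits, followed by the remaining shift of 4 (37 - 32 - 1). */
    return (t + ((x - t) >> 1)) >> 4;
}

uint32_t mod31_exact(uint32_t x) {
    uint32_t q = div31_exact(x);
    /* x - 31 * q; (q << 5) may wrap, but unsigned wraparound cancels out
       because 31 * q itself never exceeds x. */
    return x - ((q << 5) - q);
}
```

Because the quotient is exact, no final comparison against 31 is needed, at the price of the extra subtract, shift and add.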

Other benchmarked functions, for completeness:

Naive unrolled loop

unsigned mod31b(unsigned x) {
  unsigned tmp = x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31; x >>= 5;
  tmp += x & 31;

  tmp = (tmp >> 5) + (tmp & 31);
  return tmp >= 31 ? tmp - 31 : tmp;
}

@chux's improvement, slightly optimized

static const unsigned mask1 = (31U << 0) | (31U << 10) | (31U << 20) | (31U << 30);
static const unsigned mask2 = (31U << 5) | (31U << 15) | (31U << 25);
unsigned mod31c(unsigned x) {
  x = (x & mask1) + ((x & mask2) >> 5);
  x += x >> 20;
  x += x >> 10;

  x = (x & 31) + ((x >> 5) & 31);
  return x >= 31 ? x - 31: x;
}

Very nice +1. @harold link above shows additional info too. This code would easily modify for 1,3,7,15,63,127,... — chux - Reinstate Monica, Sep 25 '14 at 20:53
@n.m.: quite right. Made some other fixes, while I was at it. — rici, Sep 25 '14 at 21:37
[this question](http://stackoverflow.com/questions/22300185/bit-hacking-and-modulo-operation) has the explanation for this — phuclv, Sep 26 '14 at 08:20

score 6 · Answer 2 · edited May 23 '17 at 12:14

See [Edit2] below for performance notes.

An attempt with only 1 if condition.

This approach takes O(log2(bit width of unsigned)) steps. Should code use uint64_t, run time would increase by only 1 set of ands/shifts/adds, rather than doubling as it would with a loop approach.

unsigned mod31(uint32_t x) {
  #define m31 (31lu)
  #define m3131 ((m31 << 5) | m31)
  #define m31313131 ((m3131 << 10) | m3131)

  static const uint32_t mask1 = (m31 << 0) | (m31 << 10) | (m31 << 20) | (m31 << 30);
  static const uint32_t mask2 = (m31 << 5) | (m31 << 15) | (m31 << 25);
  uint32_t a = x & mask1;
  uint32_t b = x & mask2;
  x = a + (b >> 5);
  // x = xx 0000x xxxxx 0000x xxxxx 0000x xxxxx

  a = x & m31313131;
  b = x & (m31313131 << 20);
  x = a + (b >> 20);
  // x = 00 00000 00000 000xx xxxxx 000xx xxxxx

  a = x & m3131;
  b = x & (m3131 << 10);
  x = a + (b >> 10);
  // x = 00 00000 00000 00000 00000 00xxx xxxxx

  a = x & m31;
  b = x & (m31 << 5);
  x = a + (b >> 5);
  // x = 00 00000 00000 00000 00000 0000x xxxxx

  return x >= 31 ? x-31 : x;
}

[Edit]

The first addition sums the 7 individual groups of five bits pairwise, in parallel, leaving 4 partial sums. Subsequent additions bring the 4 sums into 2, then 1. This final sum, at most 8 bits wide, then adds its upper part (3 bits) to its lower part (5 bits). Code then uses one test to perform the final "mod".

This method scales for wider unsigned types up to at least uint165_t (log2(31+1)*(31+2) = 165 bits). Past that, a little more code is needed.

See @rici's answer for some good optimizations. I still recommend using uint32_t rather than unsigned, and 31UL rather than 31U in shifts like 31U << 15, because an unsigned may be only 16 bits long. (16-bit int was popular in the embedded world in 2014.)
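To illustrate how the approach scales, here is a rough sketch of a 64-bit variant. It is an assumption-laden adaptation written for this note, not code from either answer: a 64-bit value has 13 base-32 digits (the top one only 4 bits wide), and only one extra fold is needed compared with the 32-bit version. The name `mod31_u64` is hypothetical.

```c
#include <stdint.h>

unsigned mod31_u64(uint64_t x) {
    /* 5-bit fields at bit offsets 0, 10, ..., 60 (the top one is cut to 4 bits). */
    const uint64_t mask = 31ULL | (31ULL << 10) | (31ULL << 20) | (31ULL << 30)
                        | (31ULL << 40) | (31ULL << 50) | (31ULL << 60);
    /* Add each odd-numbered digit onto its even-numbered neighbour:
       seven partial sums, each at most 62, spaced 10 bits apart. */
    x = (x & mask) + ((x >> 5) & mask);
    /* Fold the partial sums into the low 10 bits; the total of all
       13 digits is at most 387, so no carry is lost. */
    x += x >> 40;
    x += x >> 20;
    x += x >> 10;
    x &= 1023;
    /* One more digit-sum step leaves a value of at most 42. */
    x = (x & 31) + (x >> 5);
    return (unsigned)(x >= 31 ? x - 31 : x);
}
```

The only structural change from the 32-bit code is the additional `x += x >> 40` fold, which matches the claim that doubling the width costs one more set of operations.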

[Edit2]

Besides letting the compiler use its optimizer, 2 additional techniques sped performance. These are more minor parlor tricks that yielded a modest improvement. Keep in mind YMMV and this is for a 32-bit unsigned.

Using a table look-up for the last modulo improved performance by 10-20%. Using an unsigned t table rather than unsigned char t helped a bit too. It turned out that the table length, which I first expected to need 2*31 entries, only needed 31+5.

Using a local variable rather than always using the function parameter surprisingly helped. Likely a weakness in my gcc compiler.

I found non-branching solutions (not shown) to replace x >= 31 ? x-31 : x, but their coding complexity was greater and performance was slower.
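For illustration only, one common way to write such a replacement is shown below. This is not one of the solutions tested above (those were not shown), and the name `reduce31_branchless` is invented; note also that compilers often turn the ternary form into a conditional move on their own.

```c
/* Reduce x from [0, 61] to [0, 30] without a conditional expression.
   (x >= 31) evaluates to 0 or 1; subtracting it from 0u yields an
   all-zeros or all-ones mask. */
unsigned reduce31_branchless(unsigned x) {
    unsigned mask = 0u - (unsigned)(x >= 31);
    return x - (31u & mask);
}
```

Whether this is any faster than the ternary depends on the compiler and target, which is consistent with the observation above that such variants did not pay off.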

All-in-all, a fun exercise.

unsigned mod31quik(unsigned xx) {
  #define mask (31u | (31u << 10) | (31u << 20) | (31u << 30))
  unsigned x = (xx & mask) + ((xx >> 5) & mask);
  x += x >> 20;
  x += x >> 10;
  x = (x & 31u) + ((x >> 5) & 31u);

  static const unsigned char t[31 * 2 /* 36 */] = { 0, 1, 2, 3, 4, 5, 6,
      7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
      25, 26, 27, 28, 29, 30, 0, 1, 2, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
  return t[x];
}

Nice. The m313131... masks aren't necessary; I put an uncommented but tested version into my answer (with credit) and benchmarked it. Almost as fast as gcc's multiply/shift, but still doesn't get there. — rici, Sep 26 '14 at 04:16
@rici Yes, each time I worked with it it became smaller and smaller. But the sandman is calling. — chux - Reinstate Monica, Sep 26 '14 at 04:22

lvella · Answer 3 · 2014-09-25T20:20:29.680

2

int mod31(int a){
    while(a >= 31) {
        a -= 31;
    }
    return a;
}

It works if a >= 0, but I doubt it will be faster than the % operator.

edited Sep 25 '14 at 20:20

answered Sep 25 '14 at 20:11


What about `(a > 30)` and `return a;`? – Gluttton Sep 25 '14 at 20:17
Don't forget that a could be a negative number – ErstwhileIII Sep 25 '14 at 20:17
There is no definitive convention for the modulus of a negative number. Modular arithmetic is defined for natural numbers. OP asked for (mod 31), not to simulate C `%` behavior in all ranges. – lvella Sep 25 '14 at 20:27
For both + and - numbers, C has well defined functions for floating point numbers via `fmod()` and `remainder()`. C has a well defined remainder operator `%` for integers. Although missing a well defined modulo definition for integers, a matching functionality for whole number argument to `fmod()` is certainly reasonable. – chux - Reinstate Monica Sep 25 '14 at 20:44

score 2 · Answer 4 · edited May 23 '17 at 12:22

If you want to get the modulus of dividing by a denominator d such that d = (1 << e) - 1 where e is some exponent, you can use the fact that the binary expansion of 1/d is a repeating fraction with bits set every e digits. For example, for e = 5, d = 31, and 1/d = 0.0000100001....

Similar to rici’s answer, this algorithm effectively computes the sum of the base-(1 << e) digits of a:

uint16_t mod31(uint16_t a) {
    uint16_t b;
    for (b = a; a > 31; a = b)
        for (b = 0; a != 0; a >>= 5)
            b += a & 31;
    return b == 31 ? 0 : b;
}

You can unroll this loop, because the denominator and the number of bits in the numerator are both constant, but it’s probably better to let the compiler do that. And of course you can change 5 to an input parameter and 31 to a variable computed from that.
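A parameterized version along those lines might look like the sketch below. The function name `mod_pow2_minus1` and the 32-bit width are choices made for this illustration, and the exponent is assumed to satisfy 1 <= e <= 31 (it is not validated).

```c
#include <stdint.h>

/* a mod (2^e - 1) by repeated base-2^e digit sums.
   Assumes 1 <= e <= 31; other values are not checked. */
uint32_t mod_pow2_minus1(uint32_t a, unsigned e) {
    const uint32_t d = (UINT32_C(1) << e) - 1;  /* e = 5 gives d = 31 */
    while (a > d) {
        uint32_t sum = 0;
        for (; a != 0; a >>= e)
            sum += a & d;   /* d doubles as the digit mask */
        a = sum;            /* strictly smaller than before, so this terminates */
    }
    return a == d ? 0 : a;  /* d itself is congruent to 0 */
}
```

With e = 5 this behaves like the mod31 function above; for instance `mod_pow2_minus1(100, 3)` reduces 100 modulo 7.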

ErstwhileIII · Answer 5 · 2014-09-25T20:17:03.243

1

You could use successive addition / subtraction. There is no other trick: since 31 is a prime number, to find what a number N is mod 31 you will have to divide and find the remainder.

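int mode(int number, int modulus) {
    int result = number;

    if (number >= 0) {
        // Subtract until the result drops into [0, modulus).
        while (result >= modulus) { result = result - modulus; }
    } else {
        // Add until the result is no longer negative.
        while (result < 0) { result = result + modulus; }
    }
    return result;
}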

edited Sep 25 '14 at 20:17

answered Sep 25 '14 at 20:11


I don't think being a prime number has anything to do with there being a "trick" available or not. – JJJ Sep 25 '14 at 20:22
Actually, there are lots of tricks available :) See other answers – rici Sep 26 '14 at 04:25
