Speed difference when using different values with modulo

Question

I'm using gcc 4.6.3 and creating a large array of random shorts. I'm generating them with the following statements:

val = SHRT_MAX; //as defined by limits.h
while(array<end) {
    *array++ = rand() % val;
}

This is a considerably fast operation, and even for arrays as large as 5,000,000 elements is completed almost instantly. I was curious about my sorts efficiency with a smaller variation in numbers and changed that to:

val = 3;

This caused a considerable speed difference, it ran much slower than the original statements. What is it that is causing such a considerable speed difference?

Two words: Compiler Optimizations. Some moduli are easier to mod by than others. — Mysticial, Jan 24 '13 at 22:42
Just as a remark, taking the modulus of an uniformly distributed random number doesn't necessarily produce uniformly distributed random numbers in a range. — Benjamin Bannier, Jan 24 '13 at 22:43
@honk - Could you expand on that (or link to an explanation)? I don't understand why that would be the case. — Gordon Bailey, Jan 25 '13 at 01:47
@GordonBailey: This almost is a FAQ regarding random numbers here. See http://eternallyconfuzzled.com/arts/jsw_art_rand.aspx for some detailed explanation. http://stackoverflow.com/questions/2509679/how-to-generate-a-random-number-from-within-a-range-c is an answer here. — Benjamin Bannier, Jan 25 '13 at 08:47

score 3 · Accepted Answer · answered Jan 24 '13 at 23:04

SHRT_MAX is most likely greater than or equal to RAND_MAX. The statement:

*array++ = rand() % val;

can be optimized into:

int rand_value= rand();
if (rand_value==RAND_MAX) rand_value= 0;
*array++= rand_value;

which is faster because it replaces a modulus with a branch. The second version, where val is 3, cannot be optimized into a simpler version that runs without modulus.

% SHRT_MAX cannot be simplified into a bitwise operation. But combined with knowledge of how rand() is specified, the compiler can certainly optimize statements dealing with rand() and values greater than or equal to RAND_MAX.

score 2 · Answer 2 · edited May 23 '17 at 12:27

Compilers can optimize calculation of modulo (a%B), where B is a constant. It replaces actual modulo with simpler arithmetic operations. The details are explained in topics like Most optimized way to calculate modulus in C. However such optimizations are faster for some values of B than for others.

Even CPU division/modulo instruction can take different number of cycles complete (at least on some CPUs). See numbers for x86 here: http://gmplib.org/~tege/x86-timing.pdf.

score 0 · Answer 3 · answered Jan 25 '13 at 00:12

0

SHRT_MAX is a 2^n-1 value, which can be optimised for divide. Dividing by 3 is much tougher, so the compiler may well decide to divide by 3 (or do some other magic operation that is slower than the 2^n-1 variant.

The fastest modulo you can use is for 2^n, which can be replaced with a single and-instruciton, for positive values: x % 256 is the same as x & 255. Unfortunately, when the value may be negative, it's not quite so easy...

answered Jan 25 '13 at 00:12

Mats Petersson

126,704
14
140
227

3 is also `2^n - 1` (`n` = 2). – Gordon Bailey Jan 25 '13 at 01:48

Speed difference when using different values with modulo

3 Answers3