3

Why does the following C code gives me different results on my desktop and server, both running similar versions of Linux?

It finds the longest same side in a row sequence in 18 trillion coin tosses. [See Iain M. Banks' science fiction novel Consider Phlebas.]

On the server, after 15.7 trillion coin tosses (it's still running), the longest same side in a row sequence so far is only 29. Since 2^44 = 17,592,186,044,416, I'd expect the longest same side sequence to be somewhere in the low to mid 40's, and probably 44 after all 18 trillion have been completed.

On the desktop after only 4.7 billion coin tosses the longest sequence was already 31, since 2^31 = 2,147,483,648, and that sounded about right.

So why have I got a sequence of only 29 on the server after 15.7 trillion coin tosses but a sequence of 31 after only 4.7 billion on my desktop?

Modulo bias was my first thought. RAND_MAX is the same on both desktop and server, 2,147,483,647 (a 32 bit signed long). So the rand() function will give me a number 0 <= rand() <= 2,147,483,647. 0 is even and 2,147,483,647 is odd, so unless I'm very much mistaken there's no modulo bias introduced by my int rand_num = (rand() % 2); line of code.

I know that the C standard library's pseudo-random number generator is not considered adequate for cryptography. Surely that could not be a factor when generating, admittedly really rather long, sequences of zeros and ones. Could it?

Here's the source:

Compiled on both machines using: gcc -O3 -o 18TCT 18TrillionCoinTosses.c

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char* argv[])
{
    srand(time(NULL));

    int current_seq = 0;
    int longest_seq = 0;
    int prev_rand_num = -1;

    long long i = 0;
    long long total = 18000000000000;

    // To serve as a rudimentary progress indicator.
    long billion_counter = 0;
    long billion = 1000000000;

    while (i < total)
    {
        int rand_num = (rand() % 2);

        if (rand_num == prev_rand_num)
        {
            current_seq++;

            if (current_seq >= longest_seq)
            {
                longest_seq = current_seq;
                printf("Longest sequence so far: %d (on iteration %lli)\n", longest_seq, i);
            }
        }
        else
            current_seq = 1;

        if (billion_counter == billion)
        {
            billion_counter = 0;
            printf("Progress report, current iteration: %lli\n", i);
        }

        prev_rand_num = rand_num;

        i++;
        billion_counter++;
    }

    printf("\nTotal coins tossed: %lli\n", i);
    printf("Longest sequence: %d\n", longest_seq);
}
Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
mattst
  • 13,340
  • 4
  • 31
  • 43
  • 3
    TL;DR. Don't write a novel. See [ask]. – too honest for this site Mar 02 '16 at 17:52
  • 1
    Looks like the question is "why is output different between server and laptop?" 99% of the rest is fluff. – takendarkk Mar 02 '16 at 18:10
  • 4
    honestly, I enjoyed the read. – mfro Mar 02 '16 at 18:20
  • 1
    I enjoyed it, too, and if you take the time to understand it, it is *not* "why is output different between server and laptop?". – Steve Summit Mar 02 '16 at 18:29
  • Edited down to be much shorter after getting lots of negative votes. I thought it was a fun post about a trivial experiment and that the background was worth reading. Oh well. – mattst Mar 02 '16 at 18:45
  • 1
    `rand()` does not have a defined implementation spec, so implementations are all over the map. Many implementations are low performance linear congruential generators. Use of `rand()` is almost certainly an issue, I'd suggest trying a better generator such as [Mersenne Twister](https://en.wikipedia.org/wiki/Mersenne_Twister) or [WELL](https://en.wikipedia.org/wiki/Well_equidistributed_long-period_linear) – pjs Mar 02 '16 at 18:56
  • "I know that the C standard library's pseudo-random number generator is not considered adequate for cryptography. Surely that could not be a factor" - famous last words – M.M Mar 02 '16 at 21:24
  • This is relevant: http://stackoverflow.com/a/14679656/1566221 – rici Mar 02 '16 at 22:12

4 Answers4

6

Your random number generator is probably repeating after 2^32 = 4294967296 calls, so you're not really simulating 18 trillion trials. You need a better RNG, one that keeps more than 32 bits of internal state. On many systems, you can access a better RNG by simply calling random() instead of rand(). (On my system, man random says "random -- better random number generator" and "The period of this random number generator is very large, approximately 16*((2**31)-1)". Although that's "only" 34,359,738,352, which is still short of your 18 trillion.)

Also, as a side point, rand() % 2 is risky, although most RNGs these days don't have the problem that will burn you there (and if you did have that problem, you'd know it, because among other things you'd get 0 in a row no matter what).


Addendum: You can find references to some other, better random-number generators at question 13.15 in the C FAQ list: http://c-faq.com/lib/rand.html .

Steve Summit
  • 45,437
  • 7
  • 70
  • 103
4

Even though your "random" bit 0 had equal zeros and ones, the pseudo random generator function rand() sequence repeats relatively often. In my test it repeated after 2147483648 (2**31) iterations of the loop. So there is no point going to 18 trillion. I ran the test several times, always the same result.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    unsigned long long n = 0;
    int a, b, c, d;
    int e, f, g, h;

    srand((unsigned)time(NULL));
    e = a = rand();
    f = b = rand();
    g = c = rand();
    h = d = rand();
    do {
        n++;
        e = f;
        f = g;
        g = h;
        h = rand();
    } while (e != a || f != b || g != c || h != d);
    printf("%llu\n", n);
}
Weather Vane
  • 33,872
  • 7
  • 36
  • 56
  • Which 65,538 values didn't get generated? – rici Mar 02 '16 at 18:20
  • @rici it was probably 65,536 since I was testing for a sequence of three values repeating. When I checked for a sequence of four values repeating there was one less iteration of the loop. – Weather Vane Mar 02 '16 at 18:29
  • 1
    It's still curious, don't you think? Are you using the GNU trinomial generator, or some linear congruential variant? The GNU generator has quite a large state, so it *should* have a cycle length larger than RANDMAX, not that I've ever tested it. – rici Mar 02 '16 at 18:36
  • @rici yes, I was not counting correctly since I was repeating `rand()` within the loop when `a` matched. I've re-written and posted it FWIW. – Weather Vane Mar 02 '16 at 18:53
2

Your code seems to be fine. The problem might be the RNG your are using.

I don't think that rand() % 2 is uniform. Take a look here: Uniformity of random numbers taken modulo N

Why not C++11 Random Number Generators?http://en.cppreference.com/w/cpp/numeric/random/uniform_int_distribution

Last but not least, could -O3 be messing up with something?

-O3 Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-loop-vectorize, -ftree-loop-distribute-patterns, -fsplit-paths -ftree-slp-vectorize, -fvect-cost-model, -ftree-partial-pre and -fipa-cp-clone options.

Community
  • 1
  • 1
Danilo Gasques
  • 666
  • 6
  • 16
  • 2
    Optimization settings will not affect random number generation. – rici Mar 02 '16 at 18:37
  • `rand() % 2` is uniform enough for all practical purposes. If `RAND_MAX` is even then `rand() % 2` will be perfectly uniform, if not it will off by one part in RAND_MAX. (Now, it's true that `rand() % 2` might have a problem with *distribution*, but that doesn't seem to be the problem here, either.) – Steve Summit Mar 02 '16 at 20:51
  • @steve: but here we are not just looking at frequency. A PRNG which alternated even and odd numbers could be perfectly unbiased in frequency, but it would be very biased if you were counting repeated r%2 values. Which is precisely the subject. – rici Mar 02 '16 at 21:08
  • @rici: I think we're saying the same thing. But it's clear the OP is *not* having the problem that his system's `rand()` implementation is alternating even and odd, because if it did, he'd never get *any* runs of all-heads or all-tails. – Steve Summit Mar 02 '16 at 21:22
  • @steve: that's correct. The problem is with the gnu trinomial generator which has a strong correlation with the 31st previous value. The example was just an example; much subtler sequence biases exist in unbiased (by frequency) rngs – rici Mar 02 '16 at 21:27
1

As others have pointed out, rand is not a reliable source of randomness. It's right there in the man page:

NAME
     rand, rand_r, srand, sranddev -- bad random number generator

...

DESCRIPTION
     These interfaces are obsoleted by arc4random(3).

For good randomness you'll have to go outside the standard C libraries.

Note that if you're on a Mac it will complain that RAND_bytes() is deprecated. Don't worry, OpenSSL isn't going anywhere and is fine to use. The deprecation has to do with binary compatibility issues when upgrading Apple products.

Community
  • 1
  • 1
Schwern
  • 153,029
  • 25
  • 195
  • 336
  • 1
    It's true that many `rand()` implementations are poor, and it's true that a too-short repetition cycle is likely the OP's actual problem. But I want to point out that `RAND_MAX` has nothing to do with an RNG's period. It's possible for any old `rand()` implementation to repeat with a period much, much longer than `RAND_MAX`. – Steve Summit Mar 02 '16 at 20:18
  • @SteveSummit indeed, in my answer MSVC's `RAND_MAX` is `32767` but the sequence repeated more slowly. – Weather Vane Mar 02 '16 at 20:34
  • @SteveSummit You're right! I confused the size of the returned value with the period it might repeat. – Schwern Mar 02 '16 at 20:40
  • 1
    I'd also point out that, in one sense, the right answer here is also the wrong answer. Yes, the OP has an RNG that isn't good enough for his needs, and will need to use a better one. But it's a crying shame that he'll have to change his code to call something other than `rand()`! The whole point of modularity is so you can slip in a better implementation behind the same interface! No, the C Standard doesn't guarantee a high-quality `rand()`, but it certainly doesn't mandate a low-quality one, either! Why can't more libc authors write high-quality RNG's *and call them plain old `rand()`?* – Steve Summit Mar 02 '16 at 20:46
  • @SteveSummit I hear you. Lamenting the choices of language standards and implementations and their tendency to favor bad, even dangerous, but compatible, behavior is a full time job. ;) – Schwern Mar 02 '16 at 22:23
  • @Schwern: Thanks, and I'm a sucker for backwards compatibility, too, but what makes this particular lament all the more poignant is of course that a mo' betta `rand()` would be 100% absotively, posilutely compatible! (Though I have heard, at least once, a sober argument put forward with a straight face that a certain bad `rand()` implementation had to be retained because Existing Code Depended On It.) – Steve Summit Mar 02 '16 at 22:37