34

I'm talking about this surprisingly simple implementation of rand() from the C standard:

static unsigned long int next = 1;

int rand(void)  /* RAND_MAX assumed to be 32767. */
{
    next = next * 1103515245 + 12345;
    return (unsigned)(next/65536) % 32768;
}

From this Wikipedia article we know that the multiplier a (in above code a = 1103515245) should fulfill only 2 conditions:

  1. a - 1 is divisible by all prime factors of m.
    (In our case m = 2^32, size of the int, so m has only one prime factor = 2)
  2. a - 1 is a multiple of 4 if m is a multiple of 4.
    (32768 is multiple of 4, and 1103515244 too)

Why they have chosen such a strange, hard-to-remember, "man, I'm fed up with these random numbers, write whatever" number, like 1103515245?

Maybe there are some wise reasons, that this number is somehow better than the other?

For example, why don't set a = 20000000001? It's bigger, cool-looking and easier to remember.

Adam Stelmaszczyk
  • 19,665
  • 4
  • 70
  • 110
  • 6
    @Ed S.: reasonable enough question to ask for a magic number to be explained... – gbn Dec 19 '11 at 23:52
  • :) Of course no, but look at the number 12345. Once they're choosing easy, good-looking number 12345, once bad... without a reason? :) – Adam Stelmaszczyk Dec 19 '11 at 23:52
  • 1
    You might start by looking at the references, the answers are probably there somewhere: http://en.wikipedia.org/wiki/Linear_congruential_generator#References – Mark Ransom Dec 19 '11 at 23:52
  • 1
    This is quite interesting. I don't have a solution, but perhaps it has something to do with `65536` being the largest int that can be stored on two bytes. – Edwin Dec 19 '11 at 23:53
  • @Mark Ransom: several of those require subscriptions or logins... – gbn Dec 19 '11 at 23:54
  • 6
    @Edwin: usually `65536` is the smallest int that cannot be stored on 2 bytes ;) – pmg Dec 20 '11 at 00:01
  • @pmg Damn; I couldn't figure out if it was `65536` or `65535`. Guess I should've thought a little longer. – Edwin Dec 20 '11 at 00:35
  • The period here is 2^32, not 2^16. It is only afterwards that the high order bytes are kept (which are the "most random" ones). – Alexandre C. Dec 20 '11 at 11:27
  • @Edwin: The giveaway is that 65536 is even and 65535 is odd... ;) – Martin B Dec 20 '11 at 11:30
  • @AlexandreC. Yes, now I see that's not 2^16, I've edited question. Now it looks to me that m = 2^32, since the int has 32 bits. But in [very good paper you've posted](http://citeseer.ist.psu.edu/viewdoc/download;jsessionid=4F8F207F726CB56B4630F8155F748256?doi=10.1.1.53.3686&rep=rep1&type=pdf) they say it's 2^31... – Adam Stelmaszczyk Dec 20 '11 at 15:07

4 Answers4

44

If you use a LCG to draw points on the d dimensional space, they will lie on at most (d!m)1/d hyperplanes. This is a known defect of LCGs.

If you don't carefully choose a and m (beyond the condition for full periodicity), they may lie on much fewer planes than that. Those numbers have been selected by what is called the spectral test.

The "spectral test" (the name comes from number theory) is the maximum distance between consecutive hyperplanes on which d-dimensional joint distributions lie. You want it to be as small as possible for as many d as you can test.

See this paper for a historical review on the topic. Note that the generator you quote is mentioned in the paper (as ANSIC) and determined to not be very good. The high order 16 bits are acceptable, however, but many applications will need more than 32768 distinct values (as you point out in the comments, the period is indeed 2^31 -- the conditions for full periodicity in Wikipedia's link are probably only necessary).

The original source code in the ANSI document did not take the high order 16 bits, yielding a very poor generator which is easy to misuse (rand() % n is what people first think of to draw a number between 0 and n, and this yields something very non-random in this case).

See also the discussion on LCGs in Numerical Recipes. Quoting:

Even worse, many early generators happened to make particularly bad choices for m and a. One infamous such routine, RANDU, with a = 65539 and m = 231, was widespread on IBM mainframe computers for many years, and widely copied onto other systems. One of us recalls as a graduate student producing a “random” plot with only 11 planes and being told by his computer center’s programming consultant that he had misused the random number generator: “We guarantee that each number is random individually, but we don’t guarantee that more than one of them is random.” That set back our graduate education by at least a year!

caesay
  • 16,932
  • 15
  • 95
  • 160
Alexandre C.
  • 55,948
  • 11
  • 128
  • 197
7

Remember that rand() is an approximation of a uniform distribution. Those numbers are used because they have been tested to show that they generate a more uniform-looking distribution.

Given the multitude of pairs of unsigned integers in the representable range, I doubt anyone has tried them all with all valid seeds. If you think you have a better choice of parameters, just try it out! You have the code, just factor out the parameters of the LCG and run tests. Generate a bunch of numbers (say 10 million), compute a histogram of the generated numbers and plot that to look at the distribution.

edit If you are interested in developing a pseudo-random number generator for use in real applications, I recommend that you read up on the considerable literature on the subject. The "advice" given above is only suggested to help show that choosing arbitrary "bigger, cool-looking and easier to remember" LCG parameters will give a very poor distribution. /edit

Besides, it's a library function and I've never seen a program using the standard library version of rand() to remember its LCG's parameters.

André Caron
  • 44,541
  • 12
  • 67
  • 125
  • 3
    You have to know what you are looking for when trying out the parameters, especially with respect to the joint distributions of consecutive numbers (which is terrible for many LCG parameters, and less terrible for a few ones). There is an extensive litterature on this. – Alexandre C. Dec 20 '11 at 11:46
  • @DonalFellows: I don't recommend anyone use such a simple approach in the development of PRNGs, and I don't think that's what OP wanted. Hell, I would not event recommend using an LCG in the first place. However, this answer explains clearly enough why C's `rand()` uses "hard to remember" LCG parameters instead of "bigger, cool-looking and easier to remember" parameters. – André Caron Dec 20 '11 at 18:50
  • 1
    Generally speaking, there are three classes of PRNG: simple ones (such as `rand()`), scientific ones (with very good spectral properties) and cryptographic ones (where each bit is necessarily as hard to predict as possible). There's a large literature on this — there's been a lot of research, really — and it's important to only use good ones because it's so easy to go horribly wrong. – Donal Fellows Dec 20 '11 at 19:03
  • I'm sorry but I still fail to see the reasons behind the down votes. I would not have answered as is if OP was asking for real advice on how to develop random number generators. This is a simple answer to a simple question. In any case, I added a note mentioning not to use this for development of custom PRNGs. – André Caron Dec 21 '11 at 16:06
2

Early computations tended to concern themselves with the bits and bytes and played tricks with the registers to minimize bytes of code (before lines there were bytes)

I have only found one reasonable clue below:

The output of this generator isn’t very random. If we use the sample generator listed above, then the sequence of 16 key bytes will be highly non-random. For instance, it turns out that the low bit of each successive output of rand() will alternate (e.g., 0,1,0,1,0,1, . . . ). Do you see why? The low bit of x * 1103515245 is the same as the low bit of x, and then adding 12345 just flips the low bit. Thus the low bit alternates. This narrows down the set of possible keys to only 2113 possibilities;much less than the desired value of 2128.

http://inst.eecs.berkeley.edu/~cs161/fa08/Notes/random.pdf

And two reasonable answers:

Improving a poor random number generator (1976) by Bays, Durham Bays, Carter, S D Durham

http://en.wikipedia.org/wiki/TRNG

Dave
  • 21
  • 2
0

That number seems special, it's just between two primes :P.

Now talking seriously, to see if it's a good choice, just look at the output. You should see very different results even if flipping a single bit.

Also, consider how much predictability you expect... that implementation is terrible, you may consider a more robust yet simple alternative, like FNV-1a.

Shipof123
  • 233
  • 2
  • 11
Ismael Luceno
  • 2,055
  • 15
  • 26
  • Well, I would like to contest that notion, how would you define a PRNG? – Ismael Luceno Nov 30 '13 at 18:09
  • PRNGs are designed for that purpose. A hash algorithm merely needs to be a one-way function, if you loop it, you may get a rather poor source of random numbers. A hash algorithm does not necessarily come specified with a way to loop it up for PRNG use. – Kuba hasn't forgotten Monica Dec 03 '13 at 01:37
  • @KubaOber so... where is your definition? – Ismael Luceno Sep 21 '16 at 03:10
  • A hash function is a function h : {0,1}^* -> {0,1}^k for some fixed output length k and arbitrary input length. A PRNG is a function f : {0,1}^s -> {0,1}^s x {0,1}^k for some fixed seed length s and output length k. You could use a hash function to implement a PRNG but you haven't specified your construction or given any indication of why this is a good thing to do (it probably isn't, but then, neither is using a LCG). – TH. Aug 11 '17 at 05:32