7

I am playing around with a pair of algorithms I found on other SO posts (listed in the references below), and am trying to find out how to improve the distribution. I am effectively extending the range of a random number by doubling the number of bits, and want to ensure that the distribution is as uniform as possible, while removing (or at least reducing) the effects of modulo bias and other artifacts for a shuffling algorithm that would be using the result of my modified random number generator.

So, it is to my understanding that if I initialize my RNG with a constant seed (ie: srand(1)) I will get the same pattern of deterministic outputs from calling rand() in a for loop. Now, if I were to initialize my seed via srand(time(NULL)), it would be a different pattern, but it still might not help me with the following problem: I am trying to figure out if I were to implement the following algorithm:

  • Take two random numbers a,b
  • Calculate a*(RAND_MAX+1)+b

Would I be able to:

  1. Generate every possible coordinate pair (a,b), where a,b ∈ Z+ on [0, RAND_MAX] (a and b are positive integers between zero and RAND_MAX inclusive).
  2. Maximize the uniformity of the entire distribution (ie: an optimally flat histogram).

While the output of rand() is supposed to be uniformly distributed, I don't know if it's guaranteed to give me values for N,N+1 calls to rand each loop and give me every pair listing in point (1) before the random sequence repeats itself again. My new random number generator could theoretically generate random values on [0, RAND_MAX ^ 2], but I don't know if there might be "holes" aka values in this range that can never be generated by my algorithm.

I attempted to get further on this question myself, but I couldn't find information on how long a random sequence generated be rand() in C goes until it repeats itself. Lacking this and other information, I couldn't figure out whether or not it is possible to generate every pair (a,b).

So, using rand(), is it possible to achieve point (1) and if so, are there any solid suggestions on how to optimize its "randomness" according to point (2)?

Thank you for your time and assistance.

Update

I later revisited this problem and simulated it using an 8-bit PRNG. While it could indeed generate every possible coordinate pair, the distribution was actually quite interesting, and definitely not uniform. In the end, I read several articles/papers on PRNGs and used a Mersenne Twiser algorithm to generate the additional bits needed (i.e. MT19937-64).

References


  1. Extend rand() max range, Accessed 2014-05-07, <https://stackoverflow.com/questions/9775313/extend-rand-max-range>
  2. Shuffle array in C, Accessed 2014-05-07, <https://stackoverflow.com/questions/6127503/shuffle-array-in-c>
Community
  • 1
  • 1
Cloud
  • 18,753
  • 15
  • 79
  • 153
  • While I generally appreciate edits, I prefer the references not be removed, as I like them to remain in IEEE format so that if the question is copy/pasted verbatim, the credits are in human readable format. Also, I think they should be present whenever possible. – Cloud May 07 '14 at 17:04
  • `Information on how long a random sequence generated` -> depends on the RNG. For the mersenne twister, [the period is 2^19937-1](http://www.cs.hmc.edu/~geoff/mtwist.html). The default implementation looks like it is an LCG, which certainly isn't as good. – Anirudh Ramanathan May 07 '14 at 17:06
  • @indiv What I eventually plan to do is to take a large image (ie: 4096x4096 pixels), and shuffle the pixels to different random coordinates. The first step would be to convert the `n x m` image into a row of `nm` pixels, then use the shuffle algorithm on it, then convert it back to an `n x m` image. However, the range of `rand()` is not great enough, and I want to extend it. I am primarily concerned that I might have "holes" in my distribution. – Cloud May 07 '14 at 17:10
  • 1
    Do you have to use `rand()`? Its properties are implementation-dependent and, as you've noticed, not guaranteed to be very good. – David Eisenstat May 07 '14 at 17:17
  • @DavidEisenstat I have to attempt to use something part of standard C, and STL and Boost are not permitted. – Cloud May 07 '14 at 17:35
  • So paste in a public domain RNG with attribution. They're out there. – David Eisenstat May 07 '14 at 17:37
  • @DavidEisenstat I would rather implement one from scratch for research purposes. If I even want to use it for independent projects in the future, I don't want the entire code base polluted by a propagating license. – Cloud May 07 '14 at 17:46
  • So read a description of one, have your lawyer check for patents, and implement it yourself. The problem with trying to extend `rand` is that it comes with very little in the way of a useful guarantee, so much so that any universally valid answer to this question will be itself a good-quality RNG. – David Eisenstat May 07 '14 at 17:58
  • @DavidEisenstat I think the focus of the original question is being missed here: Will this specific implementation work? – Cloud May 07 '14 at 18:16
  • It might or it might not, depending on how good your `rand` is. – David Eisenstat May 07 '14 at 18:24
  • 2
    The discussion about the links belong on Meta, so I posted a question (http://meta.stackoverflow.com/questions/253976/links-at-the-end-like-in-a-research-paper) there. (My personal opinion can perhaps be deduced from how I phrased that question.) – Thomas Padron-McCarthy May 08 '14 at 07:09
  • Problem: `RAND_MAX + 1` may overflow. – chux - Reinstate Monica Jul 01 '15 at 16:08
  • 1
    "My new random number generator ... generate random values on [0, RAND_MAX ^ 2]" is not correct. It may generate on the interval `[0, ((RAND_MAX + 1) ^ 2 - 1)]. – chux - Reinstate Monica Jul 01 '15 at 16:11

1 Answers1

1

Assumptions

As pointed out in the comments, the behaviour of rand() is implementation dependent. So, let's make a few simplifying assumptions to get to the point of the question:

  • rand() can generate all values from 0 to RAND_MAX. Justfication: If it could not, then it would be even harder to generate all possible pairs (a, b).
  • rand() generates a statistically random sequence. Justification: The result of composing two random functions (or the same one twice) is only as good as the base random function.

Of course, we shouldn't expect the result to be better than the building blocks, so any deficiencies in the rand() implementation will reflect itself in any functions composed from it.

Holes in the distribution

Seeding rand() generates a deterministic sequence for a given seed, as the seed determines the PRNG's initial state. The sequence's maximum period is 2N where N is the number of bits in the state. Note that the state may in fact have more bits than RAND_MAX, we'll assume RAND_MAX = (2N - 1) for this section. Because it is a sequence, generating two successive "random" N-bit values a and b means that ab. Therefore, the method a*(RAND_MAX+ 1)+b will have some holes.

A little explanation on ab: PRNGs work by maintaining an internal state of N bits. It uses that state uniquely to determine its next state, so once the same state recurs, the sequence starts to repeat itself. The number of states gone through before the sequence starts repeating itself is called the period. So, technically we could have a = b, but that implies a period of 1, and that's a very bad PRNG. For more information, a helpful answer on PRNG periods has been posted on the Software Engineering site.

An algorithm without holes

One way to allow successive "random" calls to be equal is to generate 2 N-bit numbers, but consider only a certain number of bits significant, i.e. discard some. Now, we can have a = b, though with very slightly less probability than another random number c. Note this is similar to how Java's random number generator works. It is seeded with 48 bits, but outputs a 32-bit random number, discarding 16 bits (assuming # of bits in seed = # of bits in state).

However, since you need values larger than RAND_MAX, what you could do is use the above method, and then concatenate the bits, until you get enough bits to reach the desired maximum (though again, the distribution is not quite uniform).

Community
  • 1
  • 1
DPenner1
  • 10,037
  • 5
  • 31
  • 46
  • The problem with your answer is that, if I'm a [BOFH](http://en.wikipedia.org/wiki/Bastard_Operator_From_Hell) turned library implementer, I can easily design a `rand()` that meets your criteria (read narrowly) but is obviously nonrandom when used the way that you describe. [RANDU](http://en.wikipedia.org/wiki/RANDU) aside, it's unlikely that the standard implementation is this poor, but why take a chance if portability is a goal? – David Eisenstat May 07 '14 at 19:08
  • @DavidEisenstat I've edited my assumptions to address your first concern. To address your second concern: if portability was a goal, then you wouldn't be using `rand()` in the first place because it is implementation dependent, thus by definition not portable. – DPenner1 May 07 '14 at 20:03
  • Thank you @DPenner, this addresses my concerns, and answers the concern with respect to holes. For my final implementation, I will likely find a royalty-free implementation, such as the Mersenne Twister (I *think* it is royalty free). I am curious about a ≠ b. Shouldn't a PRNG allow for consecutive repeated numbers in the sequence? I think the biggest flaw in my original approach was there is no guarantee that each (a,b) pair exists in my sequence. – Cloud May 07 '14 at 20:15
  • I've added an explanation about a ≠ b. Note that it only holds under the assumption RAND_MAX = (2^N - 1), if you output less bits than your internal state maintains, you can have sequential repeated numbers. – DPenner1 May 07 '14 at 22:14
  • Nitpick: I think you should differentiate more clearly between the seed, as given to for example srand, and the internal state. They don't need to have the same number of bits. – Thomas Padron-McCarthy May 08 '14 at 06:50
  • Any good random number generator will allow for repeating outputs, otherwise it would fail too many tests of randomness. See for example http://coliru.stacked-crooked.com/a/222d021c066b04b7. If your implementation of `rand` doesn't allow this, you shouldn't be using it. – Mark Ransom Apr 12 '17 at 18:40