First random number is always smaller than rest

Question

I happen to notice that in C++ the first random number being called with the std rand() method is most of the time significant smaller than the second one. Concerning the Qt implementation the first one is nearly always several magnitudes smaller.

qsrand(QTime::currentTime().msec());
qDebug() << "qt1: " << qrand();
qDebug() << "qt2: " << qrand();

srand((unsigned int) time(0));
std::cout << "std1: " << rand() << std::endl;
std::cout << "std2: " << rand() << std::endl;

output:

qt1:  7109361
qt2:  1375429742
std1: 871649082
std2: 1820164987

Is this intended, due to error in seeding or a bug? Also while the qrand() output varies strongly the first rand() output seems to change linearly with time. Just wonder why.

Because `rand()` is often implemented as a LCG, it's quite normal that if you seed it with seeds that don't vary much in magnitude between runs (time in seconds since the Epoch), that the first outputs after seeding will be of heavily correlated magnitude too. There is a similar problem with other PRNGs called _escape from zeroland_, where for the first few iterations after seeding the state with 0's, the state contains significantly more than 50% of 0's. The solution in many cases is to "warm up" the PRNG (or "escape zeroland"): After seeding, call the PRNG and discard its first few outputs. — Iwillnotexist Idonotexist, May 25 '15 at 02:35

score 15 · Accepted Answer · edited May 23 '17 at 11:45

I'm not sure that could be classified as a bug, but it has an explanation. Let's examine the situation:

Look at rand's implementation. You'll see it's just a calculation using the last generated value.
You're seeding using QTime::currentTime().msec(), which is by nature bounded by the small range of values 0..999, but qsrand accepts an uint variable, on the range 0..4294967295.

By combining those two factors, you have a pattern.

Just out of curiosity: try seeding with QTime::currentTime().msec() + 100000000

Now the first value will probably be bigger than the second most of the time.

I wouldn't worry too much. This "pattern" seems to happen only on the first two generated values. After that, everything seems to go back to normal.

EDIT:

To make things more clear, try running the code below. It'll compare the first two generated values to see which one is smaller, using all possible millisecond values (range: 0..999) as the seed:

int totalCalls, leftIsSmaller = 0;
for (totalCalls = 0; totalCalls < 1000; totalCalls++)
{
    qsrand(totalCalls);
    if (qrand() < qrand())
        leftIsSmaller++;
}
qDebug() << (100.0 * leftIsSmaller) / totalCalls;

It will print 94.8, which means 94.8% of the time the first value will be smaller than the second.

Conclusion: when using the current millisecond to seed, you'll see that pattern for the first two values. I did some tests here and the pattern seems to disappear after the second value is generated. My advice: find a "good" value to call qsrand (which should obviously be called only once, at the beginning of your program). A good value should span the whole range of the uint class. Take a look at this other question for some ideas:

Recommended way to initialize srand?

Also, take a look at this:

PCG: A Family of Better Random Number Generators

The small range of the seed is an important point, probably even more important than the exact pattern at the start. Even if there were no visible patterns, there is a significant risk that two runs of the program will use the exact same seed. — kasperd, May 25 '15 at 08:26

Alexander V · Answer 2 · 2015-05-25T19:02:32.440

12

Neither current Qt nor C standard run-time have a quality randomizer and your test shows. Qt seems to use C run-time for that (this is easy to check but why). If C++ 11 is available in your project, use much better and way more reliable method:

#include <random>
#include <chrono>

auto seed = std::chrono::system_clock::now().time_since_epoch().count();
std::default_random_engine generator(seed);
std::uniform_int_distribution<uint> distribution;
uint randomUint = distribution(generator);

There is good video that covers the topic. As noted by commenter user2357112 we can apply different random engines and then different distributions but for my specific use the above worked really well.

edited May 25 '15 at 19:02

answered May 25 '15 at 01:46

Alexander V

8,351
4
38
47

3

Could you explain what makes this better or more reliable? – user2357112 May 25 '15 at 04:47
@user2357112, I watched this: http://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful – Alexander V May 25 '15 at 04:57
11

Could you include the video's reasons in your answer, rather than just linking the video? – user2357112 May 25 '15 at 05:10
Ok, I've added the video to the answer as long as it makes sense. – Alexander V May 25 '15 at 05:32
8

No, I mean including the actual reasons the video gives, not just putting the link in the answer. Aside from the use of a seed that isn't restricted to the range 0-999, I don't see why this snippet is supposed to be better. I don't think the video actually supports your claim that this is better; the video seems to support the use of specific generators like `std::mt19937`, and the guy *specifically discourages* the use of `std::default_random_engine`. See 29:20. – user2357112 May 25 '15 at 07:50
7

Linking to a video like this is often not that useful, since it's possible the link will not be active when future visitors find this post. I would recommend outlining the key points raised in the video and cite it as a reference, rather than just linking. – ninesided May 25 '15 at 10:00
@user2357112, as long as there is no any range specified then the uniform_int_distribution is merely and adapter to uint type which is quite big to care of non-uniform distribution at all. As for std::default_random_engine, it just works for a very seldom ones in app life time moment of generating its unique id and even std::mt19937 is not necessarily better, there in video a concern about cryptographical security or so. – Alexander V May 25 '15 at 19:00

Andrzej Pronobis · Answer 3 · 2015-05-25T20:40:22.090

8

Keeping in mind that making judgments about a statistical phenomena based on a small number of samples might be misleading, I decided to run a small experiment. I run the following code:

int main()
{
  int i = 0;
  int j = 0;
  while (i < RAND_MAX)
  {
    srand(time(NULL));
    int r1 = rand();
    int r2 = rand();
    if (r1 < r2) 
      ++j;
    ++i;
    if (i%10000 == 0) {
      printf("%g\n", (float)j / (float)i);
    }
  }
}

which basically printed the percentage of times the first generated number was smaller than the second. Below you see the plot of that ratio:

enter image description here

and as you can see it actually approaches 0.5 after less than 50 actual new seeds.

As suggested in the comment, we could modify the code to use consecutive seeds every iteration and speed up the convergence:

int main()
{
  int i = 0;
  int j = 0;
  int t = time(NULL);
  while (i < RAND_MAX)
  {
    srand(t);
    int r1 = rand();
    int r2 = rand();
    if (r1 < r2)
      ++j;
    ++i;
    if (i%10000 == 0) {
      printf("%g\n", (float)j / (float)i);
    }
    ++t;
  }
}

This gives us:

enter image description here

which stays pretty close to 0.5 as well.

While rand is certainly not the best pseudo random number generator, the claim that it often generates a smaller number during the first run does not seem to be warranted.

edited May 25 '15 at 20:40

answered May 25 '15 at 02:53

Andrzej Pronobis

33,828
17
76
92

2

Better to compute `time(NULL)` out of the loop, and increment the seed manually in the loop. – Jarod42 May 25 '15 at 07:24
2

This graph seems to show that although the first number is not lower than the second more than half the time, there are extremely long runs where it is, every time, much longer than one would expect from coin-flipping. – jwg May 25 '15 at 07:26
@Jarod42 makes a very good point about the seed of course, and as I said in the answer, we can clearly see when the time() generates a new seed. When we account for that, those periods are not that long. – Andrzej Pronobis May 25 '15 at 07:59
1

@jwg Isn't that a consequence of the program reseeding the random generator with the exact same seed many times in a row? – kasperd May 25 '15 at 08:22
@kasperd in that case, this graph only shows about 30-40 independent samples, with odd weightings, and should be redone with fresh seeds for each observation. – jwg May 25 '15 at 09:41
3

@jwg I agree with that. It is also worth noticing that the seed isn't being chosen uniformly random. Both this answer and the question have skewed the inputs for `srand`, but they are not skewed in the same way. – kasperd May 25 '15 at 09:51
@jwg, that is obviously indeed the case. The answer is certainly not a complete statistical analysis of the generator. The point was to show that drawing a conclusion based on several arbitrarily selected samples is most certainly going to be biased. Finding a counter example to the claim is in such case sufficient, even if not entirely unbiased itself. – Andrzej Pronobis May 25 '15 at 16:18
1

To make things more clear, I included a result for consecutive seeds as well. – Andrzej Pronobis May 25 '15 at 20:41

First random number is always smaller than rest

3 Answers3

EDIT:

Linked