random_shuffle algorithm - are identical results produced without random generator function?

Question

If a random generator function is not supplied to the random_shuffle algorithm in the standard library, will successive runs of the program produce the same random sequence if supplied with the same data?

For example, if

std::random_shuffle(filenames.begin(), filenames.end());

is performed on the same list of filenames from a directory in successive runs of the program, is the random sequence produced the same as that in the prior run?

You know, you could try it and see... queue the talks on random seeding :) — Andrew White, Aug 09 '11 at 16:53
I do get the same results -- just trying to make sure this is the expected behavior. The answer may, of course, also be "yes, this happens but that's an accident because the behavior is undefined." — drb, Aug 09 '11 at 16:58
BTW std::random_shuffle was deprecated in C++14 and removed in C++17. Here is a better way: https://stackoverflow.com/a/16969267/2151446 — Bull, Dec 03 '17 at 02:16

score 7 · Answer 1 · answered Aug 09 '11 at 17:11

If you use the same random generator, with the same seed, and the same starting sequence, the results will be the same. A computer is, after all, deterministic in its behavior (modulo threading issues and a few other odds and ends).

If you do not specify a generator, the default generator is implementation defined. Most implementations, I think, use std::rand() (which can cause problems, particularly when the number of elements in the sequence is larger than RAND_MAX). I would recommend getting a generator with known quality, and using it.

If you don't correctly seed the generator which is being used (another reason to not use the default, since how you seed it will depend on the implementation), then you'll get what you get. In the case of std::rand(), the default always uses the same seed. How you seed depends on the generator used. What you use to seed it should be vary from one run to the other; for many applications, time(NULL) is sufficient; on a Unix platform, I'd recommend reading however many bytes it takes from /dev/random. Otherwise, hashing other information (IP address of the machine, process id, etc.) can also improve things---it means that two users starting the program at exactly the same second will still get different sequences. (But this is really only relevant if you're working in a networked environment.)

I think `std::rand` is sufficient for most programs. Obviously if you're doing poker or cryptography you'll want a better algorithm. — Billy ONeal, Aug 09 '11 at 17:42
@Billy: the value of `RAND_MAX` matters, though. On Windows it's only 32k, so maybe "most programs" only need small random numbers, but you still need to be aware of the limitation, you can't think "I don't need crytographically secure unpredictability, therefore I can just use `rand`". — Steve Jessop, Aug 09 '11 at 17:55
It was, in fact, the value of `RAND_MAX` that I was mainly thinking of (although I've seen implementations of `rand` that were really, really bad). If you have more than 32k elements, of course, using `rand()` will skew things horribly, but since the implementation must map the value to `[0...n)`, unless the number of elements is considerably smaller than `RAND_MAX`, there will be noticeable skew. — James Kanze, Aug 10 '11 at 08:12

Mark B · Accepted Answer · 2011-08-09T17:12:23.523

6

25.2.11 just says that the elements are shuffled with uniform distribution. It makes no guarantees as to which RNG is used behind the scenes (unless you pass one in) so you can't rely on any such behavior.

In order to guarantee the same shuffle outcome you'll need to provide your own RNG that provides those guarantees, but I suspect even then if you update your standard library the random_shuffle algorithm itself could change effects.

edited Aug 09 '11 at 17:12

answered Aug 09 '11 at 17:05

Mark B

95,107
10
109
188

1

+1, although I don't know about "unfortunate". Even if the standard were to say something about the RNG used, I would expect it to permit the implementation to use `rand()`. So at the very least, the previous calls to `srand()` and `rand()` would need to be the same on every run in order for the result to be the same. Also, note that the definition of `random_shuffle` doesn't actually say that it *uses* the third argument ;-) – Steve Jessop Aug 09 '11 at 17:10
@Steve Jessop Good point about that, I'll excise that word as subjective. – Mark B Aug 09 '11 at 17:11
One thing I've noticed is that whatever the default RNG is, it is always seeded with the same value at initialization. Without calling srand() before using std::random_shuffle, the first call to it will always produce the same result on the same container. – Sean Aug 09 '11 at 17:42
@Sean: that's the usual way because `rand()` is required to behave at program start the same as if seeded with `srand(0)`. So any implementation of `random_shuffle` using `rand()` will behave as you observe. A linux implementation *could* read straight from `/dev/random` if it wanted to, or (to be less anti-social) seed a PRNG from `/dev/urandom`, but gcc's implementation that I have on my machine just uses `rand()`. – Steve Jessop Aug 09 '11 at 17:51

score 4 · Answer 3 · 2011-08-09T17:05:55.030

4

You may produce an identical result every run of the program. You can add a custom random number generator (which can be seeded from an external source) as an additional argument to std::random_shuffle if this is a problem. The function would be the third argument. Some people ~~recommend~~ call srand(unsigned(time(NULL))); before random_shuffle, but the results are often times implementation defined (and unreliable).

edited Aug 09 '11 at 17:05

answered Aug 09 '11 at 16:54

People certainly do call `srand(unsigned(time(NULL)));` But I've yet to hear anybody *recommend* it! – TonyK Aug 09 '11 at 17:04
1

Regarding `unsigned(time(NULL))`: tiresome pedantry, but C++ allows `time_t` to be a floating point type, and converting an out-of-range floating value to an integer type is undefined behavior. Silly, because `time_t` is more loosely specified than anyone has any real need for, but there it is. Yet another fact about portability that nobody needs to know in practice :-) – Steve Jessop Aug 09 '11 at 17:22

random_shuffle algorithm - are identical results produced without random generator function?

3 Answers3

Linked

Related