
Background: I use rand(), std::rand(), std::random_shuffle() and other functions in my code for scientific calculations. To be able to reproduce my results, I always explicitly specify the random seed, and set it via srand(). That worked fine until recently, when I figured out that libxml2 would also call srand() lazily on its first usage - which was after my early srand() call.

I filed a bug report with libxml2 about its srand() call, but I got this answer:

Initialize libxml2 first then. That's a perfectly legal call to be made from a library. You should not expect that nobody else calls srand(), and the man page nowhere states that using srand() multiple time should be avoided

This is actually my question now. If the general policy is that every lib can/should/will call srand(), and I can/might also call it here and there, I don't really see how that can be useful at all. Or how is rand() useful then?

That is why I thought the general (unwritten) policy is that no lib should ever call srand() and the application should call it only once at the beginning. (I'm not taking multi-threading into account; I guess in that case you should use something different anyway.)

I also tried to research a bit which other libraries actually call srand(), but I didn't find any. Are there any?

My current workaround is this ugly code:

```cpp
{
    // On the first call to xmlDictCreate,
    // libxml2 will initialize some internal randomize system,
    // which calls srand(time(NULL)).
    // So, do that first call here now, so that we can use our
    // own random seed.
    xmlDictPtr p = xmlDictCreate();
    xmlDictFree(p);
}

srand(my_own_seed);
```

Probably the only clean solution would be to not use that at all and only to use my own random generator (maybe via C++11 <random>). But that is not really the question. The question is, who should call srand(), and if everyone does it, how is rand() useful then?

Albert
  • if you can use C++11 libs, you should have a look at http://en.cppreference.com/w/cpp/numeric/random – Creris Oct 10 '14 at 07:36
  • Using rand for scientific calculations isn't necessarily a good idea anyway: the underlying implementation is frequently a simple LCG, which generally just doesn't deliver values of sufficient quality for scientific applications. – slyfox Oct 10 '14 at 07:42
  • IMO it's a design flaw on your side. When you need predictable random values (a *contradictio in adiecto* by itself), you should never use a global random generator that every other part of your program, including the libs you use, might also use. Even if those other parts do not call `srand()`, just by calling `rand()` they will also influence the global random generator. – Erich Kitzmueller Oct 10 '14 at 07:44
  • *"the general (unwritten) policy is that no lib should ever call srand()"* - that's the sensible approach, but then some library authors would get lots of complaints and queries with people wondering why "random" behaviours repeated between runs.... Perhaps the best thing would be to have a `srand_if_not_yet_initialised()` function in the API, but too late for that. Prefer the C++11 versions. – Tony Delroy Oct 10 '14 at 07:44
  • There is little you can do to control the global function calls, essentially all code using them will "interfere" with each other. To isolate yourself (or another piece of code), use the C++11 features instead. – Niall Oct 10 '14 at 07:45
  • Sorry, using srand/rand for security reasons, as the libxml2 guy claims, is ridiculous. – user515430 Oct 10 '14 at 07:46
  • Note also that even if you could prevent anyone else from calling srand(), it's still a bad idea to rely on the sequence generated by rand() for testing/validation/etc, as it may change if/when you switch to another platform or even a newer version of the libraries/OS. – Paul R Oct 10 '14 at 07:46
  • @ammoQ It tends to not be that you want _predictable_ random numbers, but more that when you get a result, you want them to be _repeatable_. While things should of course work for the general case, it allows you to show that it does work for certain inputs if needed (working input configuration), but also means that you can trace how changes affect the program more easily because you are using the same input data. – Baldrickk Oct 10 '14 at 08:00
  • A library which modifies global state (and this include calling `srand()`) is broken. Their answer is just waffling, trying to justify the unjustifiable. For the rest: if you need reproducible random numbers for scientific calculations, `rand()` doesn't fill the bill. It's fine for games, or just playing around, but that's about it. Otherwise, in pre-C++11, you implement your own, and in C++11, you use `<random>`. (But that doesn't let the library implementers off the hook.) – James Kanze Oct 10 '14 at 08:32
  • @TonyD If a library implementer needs random numbers, they could require that `srand()` be called before initializing the library. Or simply use their own, private random number generator (in which case, they should provide a means of specifying the seed, so that you can test code which uses the library). – James Kanze Oct 10 '14 at 08:34
  • @JamesKanze "A library which modifies global state (and this include calling srand()) is broken." -- Calling rand modifies global state just as srand does. What is broken is that these functions *have* global state. – Jim Balter Oct 10 '14 at 08:43
  • Why the heck does libxml need randomness at all? – user541686 Oct 10 '14 at 09:20
  • @user515430 Security? He is using it to make his results reproducible. It is a very good idea when trying to debug algorithms with a stochastic component. – DuncanACoulter Oct 10 '14 at 09:33
  • I agree that the application should set the seed and leave it at that. This is more moral support than an answer, hence it's a comment. – DuncanACoulter Oct 10 '14 at 09:36
  • @JimBalter Yup. A library which needs random numbers should either have its own RNG (one of those in `<random>` is fine if the library doesn't need to support pre-C++11), or ask the user to provide one (like `std::random_shuffle`). – James Kanze Oct 10 '14 at 10:12
  • @Mehrdad I can think of a few reasons, but for none of them would `rand()` be appropriate. (Most would require some sort of cryptographically secure RNG. Which means that neither `rand()` nor anything in `<random>` would be appropriate.) – James Kanze Oct 10 '14 at 10:14
  • While it's easy to bag out `rand` and appropriate to recommend new C++11 alternatives, a point that seems to be being overlooked here is that in *many* single-threaded applications, after calling `srand()` with the same seed, and with the same (if any) external inputs, the execution both in app *and* library code will be entirely deterministic. Having a library call `rand()` changes the global state, but no less predictably than if the higher level app code had done so; if the calls are interspersed in the same way, no harm done. – Tony Delroy Oct 10 '14 at 10:15
  • @JamesKanze: Well that's my question: why might XML parsing require a cryptographically secure RNG (never mind that `rand` isn't one)? – user541686 Oct 10 '14 at 10:25
  • @user515430: `rand()` is probably used to protect against algorithmic complexity attacks, not for any serious cryptography. – ninjalj Oct 10 '14 at 10:27
  • In addition to the other answers and comments, IMO you should decouple your source of pseudo-random numbers from the rest of the algorithm. Create a class RandomValues that you can switch between deterministic (maybe even hardcoded values) and any random number generator you wish, and access that class from your algorithm. That would allow you to painlessly switch between deterministic and random inputs. – Michael Oct 10 '14 at 16:34
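
A minimal sketch of the decoupling suggested in that last comment, assuming C++11. The class names `RandomValues`, `FixedValues` and `EngineValues` and the example function `estimate` are hypothetical, chosen only for illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <utility>
#include <vector>

// Hypothetical interface: the algorithm asks this object for numbers
// instead of calling the global rand().
class RandomValues {
public:
    virtual ~RandomValues() = default;
    virtual double uniform01() = 0;  // nominally a value in [0, 1)
};

// Deterministic source: replays a hard-coded list, wrapping around at the end.
class FixedValues : public RandomValues {
public:
    explicit FixedValues(std::vector<double> values) : values_(std::move(values)) {}
    double uniform01() override {
        double v = values_[pos_];
        pos_ = (pos_ + 1) % values_.size();
        return v;
    }
private:
    std::vector<double> values_;  // must not be empty
    std::size_t pos_ = 0;
};

// Pseudorandom source with its own private engine and an explicit seed.
class EngineValues : public RandomValues {
public:
    explicit EngineValues(std::uint32_t seed) : engine_(seed) {}
    double uniform01() override { return dist_(engine_); }
private:
    std::mt19937 engine_;
    std::uniform_real_distribution<double> dist_{0.0, 1.0};
};

// Example algorithm: Monte Carlo estimate of the mean of x*x for x in [0, 1).
double estimate(RandomValues& rng, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        double x = rng.uniform01();
        sum += x * x;
    }
    return sum / n;
}
```

In a test you would pass a `FixedValues`, in a real run an `EngineValues` with a recorded seed; either way `estimate` never touches the global `rand()` state, so a library calling `srand()` cannot change its input.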

4 Answers


Use the new <random> header instead. It allows for multiple engine instances, using different algorithms and more importantly for you, independent seeds.

[edit] To answer the "useful" part, rand generates random numbers. That's what it's good for. If you need fine-grained control, including reproducibility, you should not only have a known seed but a known algorithm. srand at best gives you a fixed seed, so that's not a complete solution anyway.
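
To illustrate the "known algorithm" point, a sketch assuming C++11: `std::mt19937` is specified exactly by the standard (which even requires the 10000th output of a default-constructed engine, whose default seed is 5489, to be 4123659995), so its raw output for a given seed is the same on every conforming implementation. The distributions such as `std::uniform_int_distribution` are not specified bit for bit, so only the raw engine output is portable across standard libraries. `draw_raw` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Draw n raw 32-bit values from a private, explicitly seeded Mersenne Twister.
// The engine is a local object, so srand()/rand() calls elsewhere cannot touch it.
std::vector<std::uint32_t> draw_raw(std::uint32_t seed, std::size_t n) {
    std::mt19937 engine(seed);
    std::vector<std::uint32_t> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(static_cast<std::uint32_t>(engine()));
    }
    return out;
}
```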

MSalters
  • Yea, sure, I know that. But that is not really my question. – Albert Oct 10 '14 at 07:38
  • If you consider the consequences, you see why the library can't assume that you called `srand()` - that initializes the global PRNG. You might not have any reason to do so because you're using your own PRNG. Thus they have to call it. – MSalters Oct 10 '14 at 07:41
  • @Albert: the answer to your literal question is "it isn't useful, at least not from a testability/repeatability perspective". The only way round that is to encapsulate your own RNG's state; the new C++ library does that for you. – Oliver Charlesworth Oct 10 '14 at 07:43
  • @MSalters: That is my question. Should the lib not assume so? If the lib does not assume that, it means that all tools which call `srand()` after libxml2 init will reset the random generator used by the lib. So, in both cases, it doesn't really make sense for the lib to call `srand()`. Or is there some flaw in my thinking? – Albert Oct 10 '14 at 07:46
  • @Albert: Which does not matter in the least for all those applications which use `rand()` only to get random numbers. They only need to know that `srand()` is called **at least** once. – MSalters Oct 10 '14 at 07:47
  • Re: their response, since `srand` is neither thread-safe nor reentrant, having it called multiple times *from different threads* is not safe at all (`rand` has similar problems). There is a real, fundamental problem here which is addressed with the possibility for multiple generators in C++11. – Benjamin Bannier Oct 10 '14 at 07:48
  • @Albert Your question is much like "Shouldn't there be world peace?" ... regardless of the answer, there isn't. That libxml2 calls `srand` is a fact. Just move on and do as this answer suggests. – Jim Balter Oct 10 '14 at 07:51
  • My somewhat implicit question, whether this is a bug/problem in libxml2, is still not really answered. Or is the answer that it doesn't really matter that much for libxml2? (And in my app, I should anyway use something different - but that would have been my consequence anyway.) – Albert Oct 10 '14 at 07:53
  • @Albert It's a pointless philosophical exercise. The maintainer isn't going to change it just because of a discussion here. It's certainly not a bug, but it seems to be a problem for you ... so again, do something else. – Jim Balter Oct 10 '14 at 07:58
  • @MSalters "does not matter in the least... only need to know that `srand()` is called *at least* once." - it *can* matter: the period guarantees have been lost/reset, so earlier sequences may be repeated much earlier than otherwise (not only - but most dramatically - when several of the calls to `srand()` use say `time(NULL)` as a seed). – Tony Delroy Oct 10 '14 at 10:27
  • @TonyD: Period guarantees? I don't think the standard makes those. Also, it doesn't matter that libXML gets the same random values as you do because they too called `srand()` with the same seed. – MSalters Oct 10 '14 at 13:16
  • @MSalters: many implementations document a period of 2^32, and it's reasonable for code to rely on implementation-defined behaviours if that suits its portability objectives. "Also, it doesn't matter that libXML gets the same random values as you do because they too called srand() with the same seed." - that's misunderstanding how `rand()` works - it's not going to give libXML the same sequence as the app because they share state, the problem is that if say the app is the serious consumer of random numbers then libXML's `srand()` call can restart a [sub]sequence the app already generated. – Tony Delroy Oct 10 '14 at 13:31
  • So? Let A be the part of the repeating pattern generated between the two calls of `srand` and B the remainder, then the resulting sequence will be `AABABABA` etcetera, which is more random than `ABABABA`. – MSalters Oct 10 '14 at 15:34
  • @MSalters If you seriously believe ignoring the lessened period and reasoning that way makes sense, then I guess I'd best call it quits... ;-P. – Tony Delroy Oct 10 '14 at 16:34
  • @TonyD: The sequence AABABAB... isn't even periodic! Regardless, given an sub-sequence of just A, only the latter lets you predict with 100% certainty that the output will continue with B. Zero entropy. – MSalters Oct 10 '14 at 18:04
  • "that's misunderstanding how rand() works" -- Uh, no, you seem to have completely misunderstood what he wrote. " it's not going to give libXML the same sequence as the app because they share state" -- he didn't say anything about "because they share state", he said if srand is called with the same seed. – Jim Balter Oct 10 '14 at 18:12
  • @JimBalter: I'm confident I haven't misunderstood anything here. As MSalters wrote in his last comment "The sequence AABABAB... isn't even periodic!" - that's the very problem I've been highlighting. Whether it was the app or lib that first called `srand()` and consumed `A`, it might reasonably expect `A` not to be repeated so soon, but the second call to `srand()` invalidates that expectation and A's seen twice (minus any elements the other lib/app consumes by itself calling `rand()`). *I* mentioned "because they share state" because subsequences from A may be (re)seen by the app and lib. – Tony Delroy Oct 20 '14 at 10:52
  • I'm sure such confidence is comforting. – Jim Balter Oct 20 '14 at 17:46

Well, the obvious thing has been stated a few times by others, use the new C++11 generators. I'm restating it for a different reason, though.
You use the output for scientific calculations, and rand usually implements a rather poor generator (in the meantime, many mainstream implementations use MT19937, which apart from bad state recovery isn't so bad, but you have no guarantee of a particular algorithm, and at least one mainstream compiler still uses a really poor LCG).

Don't do scientific calculations with a poor generator. It doesn't really matter if you have things like hyperplanes in your random numbers if you do some silly game shooting little birds on your mobile phone, but it matters big time for scientific simulations. Don't ever use a bad generator. Don't.

Important note: std::random_shuffle (the version with two parameters) may actually call rand, which is a pitfall to be aware of if you're using that one, even if you otherwise use the new C++11 generators found in <random>.
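
A sketch of the usual C++11 way around that pitfall, assuming you control the seed: `std::shuffle` takes the engine explicitly, so `rand()` is never involved (`std::random_shuffle` was deprecated in C++14 and removed in C++17). Note that the exact permutation for a given seed depends on the standard library's shuffle implementation, so it is reproducible with the same toolchain but not necessarily across toolchains. `shuffled` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

// Returns a copy of v shuffled with a private, explicitly seeded engine.
// std::shuffle takes the generator as an argument, so rand() is never used.
std::vector<int> shuffled(std::vector<int> v, std::uint32_t seed) {
    std::mt19937 engine(seed);
    std::shuffle(v.begin(), v.end(), engine);
    return v;
}
```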

About the actual issue: calling srand twice (or even more often) is no problem. You can in principle call it as often as you want; all it does is change the seed, and consequently the pseudorandom sequence that follows. I'm wondering why an XML library would want to call it at all, but they're right in their response: it is not illegitimate for them to do it. But it also doesn't matter.
The only important thing to make sure is that either you don't care about getting any particular pseudorandom sequence (that is, any sequence will do, you're not interested in reproducing an exact sequence), or you are the last one to call srand, which will override any prior calls.
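
A small sketch of the "last caller wins" point, using only `std::srand`/`std::rand`: the C standard guarantees that calling `srand` again with the same seed repeats the same sequence, whatever earlier calls did. The helper names `next_rand` and `last_srand_wins` are hypothetical, and the check only holds because nothing else calls `rand()` between our `srand` and our draws.

```cpp
#include <cstdlib>
#include <ctime>
#include <vector>

// Records the next n values of the global rand() sequence.
std::vector<int> next_rand(int n) {
    std::vector<int> v;
    for (int i = 0; i < n; ++i) {
        v.push_back(std::rand());
    }
    return v;
}

// Seeds, draws, lets "someone else" reseed and consume values, then reseeds
// with the original seed. Returns true if the original sequence is reproduced.
bool last_srand_wins(unsigned int seed) {
    std::srand(seed);
    std::vector<int> first = next_rand(5);

    std::srand(static_cast<unsigned int>(std::time(nullptr)));  // e.g. a library's srand(time(NULL))
    next_rand(3);                                                // ...followed by its own rand() calls

    std::srand(seed);  // we call srand last, overriding the earlier calls
    return next_rand(5) == first;
}
```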

That said, implementing your own generator with good statistical properties and a sufficiently long period in 3-5 lines of code isn't all that hard either, with a little care. The main advantage (apart from speed) is that you control exactly where your state is and who modifies it.
It is unlikely that you will ever need periods much longer than 2^128, simply because of the prohibitive time it would take to actually consume that many numbers. A 3GHz computer consuming one number every cycle would run for about 10^21 years on a 2^128 period, so there's not much of an issue for humans with average lifespans. Even assuming that the supercomputer you run your simulation on is a trillion times faster, your great-great-great-grandchildren won't live to see the end of the period.
In that sense, periods like 2^19937, which current "state of the art" generators deliver, are really ridiculous; that's trying to improve the generator at the wrong end if you ask me (it's better to make sure they're statistically firm and that they recover quickly from a worst-case state, etc.). But of course, opinions may differ here.
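
The arithmetic behind the 10^21-year figure, assuming one number per cycle at 3 GHz and a year of about 3.16 × 10^7 seconds:

$$
\frac{2^{128}}{3\times 10^{9}\ \mathrm{s}^{-1}} \approx \frac{3.4\times 10^{38}}{3\times 10^{9}\ \mathrm{s}^{-1}} \approx 1.13\times 10^{29}\ \mathrm{s} \approx \frac{1.13\times 10^{29}\ \mathrm{s}}{3.16\times 10^{7}\ \mathrm{s/yr}} \approx 3.6\times 10^{21}\ \mathrm{years}
$$

Dividing by a further factor of 10^12 for the trillion-times-faster machine still leaves roughly 3.6 × 10^9 years.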

This site lists a couple of fast generators with implementations. They're xorshift generators combined with an addition or multiplication step and a small (from 2 to 64 machine words) lag, which results in both fast and high quality generators (there's a test suite as well, and the site's author wrote a couple of papers on the subject, too). I'm using a modification of one of these (the 2-word 128-bit version ported to 64-bits, with shift triples modified accordingly) myself.
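
A minimal sketch of such a generator, assuming C++11: xorshift128+ as published by Sebastiano Vigna (shift triple 23, 18, 5), with 128 bits of state and a period of 2^128 − 1, seeded via SplitMix64 so the state is never all zeros. This is an illustrative pick of one generator from that family, not necessarily the modified version described above; the class name `Xorshift128Plus` and the helper `roll_die` are hypothetical. Because it provides `result_type`, `min()`, `max()` and `operator()`, it can be plugged into the `<random>` distributions.

```cpp
#include <cstdint>
#include <limits>
#include <random>

// Sketch of xorshift128+ (Vigna's published version with shifts 23, 18, 5).
// 128 bits of state, period 2^128 - 1. The state is private to each object.
class Xorshift128Plus {
public:
    using result_type = std::uint64_t;

    explicit Xorshift128Plus(std::uint64_t seed) {
        // Expand the 64-bit seed with SplitMix64; two consecutive SplitMix64
        // outputs always differ, so the state is never all zeros.
        s_[0] = splitmix64(seed);
        s_[1] = splitmix64(seed);
    }

    static constexpr result_type min() { return 0; }
    static constexpr result_type max() { return std::numeric_limits<result_type>::max(); }

    result_type operator()() {
        std::uint64_t s1 = s_[0];
        const std::uint64_t s0 = s_[1];
        const std::uint64_t result = s0 + s1;  // the "+" step: add the two words
        s_[0] = s0;
        s1 ^= s1 << 23;                        // xorshift step
        s_[1] = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5);
        return result;
    }

private:
    static std::uint64_t splitmix64(std::uint64_t& x) {
        std::uint64_t z = (x += 0x9e3779b97f4a7c15ULL);
        z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
        z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
        return z ^ (z >> 31);
    }

    std::uint64_t s_[2];
};

// Usage: the class meets the generator requirements of <random>,
// so standard distributions accept it.
int roll_die(Xorshift128Plus& gen) {
    std::uniform_int_distribution<int> die(1, 6);
    return die(gen);
}
```

The whole state is the two words in `s_`, so you know exactly where it lives and nothing outside the object can reseed it.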

Niall
Damon

This problem is being tackled in C++11's random number generation, i.e. you can create an instance of a class:

```cpp
std::default_random_engine e1;
```

which allows you to fully control the random numbers generated from object e1 (as opposed to whatever would be used in libxml). The general rule of thumb would then be to use the new construct, as you can generate your random numbers independently.
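
A sketch, assuming C++11, of what that control over e1 means in practice. `sample` and its `disturb_global` flag are hypothetical; the flag merely simulates other code (such as libxml2) calling `srand()`/`rand()` between draws. Keep in mind that `std::default_random_engine` is implementation-defined, so if the exact sequence must match across compilers, name a specific engine such as `std::mt19937` instead.

```cpp
#include <cstdlib>
#include <random>
#include <vector>

// Draw n values in [1, 100] from a private engine e1 seeded with `seed`.
// If disturb_global is true, the global rand() state is reseeded and consumed
// between draws, standing in for another library doing so.
std::vector<int> sample(unsigned int seed, int n, bool disturb_global) {
    std::default_random_engine e1(seed);
    std::uniform_int_distribution<int> dist(1, 100);
    std::vector<int> out;
    for (int i = 0; i < n; ++i) {
        if (disturb_global) {
            std::srand(static_cast<unsigned int>(i));  // touches only the global state
            std::rand();
        }
        out.push_back(dist(e1));  // depends only on e1
    }
    return out;
}
```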

Very good documentation

To address your concerns - I also think that it would be bad practice to call srand() in a library like libxml. However, the bigger issue is that srand() and rand() are not designed to be used in the context you are trying to use them in; they are enough when you just need some random numbers, as libxml does. When you need reproducibility and want to be sure that you are independent of others, the new <random> header is the way to go. So, to sum up, I don't think it's good practice on the library's side, but it's hard to blame them for doing it. Also, I can't imagine them changing it, as a billion other pieces of software probably depend on it.

Niall
wasylszujski

The real answer here is that if you want to be sure that YOUR random number sequence isn't being altered by someone else's code, you need a random number context that is private to YOUR work. Note that calling srand is only one small part of this. For example, if you call some function in some other library that calls rand, it will also disrupt the sequence of YOUR random numbers.

In other words, if you want predictable behaviour from your code, based on random number generation, it needs to be completely separate from any other code that uses random numbers.

Others have suggested using the C++ 11 random number generation, which is one solution.

On Linux and other POSIX-compatible systems, you could also use rand_r, which takes a pointer to an unsigned int holding the seed for that sequence. So if you initialize a seed variable and then use it in all calls to rand_r, it will produce a separate sequence for YOUR code. This is of course still the same old rand generator, just with a separate seed. The main reason I mention this is that you could fairly easily do something like this:

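```cpp
#include <stdlib.h>  // rand_r() is POSIX, not part of standard C++

int myrand()
{
   // Private seed: only myrand() reads or modifies it, so srand()/rand()
   // calls elsewhere cannot disturb this sequence.
   static unsigned int myseed = 12345;  // example value; initialize with a seed of your choice
   return rand_r(&myseed);
}
```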

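and simply call myrand() instead of std::rand(). It should also be possible to work this into the overload of std::random_shuffle that takes a random generator parameter, although that parameter is called as r(n) and must return a value in [0, n), so myrand would need a small wrapper.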
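
A variant of the same idea for `std::shuffle` (the C++11 successor to `std::random_shuffle`): a small adapter, assuming a POSIX system where `rand_r` is available from `<stdlib.h>`, that exposes a private `rand_r` seed as a uniform random bit generator. `RandRGenerator` and `shuffle_with_seed` are hypothetical names. Note that POSIX.1-2008 marks `rand_r` obsolescent, so the `<random>` engines are the more future-proof choice.

```cpp
#include <algorithm>
#include <stdlib.h>  // rand_r() and RAND_MAX; rand_r is POSIX-only
#include <vector>

// Hypothetical adapter exposing a private rand_r() seed as a
// uniform random bit generator, so std::shuffle can use it.
class RandRGenerator {
public:
    using result_type = unsigned int;

    explicit RandRGenerator(unsigned int seed) : seed_(seed) {}

    static constexpr result_type min() { return 0; }
    static constexpr result_type max() { return static_cast<result_type>(RAND_MAX); }

    result_type operator()() { return static_cast<result_type>(rand_r(&seed_)); }

private:
    unsigned int seed_;  // private state: srand() elsewhere cannot touch it
};

// Returns a copy of v shuffled using only the private rand_r() sequence.
std::vector<int> shuffle_with_seed(std::vector<int> v, unsigned int seed) {
    RandRGenerator gen(seed);
    std::shuffle(v.begin(), v.end(), gen);
    return v;
}
```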

Mats Petersson
  • Yes! So right. Just moving the `srand()` call out of the *libxml* library won't really make your results very reproducible. It might work somewhat, but obviously some libxml functions are calling `rand()` at some point (calculating UUIDs?) and this will alter the pseudorandom sequence your program will receive from `rand()` after that... – Colin D Bennett Oct 10 '14 at 17:38