I'm using `srand(time(NULL));` to seed the random number generator.

The problem is that I'm submitting 30+ identical jobs to a Linux cluster. If I submit them one at a time, everything is fine, but of course I'd prefer to use a batch job to submit all 30 at once, which is much easier and quicker. When I do that, several of the jobs appear to read exactly the same time, so they get the same seed and I get duplicate results! Can anyone suggest an easy solution to this?

jww
  • They are probably getting the same time stamp. Is there some environment variable you can use to add to the initial seed? Such as process ID or something? – Neil Kirk Oct 04 '14 at 22:38
  • A good watch: [rand() Considered Harmful](http://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful) – polarysekt Oct 04 '14 at 22:40
  • This question appears to be off-topic because [`rand()` is considered harmful](http://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful). – Griwes Oct 04 '14 at 22:44
  • @Griwes Why does that make it off-topic? It's a programming question, isn't it? – Neil Kirk Oct 04 '14 at 22:45
  • possible duplicate of [Recommended way to initialize srand?](http://stackoverflow.com/questions/322938/recommended-way-to-initialize-srand) – Raymond Chen Oct 05 '14 at 01:24

2 Answers

Consider reading from `/dev/random` or `/dev/urandom`. They provide higher-quality randomness than `rand()` (which is usually just a simple linear congruential generator), and `/dev/random` blocks until sufficient entropy has built up.
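As a rough sketch of how a job could take its seed from one of these devices instead of the clock: the helper name `seed_from_device` and its fallback parameter are made up for illustration, and the code assumes the device can be opened like an ordinary file.

```cpp
#include <fstream>

// Hypothetical helper (not part of any library): read one unsigned int
// from a random device such as /dev/urandom. If the device cannot be
// opened or read, return the caller-supplied fallback instead.
unsigned int seed_from_device(const char* path, unsigned int fallback) {
    unsigned int seed = 0;
    std::ifstream dev(path, std::ios::in | std::ios::binary);
    if (dev && dev.read(reinterpret_cast<char*>(&seed), sizeof seed)) {
        return seed;  // raw bytes from the device, reinterpreted as an integer
    }
    return fallback;  // device missing or unreadable
}
```

A job could then call something like `srand(seed_from_device("/dev/urandom", (unsigned int)time(NULL)));`, so each process starts from its own seed even when all of them launch within the same second.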

yossarian
  • Or seed the generator using them... – jww Oct 04 '14 at 22:46
  • It doesn't make any sense to seed `rand()` with either of the random devices. `rand()` is designed to rapidly produce numbers that *appear* random. Seeding it with a random device would only limit entropy. – yossarian Oct 04 '14 at 22:50
  • It would avoid his problem with duplicate generator state in batch jobs. I'm not sure how seeding `rand()` with `/dev/{u}random` would limit entropy. Could you explain it? – jww Oct 04 '14 at 22:53
  • Sure, but it just adds another layer of complexity. Why call `rand()` at all when you already have a good source of random numbers? Seeding `rand()` with either of the devices would limit entropy because `rand()`'s output is easy to predict, even with a good input. That's just the nature of its implementation. – yossarian Oct 04 '14 at 22:55
  • Correction: I might be wrong. I don't know about `rand()`'s implementation details on Linux. On other Unix(y) systems, `rand()` is usually just a simple LCG or Twister whose output becomes predictable after a certain number of iterations. If that isn't the case on Linux, seeding `rand()` with a random device's output *may* not compromise entropy significantly. – yossarian Oct 04 '14 at 23:00
  • Secure output from the generator is not a requirement in the question. OP's requirement is multiple generators that produce different outputs. That can be accomplished by reading from a random device (as you suggested), or seeding `rand()` with different seeds. – jww Oct 04 '14 at 23:01
  • I was just thinking with scale in mind. If he's submitting 30 jobs today, he could be submitting 3000 jobs tomorrow. If the period of `rand()` is 2999, he could compromise his results. My apologies if I dragged this off topic. – yossarian Oct 04 '14 at 23:04
  • GLIBC's `rand` function uses an `int32_t`. Its period is going to be on the order of 2^31 (otherwise, we would have read about the vulnerability). See [`__random`](http://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/random.c;hb=glibc-2.15#l292) in the GLIBC sources. – jww Oct 04 '14 at 23:10

Look into the new features in `<random>` in C++11, in particular `std::random_device`. Otherwise, a cheesy solution is to add the pid to `time(NULL)`.
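Both suggestions might look roughly like this. The helper names are invented for this sketch, and the pid mixing is a variation on plain addition: it multiplies the pid by an arbitrary odd constant and XORs it in, so that a job started one second later with a pid one lower does not land on the same seed. It assumes a POSIX system for `pid_t`.

```cpp
#include <ctime>
#include <random>
#include <sys/types.h>  // pid_t (POSIX)

// Hypothetical helper: the "cheesy" approach. Mixes the process ID into
// the timestamp. The multiplier is an arbitrary odd constant chosen for
// this sketch; unsigned arithmetic wraps around, which is intended.
unsigned int seed_from_time_and_pid(std::time_t now, pid_t pid) {
    return static_cast<unsigned int>(now) ^
           (static_cast<unsigned int>(pid) * 2654435761u);
}

// Hypothetical helper: the C++11 approach. Seeds a Mersenne Twister
// engine with one value drawn from std::random_device.
std::mt19937 make_engine() {
    std::random_device rd;
    return std::mt19937(rd());
}
```

With the first helper, the existing code would become something like `srand(seed_from_time_and_pid(time(NULL), getpid()));` (with `getpid()` from `<unistd.h>`). Since pids are only unique per machine, jobs on different nodes could in principle still collide, which is one reason to prefer `std::random_device`.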

user515430