Duplicate output when using srand for random seed

Question

I'm using srand(time(NULL)); to seed the random number generator.

The problem is that I'm submitting 30+ identical jobs to a Linux cluster. If I submit them one at a time, everything is fine, but of course I'd prefer to submit all 30 at once as a batch job, which is much easier and quicker. When I do that, several of the jobs apparently read exactly the same time, so they end up with the same seed and I get duplicate results! Can anyone suggest an easy solution to this?

They are probably getting the same time stamp. Is there some environment variable you can use to add to the initial seed? Such as process ID or something? — Neil Kirk, Oct 04 '14 at 22:38
A good watch: [rand() Considered Harmful](http://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful) — polarysekt, Oct 04 '14 at 22:40
This question appears to be off-topic because [`rand()` is considered harmful](http://channel9.msdn.com/Events/GoingNative/2013/rand-Considered-Harmful). — Griwes, Oct 04 '14 at 22:44
@Griwes Why does that make it off-topic? It's a programming question, isn't it? — Neil Kirk, Oct 04 '14 at 22:45
possible duplicate of [Recommended way to initialize srand?](http://stackoverflow.com/questions/322938/recommended-way-to-initialize-srand) — Raymond Chen, Oct 05 '14 at 01:24

yossarian · Answer 1 · 2014-10-04T22:53:13.247

2

Consider reading from /dev/random or /dev/urandom. Both provide higher-quality randomness than rand() (which is usually just a simple linear congruential generator). /dev/random blocks until sufficient entropy has built up, whereas /dev/urandom does not block.
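A minimal sketch of what reading from the device could look like, assuming a Linux-like system where /dev/urandom exists. The function names `read_urandom_u64` and `seed_rand_from_urandom` are made up for this illustration and are not part of any library. The second helper is a variation that only uses the device to seed `rand()`, so each job starts from a different state.

```cpp
#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <stdexcept>

// Hypothetical helper: read 64 random bits straight from /dev/urandom.
// Assumes the device exists (Linux and most Unix-like systems).
std::uint64_t read_urandom_u64() {
    std::ifstream dev("/dev/urandom", std::ios::in | std::ios::binary);
    std::uint64_t value = 0;
    if (!dev.read(reinterpret_cast<char*>(&value), sizeof value)) {
        throw std::runtime_error("could not read /dev/urandom");
    }
    return value;
}

// Hypothetical helper: use the device only to seed rand(), instead of
// srand(time(NULL)). The 64-bit value is truncated to unsigned.
void seed_rand_from_urandom() {
    std::srand(static_cast<unsigned>(read_urandom_u64()));
}
```

Calling `seed_rand_from_urandom()` once at the start of `main()` would replace the `srand(time(NULL))` call. Jobs launched in the same second then no longer share a seed, since the value does not depend on the clock.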

edited Oct 04 '14 at 22:53

answered Oct 04 '14 at 22:45


Or seed the generator using them... – jww Oct 04 '14 at 22:46
It doesn't make any sense to seed `rand()` with either of the random devices. `rand()` is designed to rapidly produce numbers that *appear* random. Seeding it with a random device would only limit entropy. – yossarian Oct 04 '14 at 22:50
It would avoid his problem with duplicate generator state in batch jobs. I'm not sure how seeding `rand()` with `/dev/{u}random` would limit entropy. Could you explain it? – jww Oct 04 '14 at 22:53
Sure, but it just adds another layer of complexity. Why call `rand()` at all when you already have a good source of random numbers? Seeding `rand()` with either of the devices would limit entropy because `rand()`'s output is easy to predict, even with a good input. That's just the nature of its implementation. – yossarian Oct 04 '14 at 22:55
Correction: I might be wrong. I don't know about `rand()`'s implementation details on Linux. On other Unix(y) systems, `rand()` is usually just a simple LCG or Twister whose output becomes predictable after a certain number of iterations. If that isn't the case on Linux, seeding `rand()` with a random device's output *may* not compromise entropy significantly. – yossarian Oct 04 '14 at 23:00
Secure output from the generator is not a requirement in the question. OP's requirement is multiple generators that produce different outputs. That can be accomplished by reading from a random device (as you suggested), or seeding `rand()` with different seeds. – jww Oct 04 '14 at 23:01
I was just thinking with scale in mind. If he's submitting 30 jobs today, he could be submitting 3000 jobs tomorrow. If the period of `rand()` is 2999, he could compromise his results. My apologies if I dragged this off topic. – yossarian Oct 04 '14 at 23:04
GLIBC's `rand` function uses an `int32_t`. Its period is going to be on the order of 2^31 (otherwise, we would have read about the vulnerability). See [`__random`](http://sourceware.org/git/?p=glibc.git;a=blob;f=stdlib/random.c;hb=glibc-2.15#l292) in the GLIBC sources. – jww Oct 04 '14 at 23:10

score 0 · Answer 2 · answered Oct 04 '14 at 22:39

Look into the new features in <random> in C++11, in particular std::random_device. Otherwise, a cheesy solution is to add the pid to time(NULL).
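Both options could be sketched roughly as follows. This assumes a POSIX system for `getpid()`, and the function names `make_engine` and `time_pid_seed` are invented for the example. Note that the time-plus-pid sum can still collide: a job started one second later with a pid one lower gets the same value, and jobs on different nodes may be handed the same pid.

```cpp
#include <cstdlib>
#include <ctime>
#include <random>
#include <unistd.h>  // getpid(), POSIX only

// Option 1: a C++11 engine seeded from std::random_device
// (make_engine is a hypothetical name for this sketch).
std::mt19937 make_engine() {
    std::random_device rd;
    return std::mt19937(rd());
}

// Option 2, the "cheesy" one: add the process ID to the time stamp.
// Takes both values as parameters so the mixing is easy to check.
unsigned time_pid_seed(std::time_t now, pid_t pid) {
    return static_cast<unsigned>(now) + static_cast<unsigned>(pid);
}

// Seed rand() with the current time plus this process's ID.
void seed_rand_with_time_and_pid() {
    std::srand(time_pid_seed(std::time(nullptr), getpid()));
}
```

With option 1 you would draw numbers through a distribution, for example `std::uniform_int_distribution<int> dist(1, 6); dist(engine);`, rather than calling `rand()` at all.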

answered Oct 04 '14 at 22:39

user515430
