2

I have written a simple BASIC interpreter that I'm running on Clang/macOS. To implement RND(), desiring to use as standard C as possible, I used srand/rand. Near the start of the program (in retrobasic.c) I call srand (yes, once) with either null or a user-passed value:

if (random_seed > -1)
  srand(random_seed);
else
  srand(time(NULL));

There is only one place that a number is produced:

result.number = ((double)rand() / (double)RAND_MAX); 

The code that's calling this is INT(RND()*25)+1 and I noticed it returns the same starting number every run, but it changes every hour or so. So last night it was always 25, this morning was 2, and now it's 3. I put a printf("%d", rand()); right after the srand. Sure enough, that gets me:

200829549  
200964005  
201098461  
201232917  
201703513

That seems to be pseudo-seconds in the 3rd/4th digits, which explains the behaviour I'm seeing. This added printf "fixes" it by pumping the sequence and then any subsequent calls to rand are fine. I would really like to understand/solve this the right way.

Clang complains about the srand(time(NULL)), saying it takes an unsigned int and wants me to cast the time_t to silence the warning. time_t is implementation dependant, but I suspect it is really a unsigned long? Perhaps this is an issue with a 32/64 bit conversion going into srand? I found a few tantalizing posts, like this one, but no ultimate conclusion.

This is not about bad random sequence, the sequence is fine (for my needs). There are threads that are about bad starting numbers, but invariably they end up being improper srand use. Does anyone have a suggestion here?

Maury Markowitz
  • 9,082
  • 11
  • 46
  • 98
  • 1
    Seeding with just the time (although extremely common, and recommended to newbies) is problematic. You want some more entropy if you can possibly find it. The next step after calling `srand(time(NULL))` is to try something like `srand(time(NULL) + getpid())`. (Unfortunately that has regularities, too.) I've gotten better results with `srand(time(NULL) | (getpid() << 8))`. Much better to read a few bytes from `/dev/random`, if you've got it. – Steve Summit May 30 '23 at 18:05
  • Is there a correct prototype for `srand()` in scope at the point of the call (from `#include `)? Is there a correct prototype for `time()` in scope at that point (I'm not positive for MacOS, but probably from `#include `)? Do you have your own `time()` function somewhere in your application that might be called instead? – John Bollinger May 30 '23 at 18:14
  • @SteveSummit - tried that, but now it is 22 every time. But perhaps I'm getting the same PID every time because I'm in ldb... – Maury Markowitz May 30 '23 at 18:14
  • @JohnBollinger - I'm including time.h and it's coming from the standard headers in Xcode. No custom time() - wouldn't even know how! – Maury Markowitz May 30 '23 at 18:15
  • Don't forget there's usually some kind of "random device" like `/dev/urandom` which can produce *good* seeds. – tadman May 30 '23 at 18:17
  • Do you see the expected result with a user-specified seed? (Different initial value for every distinct seed, exactly the same sequence the same seed.) – John Bollinger May 30 '23 at 18:18
  • @JohnBollinger - yes, I use this for debugging, I do indeed get the same numbers every time in that case. – Maury Markowitz May 30 '23 at 18:20
  • @tadman - do you know if that is easy to access on Windows/Cigwin as well? That is a target. – Maury Markowitz May 30 '23 at 18:21
  • I don't think I quite followed you before, but I guess when you say "the same starting number" you mean the same *initial digits* and number of digits. Is that right? – John Bollinger May 30 '23 at 18:31
  • @MauryMarkowitz There's probably something functionally equivalent under Windows, but it's not going to be a file `/dev/random` or `/dev/urandom`. (Windows doesn't have the same devotion to "everything's a file" as Unix does. :-) ) – Steve Summit May 30 '23 at 18:32
  • Not sure what the equivalent is on Windows. Random devices are highly OS dependent. Check the Windows API. – tadman May 30 '23 at 18:32
  • 2
    Edit the question to provide a [mre]. – Eric Postpischil May 30 '23 at 18:49
  • @JohnBollinger - orry, that passage is confusing. The BASIC code is taking whatever rand gives it and clamping it to 1...26. So as a result of the upper 4 digits of the rand being similar every run, the result of the BASIC function is always the same *in toto*. It's only when the 3rd digit changes that you'll see the number in the app change. – Maury Markowitz May 31 '23 at 11:53

3 Answers3

2

time_t can be a floating point type which could explain why the seed only changes every now and then. If time_t holds days since some starting time point where the fraction describes the time of day, you'd only see a new seed once every day.

You could try using timespec_get (requires at least C11). The nanosecond part of the struct timespec is specified to be a long so a cast to unsigned will work fine:

#include <stdlib.h>
#include <time.h>

void seeder() {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    srand((unsigned ) ts.tv_nsec);
}

int main(void) {
    seeder();

    // ...
}
Ted Lyngmo
  • 93,841
  • 5
  • 60
  • 108
  • 1
    `srand((unsigned ) ts.tv_nsec);` is interesting. In trying to think of a way to a few more bits than the at best 30 bits of `.tv_nsec` to contribute to the `unsigned seed`: perhaps `(((unsigned) ts.tv_nsec) << 3) ^ (unsigned) .tv_sec`? – chux - Reinstate Monica May 30 '23 at 19:26
  • 1
    @chux-ReinstateMonica Thanks, yes, I went to lie down on the sofa and just had a similar idea to not only get 1000000000 different seeds. I think making the nanosecond part the most influential would be best and not risk shifting away bits from that though if `unsigned` is 16 bits :-) – Ted Lyngmo May 30 '23 at 19:31
  • 1
    True about concerns of `unsigned` wider/narrower than 32-bit. An additional issue is the `.tv_nsec` may not return all possible nanosecond values. : e.g. only 0, 1000, 2000, .... or the like. It is a fun problem with no certain end. – chux - Reinstate Monica May 30 '23 at 19:38
  • 1
    @chux-ReinstateMonica Very true! Finding a true entropy source without using non-standard things like `RDRAND` or similar isn't easy. – Ted Lyngmo May 30 '23 at 19:48
  • This seems like the "most natural" solution, but can anyone comment on whether this will be available widely across platforms? One of my targets is Windows/Cigwin. – Maury Markowitz May 31 '23 at 11:41
  • @MauryMarkowitz It'll create a seed with any C11 compliant implementation since it only uses standard C functions. You can also use `srand((unsigned) clock())` - or use a combination of the two. [example](https://godbolt.org/z/x71x8Ga3K) – Ted Lyngmo May 31 '23 at 13:53
1

The effect you describe suggests that the highest-order bits of the output of your system's rand() are insensitive to the lowest-order bits of its internal state (which you initialize via srand()). From your "every hour or so", I speculate that you may need a difference at bit 12 or higher, corresponding to 4096 s or about 68 minutes, to overcome that effect.

This added printf "fixes" it by pumping the sequence and then any subsequent calls to rand are fine.

That would be consistent. Although the first numbers pulled from the generator on runs performed close together may have similar values, the effects on the PRNG's internal state will be different enough that subsequent results then diverge.

A simple solution, then, might be simply to discard a couple of initial results from rand(), immediately after the srand() call:

if (random_seed > -1) {
  srand(random_seed);
} else {
  srand(time(NULL));
}
rand();
rand();

Alternatively, any approach that causes your seeds to differ more widely is likely to help even with the initial numbers. @SteveSummit's idea of combining the PID with the output of time(NULL) could be that, especially if you shift or rotate the PID as he suggested.* For example, something like

    unsigned int my_pid = getpid();
    unsigned int random_seed = ((unsigned int) time(NULL)) ^ (my_pid << 8);

    // ...

You could go further to get initial seeds that vary more randomly, but my reading of the question suggests that either of the above would be adequate for your needs. And they are not mutually exclusive, of course.


*But note that getpid() and PIDs in general are not native to C. They are POSIX features, so they may not themselves be suitable for your use if you want to rely only on the language spec.

John Bollinger
  • 160,171
  • 8
  • 81
  • 157
0

The fundamental problem here is that the standard library function rand(), basically, sucks.

I tested your situation with this code fragment:

printf("   seed         r      number  r2\n");
for(int i = 0; i < 10; i++) {
    int seed = my_time(NULL);
    srand(seed);
    int r = rand();
    double result_number = ((double)r / (double)RAND_MAX); 
    int rr = (int)((result_number*25)+1);
    printf("%d %d %f %d\n", seed, r, result_number, rr);
}

I used a function my_time() so that I could precisely control the way that the standard time function yields values that exactly increment once per second:

time_t my_time(time_t *tp)
{
    static time_t t = 1685535843;
    if(tp != NULL) *tp = t;
    return t++;
}

And since I happen to be on a Mac, it turned out that I could reproduce your surprising result:

   seed         r      number  r2
1685535843 1344125724 0.625907 16
1685535844 1344142531 0.625915 16
1685535845 1344159338 0.625923 16
1685535846 1344176145 0.625931 16
1685535847 1344192952 0.625939 16
1685535848 1344209759 0.625946 16
1685535849 1344226566 0.625954 16
1685535850 1344243373 0.625962 16
1685535851 1344260180 0.625970 16
1685535852 1344276987 0.625978 16

Not very random!

It's pretty well known that a simplistic linear congruential random number generator, which is what many versions of rand() tend to use, has poor randomness properties. The standard problem with such generators is that the low-order bits tend to be cyclic, not random, such that if you take, say, rand() % 2, you'll get an exactly alternating (totally non-random) sequence of 0, 1, 0, 1, 0, 1, ... . I hadn't encountered the problem we're having here, namely that the high order bits seem to be strongly correlated — on the first call, at least — with an srand-provided seed.

There are a number of ways around this. In a comment I suggested adding in some more entropy, perhaps via a call to getpid:

seed = my_time(NULL) + my_getpid()

Again I emulated getpid for total predictability:

int my_getpid()
{
    static int pid = 12345;
    return pid++;
}

But this doesn't help at all, because there's still far too much order in the seeds:

   seed         r      number  r2
1685548188 1551608139 0.722524 19
1685548190 1551641753 0.722539 19
1685548192 1551675367 0.722555 19
1685548194 1551708981 0.722571 19
1685548196 1551742595 0.722586 19
1685548198 1551776209 0.722602 19
1685548200 1551809823 0.722618 19
1685548202 1551843437 0.722633 19
1685548204 1551877051 0.722649 19
1685548206 1551910665 0.722665 19

I'm out of time (I'll update this answer later), but other things you can try are:

  1. use more and better entropy in your seed
  2. burn a few random numbers after calling srand (hey, it works in Vegas!)
  3. use random() instead of rand()
Steve Summit
  • 45,437
  • 7
  • 70
  • 103
  • I used the "burn" method - it seems to do the trick and is guaranteed to work on all my targets. Not pretty, but hey, working is a feature. – Maury Markowitz May 31 '23 at 15:42