Odd Repetitions of Patterns When Using Rand()

Question

Sample random password/string generator which generates 32 character strings. So, generates random numbers and keep those which are between 33 and 127 as these are the ASCII values which constitute valid text.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main()
{
    srand(time(0));
    clock_t start = clock();

    long long iterations = 0;

    printf("Generating String...\n\n\t\" ");

    for (int i = 0; i < 32; i++)
    {
        long long holder = 0;
        while(holder < 33 || holder > 126)
        {
            holder = rand();
            iterations++;
        }
        putchar(holder);
    }

    clock_t end = clock();

    printf(" \"\n\n%.2lf s , %lld iterations & %lld avg\n",(double)(end - start)/CLOCKS_PER_SEC,iterations,iterations/32);

    return 0;
}

Output repeats the string DEX&H1_(okd/YVf8;49=el%<j:@"T,NU in one form or another.

Some Outputs :

Generating String...

    " DEX&H1_(okd/YVf8;49=el%<j:@"T,NU "

9.11 s , 893836506 iterations & 27932390 avg

Generating String...

    " xq?!#O]tDEX&H1_(okd/YVf8;49=el%< "

7.59 s , 768749018 iterations & 24023406 avg

Generating String...

    " MJxq?!#O]tDEX&H1_(okd/YVf8;49=el "

7.63 s , 748742990 iterations & 23398218 avg

Compiled with cc file.c -o file on Clang/macOS.

Consider : [Why is the new random library better than std::rand()?](https://stackoverflow.com/q/53040940/327083) — J..., Jan 28 '21 at 04:33
@J... Isn't that C++ ? Also I have used `srand()` properly , so no it doesnt apply here — An Ant, Jan 28 '21 at 04:35
@J... yes, I was commenting on your third comment " Why is the new random library better than std::rand()? " that is C++ — An Ant, Jan 28 '21 at 04:39
Ah, I said "consider" because the top answer to that question offers good discussion on the algorithms used in random number generation that might help you better understand why you are seeing the results you are with `rand()`. — J..., Jan 28 '21 at 04:40
Reading your question more closely, I think the last link is actually probably the most relevant. I did miss your `srand` and assumed it was simply `rand` starting with the same seed. This is actually an interesting question. The "low randomness of low-order bits" is the important weakness in this case, I think. — J..., Jan 28 '21 at 04:51
@J... Yes, while dbush has said something about it , I'm not clear as to *why* this repeats . It is , afaik, just an inefficient alternative to using `%` for getting numbers in a range. Why it repeats while that doesn't ? — An Ant, Jan 28 '21 at 04:54
Because your algorithm is actively seeking pattern matches in the unmodified low order bits (bottom byte of the int) of the returned random number. Modulo examines the entire integer space and can therefore access the higher quality randomness that the algorithm produces in those higher bits. — J..., Jan 28 '21 at 04:56
" *low randomness of low-order bits* " Ah I see, @J... . I love how every time I program something in C it doesn't work straightaway but I do get to learn something very fascinating :) — An Ant, Jan 28 '21 at 05:00
@AnAnt The question provides "*some outputs*" as examples. Out of how many tries? Getting overlaps between 32-long sequences picked out of a 94-long orbit should not be all that surprising. — dxiv, Jan 28 '21 at 06:17
@dxiv Repetitions occured in almost every result out of 7 tries. These are the first, second and fourth respectively. — An Ant, Jan 28 '21 at 06:30
@AnAnt That's not so unexpected. It's too long for a comment, so I posted it as an answer. — dxiv, Jan 28 '21 at 07:07

score 4 · Accepted Answer · answered Jan 28 '21 at 04:24

4

The way you're trying to get random numbers in a range is extremely inefficient. It's also most likely the source of the repetition you're seeing.

You should instead reduce the number returned to be within the desired range.

for (int i = 0; i < 32; i++)
{
    int holder = (rand() % (126 - 33 + 1)) + 33;
    putchar(holder);
}

answered Jan 28 '21 at 04:24

dbush

205,898
23
218
273

Ah, yes this is quite genius in the efficiency. You're right I was being quite wasteful with processing power. But any idea why exactly the repetition was occurring ? – An Ant Jan 28 '21 at 04:26
@AnAnt All pseudo-random number generators repeat at some point. Since you were effectively getting 32 numbers out of a 94 character sequence you were bound to see repetition. – dbush Jan 28 '21 at 04:31
@AnAnt That's because it's getting the first 32 numbers out of around 2^31, with each bit in subsequent numbers fairly random, so using `%` to get a range will be less predictable. – dbush Jan 28 '21 at 04:37
Dear "dbush" I have used the above code in a hobby project of mine ( github.com/a-p-jo/MCLIP-M ) and have credit the above code to this answer. I am free to do so without any other declarations, correct ? – An Ant Feb 04 '21 at 06:15
@AnAnt That's fine. It's a pretty common technique. – dbush Feb 04 '21 at 12:31

dxiv · Answer 2 · 2021-01-28T07:24:15.307

The question of how to do it right has been addressed in another answer already. This is about the "*odd repetitions" part, which may not be so "odd" after all.

The following assumes a typical rand() implementation where:

all possible values are taken exactly once before rand() returns a previous value;
the next rand() value depends only on the previous value.

Under these assumptions, the 94 values between 34 = '\"' and 125 = '}' will be returned in a cycle, which will then repeat unchanged. Then the posted code will always return 32 consecutive characters from that cycle (including wraparound)

Suppose for example that the first run returns the 32-char string DEX&H1_(okd/YVf8;49=el%<j:@"T,NU. That means the 94-char cycle of the rand() generator includes that string followed by some permutation of the remaining 62 characters.

Then the next run will produce a 32-char string that overlaps the first one for at least, say, 16 characters iff the first eligible character has an index between [0, 15] or [78, 93]. The probability of that happening is 16 / 94 ≈ 17%. Conversely, the probability of not having such overlap is ≈ 83%, and the probability of no overlaps in the next 7 runs is 0.83^7 ≈ 0.27. So the chance of getting a "repetition" for the first string in the next 7 runs is ≈ 73% i.e. not too surprising.

[ EDIT ] Also, it follows by a straight pigeonhole argument that any 6 runs will produce at least two strings that have a substring in common of length 16 or more.

Mathematically solved ! Thanks ! Could you also see the probability when following the right way to do it as given by dbush ? — An Ant, Jan 28 '21 at 07:08
@AnAnt That's virtually 0 for all practical purposes. The chance of getting an exact duplicate in the next run is `≈ 10^(-63)`, and that of a 16-char common substring `≈ 10^(-29)`. — dxiv, Jan 28 '21 at 07:45

Odd Repetitions of Patterns When Using Rand()

2 Answers2