
I'm wondering whether downsampling a random (or pseudorandom) sequence makes it less random or preserves its randomness. For example, if you take a series of pseudorandom bytes, as shown in the code below, and throw out all but the alphanumeric characters, is the resulting string of alphanumeric characters still pseudorandom? What about for a truly random sequence?

Is there a mathematical or computing principle or theorem that shows this one way or the other?

I looked at this question: Is a subset of a random sequence also random?

But that question does not specifically cover a selection process that uses knowledge of the values being selected. The answer by MusiGenesis there seems to suggest that this might reduce randomness.

// Assumes <cctype>, <fstream>, <iostream>, <string>, and <vector> are
// included and using namespace std; rand_vec_length, pwd_length,
// return_code, and fill_rand_vec are defined elsewhere in my program.

// Open the /dev/urandom file to read random bytes
ifstream rand_file("/dev/urandom");

if (!rand_file) {
    cout << "Cannot open /dev/urandom!" << endl;
    return return_code::err_cannot_open_file;
}

string password("");
vector<char> rand_vec(rand_vec_length, 0);
while (password.length() < pwd_length) {
    fill_rand_vec(rand_vec, rand_file);

    // Iterate through the vector of pseudorandom bytes and add
    // printable, non-whitespace chars to the password
    for (auto rand_char : rand_vec) {
        // Cast to unsigned char first: passing a plain (possibly negative)
        // char to isprint/isspace is undefined behavior
        unsigned char c = static_cast<unsigned char>(rand_char);
        if (isprint(c) && !isspace(c)) {
            password += rand_char;
        }

        if (password.length() >= pwd_length) {
            break;
        }
    }
}
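
For reference, fill_rand_vec just reads enough bytes from /dev/urandom to refill the vector; roughly:

// Refill rand_vec with raw bytes read from the open /dev/urandom stream
void fill_rand_vec(vector<char>& rand_vec, ifstream& rand_file) {
    rand_file.read(rand_vec.data(), static_cast<streamsize>(rand_vec.size()));
}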
enivium
  • https://en.wikipedia.org/wiki/Randomness_tests - though I'm not sure it's really on topic for Stack Overflow; probably more a math question than a programming question. Why not just randomly generate alphanumeric values to start with? – xaxxon Apr 22 '19 at 03:16
  • Depends on what you call "less random". There are no degrees of randomness in probability theory. You can ask how well a sequence fits a given distribution. A sequence of truly random, uniformly distributed integers in `[0,1]` would be a very poor fit for a uniform distribution on `[0,9]`. – n. m. could be an AI Apr 22 '19 at 04:19
  • @n.m. We typically use entropy as degree of randomness. – Passer By Apr 22 '19 at 07:39
  • @PasserBy You can use entropy as a measure of randomness (not the only possible measure) if you know the distribution. – n. m. could be an AI Apr 22 '19 at 08:03

1 Answer


I'm not a mathematician, but it would seem to me that, assuming your initial sequence of bytes was uniformly distributed, after throwing out all the bytes that were not in your desired range, the remaining bytes must still be uniformly distributed. The method by which you arrived at the output values did not prefer any alphanumeric character value over any other, so the resulting alphanumeric characters, however many they may be, cannot really be anything but uniformly distributed as well. And since each byte was drawn independently, the characters you keep remain independent of one another. The catch is that you have no idea in advance how many random bytes you will have to consume to end up with a given number of random alphanumeric characters; in principle it may take arbitrarily long, which makes this method not particularly efficient.
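
If you want a quick empirical sanity check (not a proof; this is just a sketch I put together, using std::mt19937 as a stand-in for /dev/urandom and keeping only alphanumeric bytes), you can filter uniformly distributed bytes and tally how often each surviving value appears:

#include <algorithm>
#include <array>
#include <cctype>
#include <iostream>
#include <random>

int main() {
    std::mt19937 gen(12345);                        // stand-in for /dev/urandom
    std::uniform_int_distribution<int> byte_dist(0, 255);

    std::array<long, 256> counts{};                 // counts[v] = times byte value v was kept
    const long target = 10'000'000;                 // number of accepted bytes to collect
    long kept = 0;

    while (kept < target) {
        int b = byte_dist(gen);
        if (std::isalnum(b)) {                      // rejection step: keep alphanumerics only
            ++counts[b];
            ++kept;
        }
    }

    // In the default "C" locale there are 62 alphanumeric byte values,
    // so each should appear roughly target / 62 times
    long min_c = target, max_c = 0;
    for (long c : counts) {
        if (c > 0) {
            min_c = std::min(min_c, c);
            max_c = std::max(max_c, c);
        }
    }
    std::cout << "expected ~" << target / 62 << " per value, observed min "
              << min_c << ", max " << max_c << "\n";
}

The observed min and max counts should land close to the expected value, consistent with the surviving bytes being uniform.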

It would seem to me that what you're describing is basically Rejection Sampling, which is a standard technique capable of generating samples from arbitrary probability distributions. You may want to read up on that for mathematical proofs. I believe your particular example can be viewed as rejection sampling from a probability distribution where each alphanumeric character value has probability 1/62 (assuming the 62 values in [A-Za-z0-9]; the exact count depends on what you consider alphanumeric) while every other value has probability 0…
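
To illustrate the general technique (my own toy example, not specific to your code): to sample from an arbitrary target distribution p over a finite set, you can propose values uniformly and accept a proposed value x with probability p[x] / p_max, where p_max is the largest probability in p:

#include <cstddef>
#include <iostream>
#include <random>
#include <vector>

// Draw one sample from the discrete distribution p by rejection sampling:
// propose uniformly, accept x with probability p[x] / p_max
int sample(const std::vector<double>& p, double p_max, std::mt19937& gen) {
    std::uniform_int_distribution<int> propose(0, static_cast<int>(p.size()) - 1);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    while (true) {
        int x = propose(gen);
        if (u(gen) * p_max <= p[x]) {
            return x;
        }
    }
}

int main() {
    std::mt19937 gen(42);
    // Target distribution: value 0 twice as likely as values 1 and 2
    std::vector<double> p{0.5, 0.25, 0.25};
    std::vector<long> counts(p.size(), 0);
    for (long i = 0; i < 1'000'000; ++i) {
        ++counts[sample(p, 0.5, gen)];
    }
    for (std::size_t x = 0; x < p.size(); ++x) {
        std::cout << "value " << x << ": " << counts[x] << "\n";
    }
}

Your alphanumeric filter is the special case where p is uniform over the accepted values and 0 everywhere else, so every proposed byte is either accepted outright or rejected outright.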

Michael Kenzel