Randomly sample an array with a fixed number of zeros

Question

I have optimized code to randomly sample an array containing -1s, 0s and 1s with probabilities 1/4,1/2,1/4. It looks like

#define n (12)
unsigned int x,y=34353,z=57768,w=1564; //PRNG seeds
/* xorshift PRNG
 * Taken from https://en.wikipedia.org/wiki/Xorshift#Example_implementation
 * Used under CC-By-SA */
u_int32_myRand() {
    unsigned int t;
    t = x ^ (x << 11);
    x = y; y = z; z = w;
    return w = w ^ (w >> 19) ^ t ^ (t >> 8);
}

x=(int)time(NULL); //seed PRNG
unsigned int k
int F[n];
for(k=0; k<n; k++) {
      F[k]=(1-(myRand()&3))%2;  
    }

How can I modify this so that it only returns arrays that have exactly n/3 zeros in them and still have it fast?

@Don'tYouWorryChild myRand() just makes a random number. 1-myRand()&3 gives -1, 0, 1 or 2. So (1-(myRand()&3))%2 gives 0 with prob 1/2 and -1 and 1 with prob 1/4, as required. — Simd, Apr 29 '14 at 11:32
So for your answer, just remove `%2` and cast it to `unsigned`, that will make the operation `&3` to return either `0`, `1` or `2`, therefore `0` has probability of `1/3` now. — 0xF1, Apr 29 '14 at 11:39
@Don'tYouWorryChild Yes but I need to fix the number of zeros at exactly n/3. — Simd, Apr 29 '14 at 11:39
@user2179021 Umm if `myRand()` gives just a random integer, then doesn't `myRand()&3` produce numbers from set `{0,1,2,3}`. Thus `1-` would give `{-2,-1,0,1}`. Also did you check if it actually produces numbers with expected probability? I think that LCGs are bad at random distribution of least significant bits. Lastly, it's damn random, how do you expect it to be fixed? It isn't fixed even now. To have a fixed number of `0`s you have to randomize their position, not value. — luk32, Apr 29 '14 at 11:43
Since you are using PRNG, it will repeat at some point, you can try using this feature of it. — 0xF1, Apr 29 '14 at 11:43
@luk32 Ah yes {-2,-1,0,1,} you are right. Of course mod 2 you get the same result. I didn't check the quality of the random number generator. As you suggest, I think I need to pick n/3 positions at random, place zeros there, then place randomly chosen -1s and 1s in the remaining positions. — Simd, Apr 29 '14 at 11:45
It seems you can select the n/3 positions using http://www.nowherenearithaca.com/2013/05/robert-floyds-tiny-and-beautiful.html . I am not sure how to put this all together efficiently however. — Simd, Apr 29 '14 at 11:48

score 3 · Answer 1 · answered Apr 29 '14 at 12:37

The easiest way to do this would be to fill the first part of the array with n/3 zeros. Then add as many 1's and -1's as you want. Then, do a Fisher-Yates shuffle to randomize the array.

The problem with trying to "randomly distribute n/3 zeros" is that you have difficulty preventing overlap. That is, if you want to put 33 zeros in an array of 99, you can't just pick 33 random indexes because you're likely to get duplicates. So you'll end up with fewer than 33 zeros in the array.

As for performance, this will be nearly as fast as your current example. It just requires an extra pass over the array. The number of random numbers generated is the same.

Danvil · Accepted Answer · 2014-04-29T13:28:34.220

Proceed in two steps:

Distribute exactly n/3 zeros randomly over your array and set the rest to 1.
Assign a random sign to the remaining ones to get the desired -1/+1.

Example code:

int F[n];
// fill with 1
for(k=0; k<n; k++) {
    F[k] = 1;
}
// distribute n/3 zeros
for(k=0; k<n/3; k++) {
    // find a location which does not have a 0 yet
    int i;
    do {
        i = myRand() % n;
    } while(F[i] == 0);
    F[i] = 0;
}
// change remaining (non zero) to -1 with 50% probability
for(k=0; k<n; k++) {
    if(F[k] && myRand()%2) F[k] = -1;
}

This has a runtime of around 2.4 n, but I do not think you can get much faster than that.

The while loop in the second for loop is in average executed around 1.2 times for the case of n/3 zeros.

Remark:

Trial and error used in the second for loop works quite well if the success probability is high enough. The number of trials you need in average for a probability of p is 1/p.

In our case (n/3 zeroes) the worst probability to find a good location (i.e. for the last zero) is 2/3, thus in average 1.5 iterations. To find places for all n/3 zeros you need in average around 0.2*n iterations.

The average runtime can be computed as -log(1-a), where a is the percentage of zeroes you want to distribute (in your case a = 1/3).

Some more examples: If you would want to distribute 2/3*n zeros, it would take 1.1*n iterations. For 0.99*n zeroes it is already 4.6*n iterations.

All in average. In the worst case it takes forever.

If you need a runtime guarantee you are perhaps better off by implementing real sampling without reselection, i.e. filling a container with all possible indices, sampling a random element as index and removing it from the container. But this probably has a runtime of around O(n*log(n)). Thus it works good for small n or a large percentage of zeroes.

Thank you. I think the method you are using for distributing the n/3 zeros is "trial and error" from http://eyalsch.wordpress.com/2010/04/01/random-sample/ . — Simd, Apr 29 '14 at 12:20
Trial and error works good if you success probability is high enough. I added a remark about that to my question. — Danvil, Apr 29 '14 at 12:29
I found the problem. i = myRand() % n; can be a negative number! — Simd, Apr 29 '14 at 13:11
Also if(F[k] && myRand()%2) F[n] = -1; looks like a bug. Do you mean F[k] = -1? — Simd, Apr 29 '14 at 13:23
It's still broken because of i = myRand() % n; i can be a negative number. — Simd, Apr 29 '14 at 13:30
@user2179021: If you have a look at the link you provided for the xorshift, you will see that it is not implemented correctly. Your implementation uses `int`, while it should use `uint32_t` (i.e. `unsigned int`). — Danvil, Apr 29 '14 at 13:34

score 1 · Answer 3 · edited May 23 '17 at 11:50

Here is a simple algorithm that will do:

Put the required number of zeros (n/3) into your output array
For the remaining places, put 1 or -1 with probability 1/2
Shuffle the array

However:

... array containing -1s, 0s and 1s with probabilities 1/4,1/2,1/4

Does it mean that there are exactly n/4 1's in the array? Let's assume it doesn't. Then:

Count the number of 1's (a) and -1's (b) in your array
Probability for 1 is a/(a+b); use it in the above algorithm instead of 1/2

Note: if there are only zeros, or no zeros at all, in your input array - sampling with exactly n/3 zeros is impossible!

Randomly sample an array with a fixed number of zeros

3 Answers3

Linked