Randomly select index from a STL vector from truth value

Question

I have a vector that looks like:

vector<int> A = {0, 1, 1, 0, 0, 1, 0, 1};

I'd like to select a random index from the non-zero values of A. Using this example A, I want to randomly select an element from the array {1,2,5,7}.

Currently I do this by creating another array

vector<int> b;
for(int i=0;i<A.size();i++)
    if(A[i]) 
        b.push_back(i);

Once b is created, I find the index by using this answer:

get random element from container

Is there a more STL-like (or C++11) way of doing this, perhaps one that does not create an intermediate array? In this example A is small, but in my production code this selection process is in an inner-loop and A is non-static and thousands of elements long.

Is the proportion of `1`s to `0`s roughly half? Or reasonably high, anyway. If so, I'd pick a random number less than the total size, and if the value at that index is `0`, re-pick. — BoBTFish, Dec 02 '13 at 16:07
I'd say the best approach depends on how often you want to do that. Is the size of `A` roughly as in your example or may it be huge? — stefan, Dec 02 '13 at 16:08
Can't you use a while loop and keep trying to select a random element until you get a non zero value? — andre, Dec 02 '13 at 16:08
@BoBTFish - I don't think that would be random enough. It would be biased towards the end of the array. — Trenin, Dec 02 '13 at 16:08
@Trenin Why? We don't know anything about the distribution of values within the array, and in `C++` since 2011 you have various distributions you can choose for random number generation anyway. — BoBTFish, Dec 02 '13 at 16:09
Does it have to be the index, or would random iterator where the value is non-zero suffice? — Zac Howland, Dec 02 '13 at 16:09
@BoBTFish - but if it isn't roughly half, then you could take a long time to randomly pick one of the few true valued indecies... Not saying you are wrong, but just pointing out why 'roughly half' is a reasonable requirement. — Trenin, Dec 02 '13 at 16:12
@ZacHowland I need the _index_ not simply an iterator. The index is fed back into a formula that I use for other purposes. — Hooked, Dec 02 '13 at 16:15
@stefan I may know the distribution for the `A`'s for a single run of my programs _a prori_, but this distribution will change with different program inputs. Sometimes A might be sparse and sometimes it might be completely full. Sometimes the distribution will be auto-correlated, sometimes it won't be. — Hooked, Dec 02 '13 at 16:17
Since it is a `vector`, it is very cheap to get the index from the iterator anyway. — BoBTFish, Dec 02 '13 at 16:17
@Hooked In that case, there won't be a "more STL-like" solution. You can improve on the efficiency of your algorithm, though. — Zac Howland, Dec 02 '13 at 16:18
To be honest, I wouldn't use `A` at all as you have it in this case. I would use a `std::unordered_set<>` (or ordered, your choice) to manage the list of "bits" that are "lit", then choose a value from the set via `std::next(s.begin(), rndval)`, where `rndval` is a random pick in `[0..setsize)` — WhozCraig, Dec 02 '13 at 16:19

Matt Cruikshank · Accepted Answer · 2013-12-02T16:21:37.310

5

A great way to do this is Reservoir Sampling.

In short, you walk your array until you find the first non-zero value, and record that index as the first possible answer you might return.

Then, you continue to walk the array. Every time you find a non-zero value, you randomly might change which new index is your possible answer, with decreasing probability.

This algorithm also works great if you need M random index values from your array.

What's great about this, is that you walk each element only one time, and you don't need a separate memory structure to record the non-zero elements. It's O(N) in speed, and O(M) in memory, in your case it's O(1) in memory, since you only want 1 random value.

On the flip side, random number generators are traditionally quite slow. So, you might want to performance test this against any other ideas people come up with here, to see if the trade-off of speed-vs-memory is worth it for you.

edited Dec 02 '13 at 16:21

answered Dec 02 '13 at 16:16

Matt Cruikshank

2,932
21
24

This, I believe at the time of writing, is the superior answer as it permits structure in the reservoir array so +1 and I hope OP accepts. – Bathsheba Dec 02 '13 at 16:19
Good point about RNG. You are calling the RNG once for every element in the array, which could slow it down. – Trenin Dec 02 '13 at 16:28
1

If you're willing to trade memory for speed, and if you can estimate the number of non-zero elements in the array, you can make your reservoir that size, or slightly larger. That way you vastly decrease the odds of needing to repeatedly call the random number generator. Then, if you only need one index, you can call the random number generator ONE MORE TIME against the reservoir itself. – Matt Cruikshank Dec 02 '13 at 16:32

score 2 · Answer 2 · answered Dec 02 '13 at 16:09

2

With a single pass through the array, you can determine how many false (or true) values there are. If you are doing this kind of thing often, you can even write a class to keep track of this for you.

Regardless, you can then pick a random number i between 0 and num_false (or num_true). Then with another pass through the array, you can return the ith false (or true) index.

answered Dec 02 '13 at 16:09

Trenin

2,041
1
14
20

Well in time complexity it is just the same as the initial approach. Uses less space although. – mcserep Dec 02 '13 at 16:13
@Bathsheba - can you explain? – Trenin Dec 02 '13 at 16:13
@CMate - yes. Doesn't require a temporary array to be created and destroyed. – Trenin Dec 02 '13 at 16:14
Briefly and for example, if the probability of drawing a 1, say, is in any way dependent on the previous value or previous values then a naive random number drawing based on the total proportion of 1s and 0s will not be an acceptable optimisation. – Bathsheba Dec 02 '13 at 16:16
@Bathsheba - I don't see how that makes this invalid. The requirement is select a random index from the set of non-zero values. This will first determine how many non-zero values there are, and then pick a random number between 0 and this value. It will then walk the array searching for this index. – Trenin Dec 02 '13 at 16:24

andre · Answer 3 · 2013-12-02T16:29:24.220

1

We can loop through each non-zero value and assign it a random number. The index with the largest random number is the one we select.

int value = 0;
int index = 0;
while(int i = 0; i < A.size(); i++) {
    if(!A[i]) continue;
    auto j = rand();
    if(j > value) {
        index  = i;
        value = j;
    }
}

edited Dec 02 '13 at 16:29

answered Dec 02 '13 at 16:21

andre

7,018
4
43
75

The probability for selecting numbers with this solution is not with uniform distribution. – Ivaylo Strandjev Dec 02 '13 at 16:24
@IvayloStrandjev Yes, I guess we need to generate a list of unique random numbers first. Which is the same as having a vector of temporaries. – andre Dec 02 '13 at 16:26
I believe you could replace `rand()%A.size()` with simply `rand()`. @IvayloStrandjev why? – Guilherme Bernal Dec 02 '13 at 16:27

score 0 · Answer 4 · answered Dec 02 '13 at 18:46

0

vector<int> A = {0,1,1,0,0,1,0,1};
random_shuffle(A.begin(),A.end());
auto it = find_if(A.begin(),A.end(),[](const int elem){return elem;});

answered Dec 02 '13 at 18:46

lucas92

464
2
11

Randomly select index from a STL vector from truth value

4 Answers4