Generating a sequence with custom distribution

Question

I have vector of n elements vector<elem> v; with different probabilities. The probabilities are in the range [0, 1] and their sum is 1.

struct elem
{
   int value;
   double probability;
}

How can I generate a new vector<elem>, where each elem is chosen with a probability based on that distribution? It's better if I can choose the length of the new vector.

https://en.cppreference.com/w/cpp/container/vector/push_back — pptaszni, Nov 11 '20 at 14:02
To clarify, you want each `elem` you add to the new vector to be chosen according to the probability from the original vector? — cigien, Nov 11 '20 at 14:03
@cigien yes, there can also be multiple of the same element in the new vector — Jordan Mironski, Nov 11 '20 at 14:09
Consider using the [alias method](https://en.wikipedia.org/wiki/Alias_method). The Wikipedia page even has a link to a C++ implementation. The alias table takes O(k) time to initialize for a distribution w/ k outcomes, but generating from the table is O(1). — pjs, Nov 11 '20 at 17:19
@SamuelLiew I think the question should be reopened. The requirements are clear, and the question is sufficiently narrow in scope. — cigien, Nov 11 '20 at 19:39
@cigien What makes this notably different from all the other questions on SO about generating discrete distributions with unequal probabilities? — pjs, Nov 11 '20 at 22:04
@pjs I couldn't say for sure, but the couple of C++ questions on the topic I could find didn't really seem to be the same. [This](https://stackoverflow.com/questions/2649717) is the closest I could find, but it's not exactly the same thing. The OP there wants to pick *only* a single item, and hence the solution there is operating under efficiency constraints that don't apply here. If you do find a reasonable dupe, then please share it here, and I'll add an answer there instead. — cigien, Nov 11 '20 at 22:14
@cigien How 'bout this one? https://stackoverflow.com/q/1761626/2166798 — pjs, Nov 11 '20 at 22:31
@pjs Yes, thank you :) That's a perfect target. The mention of `boost` in the question is distracting, but all the answers have happily ignored the `boost` part, so I'll just answer there :) — cigien, Nov 11 '20 at 22:37
@SamuelLiew Don't bother reopening. While the reasons stated in my previous comment to you are still valid, this question is a dupe of [Weighted random numbers](https://stackoverflow.com/questions/1761626). — cigien, Nov 11 '20 at 22:39

score 1 · Answer 1 · answered Nov 11 '20 at 14:20

If you want to pick an elem from the vector v with a probability proportional to the value of a member, you need to implement a selection scheme known as fitness-proportionate or roulette-wheel selection.

To implement this, you can first create a "roulette-wheel" based on the values of the data member:

std::vector<double> wheel, probs;

// extract the probabilities
std::transform(std::begin(v), std::end(v),
               std::back_inserter(probs),
               [](auto const & e) { return e.probability ; });

// then create the roulette wheel
std::partial_sum(std::begin(probs), std::end(probs),
                 std::back_inserter(wheel));

Now to make a single choice, you can spin the wheel, and see which index of the wheel it lands on. Given the construction of wheel, the probability of landing on any index is proportional to the probability value of the elem at that same index in v.

// create the random spinner, and uniformly distributed tip
std::mt19937 spinner;
std::uniform_real_distribution<double> tip(0., 1.); // since the sum is 1.
                                                    // In general, the second argument can be wheel.back()

// spin the wheel and selects an elem
auto spin = [&] {
   auto choice = std::lower_bound(std::begin(wheel), std::end(wheel), 
                                  tip(spinner));
   return v[std::distance(std::begin(wheel), choice)];
};

Now you can generate a new vector of whatever size you want.

// spin the wheel N times to generate next population
std::vector<elem> new_v;
std::generate_n(std::back_inserter(new_v), N, spin);

Note that if you want to generate a new vector without repeating elements, you will have to put in a little more effort to ensure that the choices are still randomly distributed. This choice will also be affected by the size of the new vector you want to generate.

It will take a while to understand why it works. I will read about it in the link you provided. — Jordan Mironski, Nov 11 '20 at 14:45
@JordanMironski Try it out with some hard coded values, and spin the wheel a few times. That should help make more sense of why it works. — cigien, Nov 11 '20 at 14:46
since you are using partial sums how should the elements in the original vector be arranged depending on the probability? — Jordan Mironski, Nov 11 '20 at 15:04
@JordanMironski It doesn't matter, the partial sums can be in any order. It's only the magnitude of the individual partial sums that matter. Again, try it out with original vector in a couple of different orders, and you'll see it doesn't affect the result. — cigien, Nov 11 '20 at 15:06
BTW, please don't accept the answer until you've understood it fully. That makes it less likely that someone else will post an answer that addresses your question in a way that is easier for you to understand. — cigien, Nov 11 '20 at 15:08

Generating a sequence with custom distribution

1 Answers1