
Suppose I have a `vector<Point> p` of some objects.

I can pick a uniformly random element simply with `p[rand() % p.size()]`.

Now suppose I have another, same-sized vector of doubles, `vector<double> chances`.

I want to randomly sample from `p`, with each element having a probability proportional to its value in `chances` (which may not sum to 1.0). How can I achieve that in C++?

Michael
  • You can use `std::discrete_distribution` to obtain an integer distributed according to your vector of `chances`, and then use that integer as an index into your vector of points. – Nate Eldredge Nov 07 '21 at 15:16
  • `rand() % someNumber` [is not a uniform distribution](https://stackoverflow.com/questions/52869166/why-is-the-use-of-rand-considered-bad). – Yksisarvinen Nov 07 '21 at 15:18
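
As the comment above notes, `rand() % n` suffers from modulo bias, and `<random>` is preferred even for the unweighted case. A minimal sketch of a uniform pick without `rand()` (the names here are illustrative, not from the question):

#include <random>
#include <vector>

struct Point {};

int main() {
    std::vector<Point> p(10);

    std::mt19937 gen(std::random_device{}());
    // Closed range [0, p.size() - 1]; no modulo bias, unlike rand() % p.size().
    std::uniform_int_distribution<std::size_t> idx(0, p.size() - 1);

    Point picked = p[idx(gen)];
    (void)picked; // silence unused-variable warnings in this sketch
}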

1 Answer


You are looking for `std::discrete_distribution`. Forget about `rand()`.

#include <random>
#include <vector>

struct Point {};

int main() {
    std::mt19937 gen(std::random_device{}());

    std::vector<double> chances{1.0, 2.0, 3.0};
    // Initialize to same length.
    std::vector<Point> points(chances.size());
    // size_t is suitable for indexing.
    std::discrete_distribution<std::size_t> d{chances.begin(), chances.end()};

    auto sampled_value = points[d(gen)];
}

Conveniently for you, the weights do not have to sum to 1.
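
If you want to confirm the normalization, `std::discrete_distribution::probabilities()` returns the stored weights already divided by their sum. A small check (illustrative only):

#include <iostream>
#include <random>
#include <vector>

int main() {
    std::vector<double> chances{1.0, 2.0, 3.0};
    std::discrete_distribution<std::size_t> d{chances.begin(), chances.end()};

    // The distribution keeps normalized weights: here 1/6, 2/6, 3/6.
    for (double p : d.probabilities())
        std::cout << p << ' ';   // prints approximately 0.166667 0.333333 0.5
    std::cout << '\n';
}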

Quimby
  • The algorithm normalises the weights with a pass over the vector of probabilities? – Robinson Aug 31 '22 at 11:30
  • @Robinson Not sure I understand, `d` holds its own copy of probabilities/weights, `chances` are not modified. Whether they are normalized inside `d` is up to the implementation. Sampling is done from a prob. distribution given by the normalized weights. – Quimby Aug 31 '22 at 11:57
  • No, I mean I think it adds up all the probabilities, and that's how it can sample the distribution as (1.0/sum, 2.0/sum, etc.). Here the sum is 6, so the probabilities would be 1/6, 2/6 and 3/6, adding up to 1. I was just wondering why the weights didn't have to sum to 1, you see. – Robinson Aug 31 '22 at 12:14
  • @Robinson I see. I believe weights are accepted unnormalized purely for convenience, and to avoid a re-normalization pass that the algorithm would otherwise have to do. Normalization might also not be needed by the sampling algorithm itself, since one can sometimes sample from the cumulative sum alone. But I am not sure which approach is more stable w.r.t. limited precision. – Quimby Aug 31 '22 at 12:47
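
A minimal sketch of the cumulative-sum approach mentioned in the last comment, i.e. hand-rolled weighted sampling that never normalizes the weights (illustrative only; `std::discrete_distribution` already does this for you):

#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Returns an index i with probability proportional to weights[i],
// without normalizing the weights first.
std::size_t sample_index(const std::vector<double>& weights, std::mt19937& gen) {
    std::vector<double> cumulative(weights.size());
    std::partial_sum(weights.begin(), weights.end(), cumulative.begin());

    // Uniform draw in [0, total weight); the first cumulative value
    // strictly greater than the draw identifies the sampled index.
    std::uniform_real_distribution<double> u(0.0, cumulative.back());
    const double r = u(gen);
    return std::upper_bound(cumulative.begin(), cumulative.end(), r) - cumulative.begin();
}

int main() {
    std::mt19937 gen(std::random_device{}());
    std::vector<double> chances{1.0, 2.0, 3.0};
    std::size_t i = sample_index(chances, gen); // ~1/6, 2/6, 3/6 chance for 0, 1, 2
    (void)i;
}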