Randomly pick from a vector in C++?

Question

I have a vector that allows duplicates. I want to randomly choose an element with a probability proportional to how many times that element is repeated.

For example, for the vector below, 6 should have the highest probability of being chosen. I thought about using rand(), but I am not quite sure how to incorporate the probability.

vector A = [ 0, 0, 2, 2, 4, 5, 1, 6, 6, 6]

thanks

How about you just pick an element at random? That already incorporates its frequency within the vector. – Neil Locketz, Jun 15 '17 at 20:55
Just choose randomly from the vector. You will naturally get the correct distribution: a number that is in the list 4 times will get picked 4 times more often than one that is in there once (assuming you use a real random number generator). – pm100, Jun 15 '17 at 20:55
`vector A [ 0, 0, 2, 2, 4, 5, 1, 6, 6, 6] ` That won't even compile. Did you mean `std::vector A ({ 0, 0, 2, 2, 4, 5, 1, 6, 6, 6});`? — πάντα ῥεῖ, Jun 15 '17 at 20:57
@Kattie _"how can i "chose randomly" from the vector?"_ By using a randomly created index. — πάντα ῥεῖ, Jun 15 '17 at 20:59
By using the standard library facilities that generate numbers at random. — StoryTeller - Unslander Monica, Jun 15 '17 at 20:59
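
The comments above can be checked empirically. The following is only a minimal sketch, assuming a `std::mt19937` generator passed in by the caller; `pick_uniform` and `count_picks` are hypothetical helper names, not something from the question. It draws uniformly random indices and tallies the results, so a value that appears three times in the vector should come up roughly three times as often as a value that appears once.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <vector>

// Hypothetical helper: returns one element chosen at a uniformly random index.
int pick_uniform(const std::vector<int>& values, std::mt19937& rng)
{
    assert(!values.empty());
    // Bounds are inclusive, so the last valid index is size() - 1.
    std::uniform_int_distribution<std::size_t> index(0, values.size() - 1);
    return values[index(rng)];
}

// Hypothetical helper: tallies how often each value comes up in `draws` picks.
std::map<int, int> count_picks(const std::vector<int>& values, int draws,
                               std::mt19937& rng)
{
    std::map<int, int> counts;
    for (int i = 0; i < draws; ++i)
        ++counts[pick_uniform(values, rng)];
    return counts;
}
```

Calling `count_picks(A, 100000, rng)` on the vector from the question should give 6 about 30% of the draws, 0 and 2 about 20% each, and 1, 4 and 5 about 10% each, without any explicit weighting.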

score -1 · Accepted Answer · answered Jun 15 '17 at 21:05

I think you are on the right track: filling the vector with repeated values already gives you a custom distribution. The following code demonstrates picking an element at a random index. Hope it helps.

#include <cstdlib>
#include <iostream>
#include <ctime>
#include <vector>

int main()
{
    std::vector<int> A { 0, 0, 2, 2, 4, 5, 1, 6, 6, 6 };
    std::srand(static_cast<unsigned>(std::time(nullptr))); // seed once with the current time
    // The modulo maps the random value to a valid index in [0, A.size() - 1].
    int random_pos = std::rand() % A.size();
    int random_val = A[random_pos];
    std::cout << random_val << '\n';
}

answered Jun 15 '17 at 21:05

Stephan Lechner


Thanks, this is what I was looking for. But looking at other questions, many people mention that using the modulo would not produce a uniformly random distribution, so I am wondering whether it would be okay in the case of this vector? – Kattie.S Jun 15 '17 at 21:09
A random generator is expensive to construct. Construct it only once with `static`. Furthermore, make it `thread_local`; otherwise, multiple threads will likely get the same results. – user1587451 Jun 15 '17 at 21:15
`rand()` is a pseudo-random generator, which is sufficient for many cases but has shortcomings when it comes to cryptography, for example. The output of `rand()` is not guaranteed to be uniformly distributed, and so neither is the result of a modulo on `rand()`. Yet you control the "effective" distribution much more through the content of the vector you pre-fill. So I'd not care too much about the modulo: any deviation from a uniform distribution caused by `rand() % 10` (in your case) will have very little influence compared to the distribution of values in the vector. – Stephan Lechner Jun 15 '17 at 21:16
@user1587451: generating random numbers, possibly within several threads, is a special topic; I think it goes beyond what the OP formulated in the question. Whether std::rand() is thread safe is implementation defined; yet I think one first has to define the multithreading setting. – Stephan Lechner Jun 15 '17 at 21:22
@StephanLechner you are right, the OP did not ask for that, but with so many cores out there, I presupposed the OP wanted a multithreaded solution. – user1587451 Jun 15 '17 at 21:28
@StephanLechner: although not explicitly stated in the standard, `rand()` is uniform in every implementation I saw; the problem is that *the modulo* isn't going to be uniform, especially with larger modulo values. – Matteo Italia Jun 15 '17 at 22:22
@MatteoItalia In case anyone is curious why this is true, https://stackoverflow.com/questions/10984974/why-do-people-say-there-is-modulo-bias-when-using-a-random-number-generator this explains it really well. – ozeanix Jun 15 '17 at 22:52
@MatteoItalia In my case, it will be okay, right? How big are the values you are referring to? – Kattie.S Jun 16 '17 at 07:01
The thing starts to become a problem once `residuals = RAND_MAX % modulo` becomes some significant fraction of `RAND_MAX`. In general, unless `residuals == 0` (where the distribution remains uniform), if we call `buckets = RAND_MAX/modulo` (`/` here is integer division), every number has probability `float(buckets) / RAND_MAX` of being extracted, except numbers in [0, `residuals`), which have `float(buckets + 1) / RAND_MAX`. – Matteo Italia Jun 16 '17 at 07:48
Say `RAND_MAX` is 32767 (as in VC++) and `modulo` is 20000; here you'll have `buckets = 1` and `residuals = 12767`, so P([0, residuals)) = 2. / RAND_MAX ≈ 6.1E-5, while P([residuals, modulo)) = 1. / RAND_MAX ≈ 3.1E-5. So, numbers between 0 and 12767 have twice the probability of being extracted. OTOH, for small moduli the difference is negligible; in your case (modulo = 10, the size of your vector) you'd have `residuals = 7`, `buckets = 3276`; P([0, 7)) = (3276. + 1.) / 32767 = 0.10001, while P([7, 10)) = 3276. / 32767 = 0.09998. – Matteo Italia Jun 16 '17 at 07:52
To sum it up, the key metric here is `(buckets + 1.) / buckets`; if this is significantly larger than 1 (say, greater than 1.2), you have a problem. – Matteo Italia Jun 16 '17 at 07:59
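
The arithmetic in the comments above can be reproduced with a few lines of code. This is a rough sketch rather than a definitive analysis: `ModuloBias` and `modulo_bias` are hypothetical names, and it assumes the source generator itself is uniform. One difference from the comments: they approximate the number of possible outcomes by `RAND_MAX`, whereas the sketch takes the exact count as a parameter, which for `std::rand()` would be `RAND_MAX + 1` because 0 is also a possible result.

```cpp
#include <cassert>

// Hypothetical summary of how `x % modulo` distorts a uniform source.
struct ModuloBias {
    unsigned long buckets;    // how many times every result is hit at least
    unsigned long residuals;  // how many results are hit one extra time
    double p_low;             // probability of each result in [residuals, modulo)
    double p_high;            // probability of each result in [0, residuals)
    double ratio;             // p_high / p_low; 1.0 means no bias
};

// `outcomes` is the number of equally likely source values,
// e.g. RAND_MAX + 1 for std::rand(), assuming rand() itself is uniform.
ModuloBias modulo_bias(unsigned long outcomes, unsigned long modulo)
{
    assert(modulo > 0 && modulo <= outcomes);
    ModuloBias b{};
    b.buckets = outcomes / modulo;
    b.residuals = outcomes % modulo;
    b.p_low = static_cast<double>(b.buckets) / outcomes;
    // When residuals == 0 no result gets an extra hit, so both are equal.
    b.p_high = b.residuals == 0
                   ? b.p_low
                   : static_cast<double>(b.buckets + 1) / outcomes;
    b.ratio = b.p_high / b.p_low;
    return b;
}
```

With 32768 outcomes, `modulo_bias(32768, 10)` gives a ratio of about 1.0003, so for the ten-element vector in the question the bias is negligible, while `modulo_bias(32768, 20000)` gives a ratio of 2.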

score -1 · Answer 2 · answered Jun 15 '17 at 21:05

Maybe something like this (untested!):

#include <vector>
#include <random>
#include <iostream>

int main()
{
  std::vector<size_t> A{0, 0, 2, 2, 4, 5, 1, 6, 6, 6};

  static thread_local std::mt19937 g{std::random_device{}()};

  // Both bounds are inclusive, so the upper bound must be A.size() - 1.
  static thread_local std::uniform_int_distribution<size_t> d{0, A.size() - 1};

  std::cout << A[d(g)] << std::endl;
}
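
If this is needed in more than one place, the same idea could be wrapped in a small helper. The sketch below is one possible shape, not a tested library routine: `thread_rng` and `random_element` are hypothetical names, and it assumes `std::random_device` yields a usable seed on the target platform. The generator is created once per thread, while the distribution is rebuilt per call so its range always matches the vector being sampled.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: one generator per thread, seeded on first use.
inline std::mt19937& thread_rng()
{
    static thread_local std::mt19937 rng{std::random_device{}()};
    return rng;
}

// Returns a reference to a uniformly chosen element of a non-empty vector.
template <typename T>
const T& random_element(const std::vector<T>& values)
{
    assert(!values.empty());
    // The distribution is cheap to build, so it is created per call and
    // always matches the current size; both bounds are inclusive.
    std::uniform_int_distribution<std::size_t> index(0, values.size() - 1);
    return values[index(thread_rng())];
}
```

A call such as `std::cout << random_element(A) << '\n';` then prints one element, with duplicates in `A` being proportionally more likely.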
