1

For my program, I needed so far to draw one random value in [0..k[ from time to time, and using C++11 <random> features works really well. My current code is something like

class Random
{
public:
  Random() : rng( rd() ) { }

  inline int getRandNum( int limit ) { return ( numbers(rng) % limit ); } 

private:
  std::random_device rd;
  std::mt19937 rng;
  std::uniform_int_distribution<int> numbers;
};

Now, I need to draw in a row n different values in [0..k[. I was looking for something in <random> allowing that, but either I am not able to find it, or such a thing does not exist yet. Is there a clever, more elegant way to proceed than calling my getRandNum function and repeat until I get n different values?

EDIT: to give an idea, in my program k is some thousands and n some tens.

Florian Richoux
  • 1,543
  • 1
  • 14
  • 28
  • How big is the range 0..k, and how big is 'n'? Answers to these questions will determine the best solution. – DavidO Jul 02 '14 at 07:09
  • With an array of `[0..k[`, you may use `std::shuffle` – Jarod42 Jul 02 '14 at 07:10
  • 1
    If 'n' is small relative to the size of the range of 0..k, then draw-and-discard throws the smaller amount of memory at the problem at the computational cost of discarding and re-drawing due to a few collisions. If n approaches the size of the range, collisions will become prohibitively common, but memory savings for draw-and-discard will be inconsequential since keeping track of draws will consume close to the same amount of memory as the generate-and-shuffle method. In such a case, generate-and-shuffle is optimal. – DavidO Jul 02 '14 at 07:16
  • 2
    Note that your current code is flawed: you probably have a step somewhere in the distribution. You should create the distribution in the `getRandNum` function passing `(0, k-1)` to the constructor and avoid custom range resizes. – DarioP Jul 02 '14 at 07:54
  • @DarioP I didn't notice any steps so far, even with small k (like 10), but thanks anyway, I will be careful about that. – Florian Richoux Jul 02 '14 at 08:24
  • @DavidO I edit my question to give an idea of k (some thousands) and n (some tens). I don't care about space, but I do about speed! This program must be as fast as possible. – Florian Richoux Jul 02 '14 at 08:26
  • @Jarod42 That's an idea indeed. A bit a waste to shuffle [0..k[ if I just want n values, with n << k, but I agree it is somewhat more elegant. Not sure it is more efficient from a time complexity point of view, though. – Florian Richoux Jul 02 '14 at 08:29
  • @FlorianRichoux I believe this post answers your question: [link](http://stackoverflow.com/questions/3722430/most-efficient-way-of-randomly-choosing-a-set-of-distinct-integers). A similar one is here: [link](http://stackoverflow.com/questions/2394246/algorithm-to-select-a-single-random-combination-of-values). The algorithm is by Robert Floyd. – lightalchemist Jul 02 '14 at 09:02
  • Don't use % to get your range. It does not give an even distribution. Use std::uniform_int_distribution – Neil Kirk Jul 02 '14 at 09:10
  • @NeilKirk: he uses `std::uniform_int_distribution` already, just uses it incorrectly – Matthieu M. Jul 02 '14 at 09:23
  • @MatthieuM. No, I don't incorrectly use `std::uniform_int_distribution`, it is just I don't use it a way you like or you are used to. For my program, I don't need to perfectly draw a random number (that is impossible to implement anyway), I just need a number which is random enough. Yes, if `limit` does not divide `MAXINT`, I don't have a uniform distribution anymore, but I don't care. On the other hand, I have a function to draw a random number on different ranges, and that fits my needs. – Florian Richoux Jul 02 '14 at 09:53
  • @NeilKirk Of course it remains a distribution, it is just not uniform anymore if `limit` does not divide `MAXINT`, but I don't care for what I do. – Florian Richoux Jul 02 '14 at 09:55
  • 1
    @FlorianRichoux: in this case there is no use in having an instance of `std::uniform_int_distribution`; the only interest of the instance (in general) is to ensure proper distribution, if you do it manually on the side, you can just as easily create a new instance each time. And if you wish to use the distribution properly, then you need to precise its range at creation (you could precise a range of `0, k*l*m` where `k`, `l` and `m` are the limits you intend to use afterward). I would note that `%` on integrals is perhaps one of the slowest CPU operations available, by the way. – Matthieu M. Jul 02 '14 at 09:59
  • If you care a lot about the speed and do not need high quality randomness then why are you using a Mersenne Twister engine? – DarioP Jul 02 '14 at 10:34
  • @FlorianRichoux It seems strange you try to use a uniform distribution but then don't care about it. – Neil Kirk Jul 02 '14 at 10:59
  • @DarioP Because I didn't know that the MT engine was costly. :) In that case, what do you suggest? – Florian Richoux Jul 02 '14 at 12:22
  • @MatthieuM. @NeilKirk Well, my though was: it is simple to use `std::uniform_int_distribution` and even if at the end I don't have something really uniform, I am not far from it. Anyway, thanks for your comments. – Florian Richoux Jul 02 '14 at 12:24
  • @lightalchemist The Floyd algorithm is finally call my getRandNum till I have n different values, which is what I try to avoid (seems inelegant, but for n << k it is maybe the better solution). But thanks for the links! – Florian Richoux Jul 02 '14 at 12:28
  • Just use an instantiation of a linear congruential engine like the `minstd_rand`. If you really want to manually resize the range, then use the `operator()` of the engine to get the numbers avoiding the distribution at all (discouraged but if you really want...). It is also useless to instantiate a `random_device` just to generate the seed, use the time since epoch for that. Example here: http://www.cplusplus.com/reference/random/linear_congruential_engine/linear_congruential_engine/ – DarioP Jul 02 '14 at 12:52
  • @FlorianRichoux You are welcome. Hope this solves your problem. Gd luck. – lightalchemist Jul 02 '14 at 14:15

3 Answers3

2

This solution is not C++ specific but can be easily implemented in any language.

What you want is essentially shuffle numbers 0 to k and pick the first n numbers, where n <= k. This can be done using a reservoir sampling algorithm. See this wikipedia link for the pseudocode.

Note that it is possible to get the n numbers without storing all k numbers and shuffling them. That is, it is possible to just use O(n) space, where n is the number of random numbers you wish to obtain, instead of O(k). The time complexity for this algorithm is O(k), if we assume generating the random number takes O(1) time.

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
lightalchemist
  • 10,031
  • 4
  • 47
  • 55
  • It should be mentioned that this way you get a transposition of numbers, where each number appears exactly once. Sometimes this may be an issue. – FreeNickname Jul 02 '14 at 07:38
  • @FreeNickname. You are right but I believe this is what the op wants. He stated that he wish to "draw n different values in [0, k]", unless I misunderstood what you meant. – lightalchemist Jul 02 '14 at 07:40
  • yes, you're right. I was confused by ``. And actually, if this is the case, then calling `` `n` times is not actually a correct solution, since it might give the same number more than once. – FreeNickname Jul 02 '14 at 07:46
  • Yes, I want n different values, so a random permutation of [0..k[ and then taking the n first number is a solution to my problem. My concern is, it is in time O(k) indeed, and I rather like something in O(n). I don't care about space complexity. – Florian Richoux Jul 02 '14 at 08:31
  • I mean, I care about speed, not space, so something in time O(n) would be great. – Florian Richoux Jul 02 '14 at 08:53
  • If you care about speed and not space, the answer is to generate a list of every number between 0 and k, shuffle the numbers, and then start popping them off the list one by one. You will have no repeats. If you exhaust the range, then your range was too small, unless you want to relax the no-repeats requirement and generate a new list in the same range. – DavidO Jul 02 '14 at 15:34
  • @DavidO, it depends on whether numbers should be unique in scope of one generated sequence, or globally. I understand it like this: we need to get n unique numbers (n ~ 10) from [0..k-1] (k ~ 1000). In this case generating a permutation of thousands of numbers every time we need to get several tens is a waste of resources. Not only a waste of memory, but also a waste of CPU time. – FreeNickname Jul 02 '14 at 18:38
1

If k is several thousands and n is tens, then a permutation generation is really not the best choise. But calling getRandNum is not what you want too, because it can return the same value several times. One option is to generate random sequence all at once, checking that the numbers don't repeat. The easiest (and may be even the most efficient) way to achieve this is to use a set.

Like so:

#include <vector>
#include <set>
#include <iostream>
#include <random>

class Random
{
public:
  Random() : rng( rd() ) { }

  inline int getRandNum( int limit ) { return ( numbers(rng) % limit ); }
  std::set<int> getRandSequence(int limit, int n);

private:
  std::random_device rd;
  std::mt19937 rng;
  std::uniform_int_distribution<int> numbers;
};

std::set<int> Random::getRandSequence(int limit, int n)
{
    std::set<int> generatedSequence;
    while (generatedSequence.size() < n) //size() for set is O(1) if I'm not mistaken
        generatedSequence.insert(getRandNum(limit));
    return generatedSequence;
}

int main()
{
    Random r;
    auto sequence = r.getRandSequence(1000, 10);
    std::cout << "Seq;uence: "  << std::endl;
    for (int number : sequence)
        std::cout << number << std::endl;
    std::cout << "End" << std::endl;

    return 0;
}

Ideone demo.

By the way, random_device creation is expensive, but uniform_int_distribution creation, as far as I remember, is not. So this might be even more efficient:

std::set<int> Random::getRandSequence(int limit, int n)
{
    std::uniform_int_distribution<int> uiniformDistribution(0, limit);
    std::set<int> generatedSequence;
    while (generatedSequence.size() < n)
        generatedSequence.insert(uiniformDistribution(rng));
    return generatedSequence;
}

Besides, when you get a uniform distribution and then apply % limit to it, you don't get a uniform distribution anymore.

FreeNickname
  • 7,398
  • 2
  • 30
  • 60
-1
std::random_device rd; // obtain a random number from hardware
std::mt19937 eng(rd()); // seed the generator
std::uniform_int_distribution<> distr(0, 1500); // define the range

for(int a=0; a<limit; a++){
    cout << distr(eng);  //draw random nubmer
Quest
  • 2,764
  • 1
  • 22
  • 44