C++: How to generate random numbers while excluding numbers from a given cache

Question

So in c++ I'm using mt19937 engine and the uniform_int_distribution in my random number generator like so:

#include <random>
#include <time.h>

int get_random(int lwr_lm, int upper_lm){

std::mt19937 mt(time(nullptr));
std::uniform_int_distribution<int> dist(lwr_lm, upper_lm);
return dist(mt);

}

What I need is to alter the above generator such that there is a cache that contains a number of integers I need to be excluded when I use the above generator over and over again. How do I alter the above such that I can achieve this?

Aside from the performance cost that @JesperJuhl mentions as a drawback of reinitializing the generator in every call, another drawback is that multiple calls within the same second will generate the same number. — Ludwig Schulze, Aug 12 '18 at 15:04
Is cache well known before sampling start? Basically, how often you have to update cache per number of samples? Another question - could you live with imperfect check scenario, i.e. with false positives? — Severin Pappadeux, Aug 12 '18 at 15:47

Jesper Juhl · Answer 1 · 2018-08-12T15:12:32.707

3

There are many ways to do it. A simple way would be to maintain your "excluded numbers" in a std::set and after each generation of a random number, check whether it is in the set and if it is then generate a new random number - repeat until you get a number that was not in the set, then return that.

Btw; while distributions are cheap to construct, engines are not. You don't want to re-construct your mt19937 every time the function is called, but instead create it once and then re-use it. You probably also want to use a better seed than the current time in seconds.

edited Aug 12 '18 at 15:12

answered Aug 12 '18 at 14:56

Jesper Juhl

30,449
3
47
70

I get that, but what I don't get is how to do that using the given modules mentioned above, I know how to do it with an ordinary "rand()" but not with mt19937 or uniform_int_distribution . – random_x_y_z Aug 12 '18 at 14:58
@random_x_y_z See my answer for that. – Tanveer Badar Aug 12 '18 at 14:59
1

@random_x_y_z whether the random number comes from `rand()` or `uniform_int_distribution` shouldn't change anything regarding checking against a set of numbers to discard. You have a number, you have a set to check against, you potentially need to generate a new number. How does the method of number generation change anything? I don't get why that would trip you up. – Jesper Juhl Aug 12 '18 at 15:02

score 1 · Answer 2 · answered Aug 12 '18 at 15:09

Are you 1) attempting to sample without replacement in the discrete interval? Or is it 2) a patchy distribution over the interval that says fairly constant?

If 1) you could use std::shuffle as per the answer here How to sample without replacement using c++ uniform_int_distribution

If 2) you could use std::discrete_distribution (element 0 corresponding to lwr_lm) and weight zero the numbers you don't want. Obviously the memory requirements are linear in upper_lm-lwr_lm so might not be practical if this is large

Severin Pappadeux · Answer 3 · 2018-08-13T13:57:36.597

I would propose two similar solutions for the problem. They are based upon probabilistic structures, and provide you with the answer "potentially in cache" or "definitely not in cache". There are false positives but no false negatives.

Perfect hash function. There are many implementations, including one from GNU. Basically, run it on set of cache values, and use generated perfect hash functions to reject sampled values. You don't even need to maintain hash table, just function mapping random value to integer index. As soon as index is in the hash range, reject the number. Being perfect means you need only one call to check and result will tell you that number is in the set. There are potential collisions, so false positives are possible.
Bloom filter. Same idea, build filter with whatever bits per cache item you're willing to spare, and with quick check you either will get "possible in the cache" answer or clear negative. You could trade answer precision for memory and vice versa. False positives are possible

If the "cache of excluded numbers" is large, then a bloom filter is a nice approach. But for small sets it is probably overkill. — Jesper Juhl, Aug 13 '18 at 19:37
@JesperJuhl Yes, it is. Question is how to live with imperfect distribution - false positives, numbers out of cache but having marked in, will somewhat affect it — Severin Pappadeux, Aug 13 '18 at 19:49

score 0 · Answer 4 · answered Aug 12 '18 at 14:58

0

It might not be the prettiest solution, but what's stopping you from maintaining that cache and checking existence before returning? It will slow down for large caches though.

    #include <random>
    #include <time.h>
    #include <set>

    std::set<int> cache;

    int get_random(int lwr_lm, int upper_lm){

    std::mt19937 mt(time(nullptr));
    std::uniform_int_distribution<int> dist(lwr_lm, upper_lm);

    auto r = dist(mt);

    while(cache.find(r) != cache.end())
       r = dist(mt);

    return r;
}

answered Aug 12 '18 at 14:58

Tanveer Badar

5,438
2
27
32

Even I thought about doing this! However given my particular problem I fear it could lead to an exponential increase in processing time the more I use it! – random_x_y_z Aug 12 '18 at 15:02
By this, I think, you mean there's a `cache.insert(get_random())` involved somewhere? – Tanveer Badar Aug 12 '18 at 15:04
@random_x_y_z How much you use it is irrelevant. The extra performance cost is determined by the size of your cache. – super Aug 13 '18 at 03:45

score 0 · Accepted Answer · answered Aug 12 '18 at 15:40

As mentioned by @virgesmith, in his answer, it might be better solution in function of your problem.
The method with a cache and uses it to filter future generation is inefficient for large range wiki.

Here I write a naive example with a different method, but you will be limited by your memory. You pick random number for a buffer and remove it for next iteration.

#include <random>
#include <time.h>
#include <iostream>

int get_random(int lwr_lm, int upper_lm, std::vector<int> &buff, std::mt19937 &mt){
  if (buff.size() > 0) {
    std::uniform_int_distribution<int> dist(0, buff.size()-1);
    int tmp_index = dist(mt);
    int tmp_value = buff[tmp_index];
    buff.erase(buff.begin() + tmp_index);
    return tmp_value;
  } else {
    return 0;
  }
}

int main() {
  // lower and upper limit for random distribution
  int lower = 0;
  int upper = 10;

  // Random generator
  std::mt19937 mt(time(nullptr));

 // Buffer to filter and avoid duplication, Buffer contain all integer between lower and uper limit
  std::vector<int> my_buffer(upper-lower);
  std::iota(my_buffer.begin(), my_buffer.end(), lower);

  for (int i = 0; i < 20; ++i) {
    std::cout << get_random(lower, upper, my_buffer, mt) << std::endl;
  }

  return 0;
}

Edit: a cleaner solution here

C++: How to generate random numbers while excluding numbers from a given cache

5 Answers5