I'd solve this problem using a std::unordered_set
to check for already generated numbers. This has expected constant time for both checking and inserting, since it's based on a hash table; summing up to a linear time complexity in the number of numbers to be generated N
.
Generic solution which works with any distribution:
template <typename T, typename Dist, typename Gen>
std::unordered_set<T> unique_generate(Dist &&dist, Gen &&generator, size_t N)
{
std::unordered_set<T> generated;
while (generated.size() < N)
generated.insert(dist(generator));
return generated;
}
Usage with normal_distribution
:
std::random_device rd;
std::mt19937 gen(rd());
std::normal_distribution<double> d(meanValue, stdDevValue);
int N = 1000000;
auto myNumbers = unique_generate<double>(d, gen, N);
To also enforce the numbers to be in the interval [0, 1]
, you can wrap the distribution object using a generic wrapper class ("separation of concerns": don't mix the unique generation with the distribution clamping).
A possible (possibly slow*) implementation throws away generated numbers which are out of bounds:
template<typename Dist, typename T>
class ClampedDistribution {
Dist dist;
T min, max;
public:
ClampedDistribution(Dist dist, T min, T max) :
dist(dist), min(min), max(max)
{}
template <typename Gen>
auto operator()(const Gen & generator) -> decltype(dist(generator)) {
auto value = dist(generator);
while (value > max || value < min)
value = dist(generator);
return value;
}
};
// type-deducing function:
template<typename Dist, typename T>
ClampedDistribution<Dist,T> clamped(Dist dist, T min, T max) {
return ClampedDistribution<Dist,T>(dist, min, max);
}
Usage:
// (from above)
std::normal_distribution<double> d(meanValue, stdDevValue);
// clamp it:
auto clamped_dist = clamped(d, 0.0, 1.0);
// and pass this to unique_generate:
auto myNumbers = unique_generate(clamped_dist, gen, N);
*) It's slow if you choose a high standard deviation for your normal distribution. It's reasonably fast for small deviations though, as the numbers chosen by the normal distribution are more likely to be in the range already.