With C++17 std::sample
does exactly what you need, but for c++98 it is slightly more complicated.
The shortest code that is compatible with c++98 is:
unsigned pick_below(unsigned n)
{
// poor distribution:
return std::rand() % n;
}
std::vector<std::pair<std::string, int> >
sample(const std::map<std::string, int> & data_in,
unsigned p)
{
std::vector<std::pair<std::string, int> > shuffled(data_in.begin(), data_in.end());
for (unsigned i=shuffled.size() ; i > 1 ; --i)
std::swap(shuffled[i-1], shuffled[pick_below(i)]);
shuffled.erase(shuffled.begin() +p, shuffled.end());
}
This code is problematic on two levels:
std::random
quality is not guaranteed.
- using modulo in pick_below distorts the distribution.
To overcome issue number 2, either use boost::random::uniform_int_distribution
or rewrite the pick_below
function according to this:
unsigned pick_below(unsigned n)
{
unsigned x;
do {
x = rand();
} while (x >= (RAND_MAX - RAND_MAX % n));
return x % n;
}
Fixing issue 1 can be overcome by using a third-party random generator such as boost::random::mt19937
.
Unfortunately, the complexity of this solution is O(n) on average (since pick_below
is not guaranteed to terminate, but on any value p < RAND_MAX / 2
the probability to iterate it more than K times diminishes exponentially to less than 0.5K. The complexity can't be better than O(n) since there is no way to pick a kth element of a map, short of iterating all of it.