My short answer: in no way.
Just because the problem definition is incorrect. As Axn brilliantly noticed:
There is a little bit of contradiction going on in the requirement. It states that K <= N. But as K approaches N, the frequency requirement will be contradicted by the Uniqueness requirement. Worst case, if K=N, all elements will be returned (i.e appear with same frequency), irrespective of their weight.
Anyway, when K is pretty small relative to N, calculated frequencies will be pretty close to theoretical values.
The task may be splitted on two subtasks:
- Generate random numbers with a given distribution (specified by weights)
- Generate unique random numbers
Generate random numbers with a given distribution
- Calculate sum of weights (
sumOfWeights
)
- Generate random number from the range
[1; sumOfWeights]
- Find an array element where the sum of weights from the beginning of the array is greater than or equal to the generated random number
Code
#include <iostream>
#include <cstdlib>
#include <ctime>
// 0 - id, 1 - weight
typedef unsigned Pair[2];
unsigned Random(Pair* i_set, unsigned* i_indexes, unsigned i_size)
{
unsigned sumOfWeights = 0;
for (unsigned i = 0; i < i_size; ++i)
{
const unsigned index = i_indexes[i];
sumOfWeights += i_set[index][2];
}
const unsigned random = rand() % sumOfWeights + 1;
sumOfWeights = 0;
unsigned i = 0;
for (; i < i_size; ++i)
{
const unsigned index = i_indexes[i];
sumOfWeights += i_set[index][3];
if (sumOfWeights >= random)
{
break;
}
}
return i;
}
Generate unique random numbers
Well known Durstenfeld-Fisher-Yates algorithm may be used for generation unique random numbers. See this great explanation.
It requires N bytes of space, so if N value is defined at compiled time, we are able to allocate necessary space at compile time.
Now, we have to combine these two algorithms. We just need to use our own Random()
function instead of standard rand()
in unique numbers generation algorithm.
Code
template<unsigned N, unsigned K>
void Generate(Pair (&i_set)[N], unsigned (&o_res)[K])
{
unsigned deck[N];
for (unsigned i = 0; i < N; ++i)
{
deck[i] = i;
}
unsigned max = N - 1;
for (unsigned i = 0; i < K; ++i)
{
const unsigned index = Random(i_set, deck, max + 1);
std::swap(deck[max], deck[index]);
o_res[i] = i_set[deck[max]][0];
--max;
}
}
Usage
int main()
{
srand((unsigned)time(0));
const unsigned c_N = 5; // N
const unsigned c_K = 2; // K
Pair input[c_N] = {{0, 5}, {1, 3}, {2, 2}, {3, 5}, {4, 4}}; // input array
unsigned result[c_K] = {};
const unsigned c_total = 1000000; // number of iterations
unsigned counts[c_N] = {0}; // frequency counters
for (unsigned i = 0; i < c_total; ++i)
{
Generate<c_N, c_K>(input, result);
for (unsigned j = 0; j < c_K; ++j)
{
++counts[result[j]];
}
}
unsigned sumOfWeights = 0;
for (unsigned i = 0; i < c_N; ++i)
{
sumOfWeights += input[i][1];
}
for (unsigned i = 0; i < c_N; ++i)
{
std::cout << (double)counts[i]/c_K/c_total // empirical frequency
<< " | "
<< (double)input[i][1]/sumOfWeights // expected frequency
<< std::endl;
}
return 0;
}
Output
N = 5, K = 2
Frequencies
Empiricical | Expected
0.253813 | 0.263158
0.16584 | 0.157895
0.113878 | 0.105263
0.253582 | 0.263158
0.212888 | 0.210526
Corner case when weights are actually ignored
N = 5, K = 5
Frequencies
Empiricical | Expected
0.2 | 0.263158
0.2 | 0.157895
0.2 | 0.105263
0.2 | 0.263158
0.2 | 0.210526