0

There is a given distribution p with finite states, and the distribution is p={p_1, p_2, ..., p_n}. Each p_i means the probability of occurrence of state i.

According to this distribution, I tend to sample, alternatively, predict what the next state will be. To my best knowledge, this problem can be implemented with the roulette method. The code for roulette method follows:

int Sample(int * p, int n){
    for(int i = 1; i < n; i++){
        p[i] += p[i - 1];//Accumulate probability
    }
    double ran_val = (double) rand() / RAND_MAX;//Generate random number between [0,1]
    int index = 0;
    for(; index < n; index++){
        if(ran_val < p[index]){
            break;
        }
    }
    return index;
}

We first accumulate the probability from the second to the end as the top 4 lines show, and then random a val between [0,1] (We assume the $\sum{p_i}=1$). In the next step, we check which interval the val falls into, and return the corresponding interval index (or the state number).

For example, the given distribution is {0.1, 0.2, 0.3, 0.4} for 4 states 1, 2, 3, 4. After the accumulation, it comes to {0.1, 0.1+0.2=0.3, 0.3+0.3=0.6, 0.6+0.4=1.0}. Once we random a val, such as 0.5. The program will check the interval [0, 0.1), then the [0.1, 0.3), and the [0.3, 0.6) and so on to determine which val will fall into. In this case, the state 3 will be the answer.

This method is the best implementation strategy as I know. The above code implements with two round traverse of array p (certainly it can be simplified within only one round). However, is there any other methods to do the random sample task that better than the above, maybe less time complexity or more interpretable? If you have ideas, could you give some references or links?

MinG
  • 57
  • 9
  • possible duplicate of [Generate an array of random integers with non-uniform distribution](http://stackoverflow.com/questions/21142031/generate-an-array-of-random-integers-with-non-uniform-distribution) – pjs Apr 02 '15 at 15:37
  • You can 1) construct the CDF once and table it, and 2) binary search through the table to generate values. Or you can use the [alias method](http://en.wikipedia.org/wiki/Alias_method). – pjs Apr 02 '15 at 15:37
  • To the first comment. Not exactly. I want to find a better solution when I come across bottleneck with the above code. Not just a solution. – MinG Apr 02 '15 at 15:56
  • To the second one. The above code has the O(n) time complexity. I don't want to involve sort algorithm cause it costs more. The links you posted provide some directions. Thanks for your advice, that's very helpful. – MinG Apr 02 '15 at 16:02
  • Who said anything about sorting? The alias method builds a lookup table in O(n) time, a cost you're already invoking, and then each use is O(1) for an amortized O(1) cost. – pjs Apr 02 '15 at 16:48
  • Since when is binary searching through your array *not* a better solution than linear searching through the array? – pjs Apr 02 '15 at 16:49

0 Answers0