2

I have a set S={a1,a2,a3,a4,a5,......,an}. The probability with which each of the element is selected is {p1,p2,p3,p4,p5,...,pn} respectively (where ofcourse p1+p2+p3+p4+p5+....+pn=1}.

I want to simulate an experiment which does that. However I wish to do that without any libraries (i.e from first principles)

I'm using the following method: 1) I map the elements on the real number line as follows X(a1)=1; X(a2)=2; X(a3)=3; X(a4)=4; X(a5)=5;....,X(an)=n

2) Then I calculate the cumulative probability distribution function for each coordinate (i.e P(x < X) as follows: cdf(x)= P(a1) + P(a2) + .....P(ai) such that X(ai) <= x < X(a(i+1))

(thus the cdf is a step function)

3) I randomly select an real number,q between (0,1). And calculate the x-coordinate where the line y = q intersects the cdf. Since the cdf is a step function with jumps at 1,2,...n the point would have an integer x-coordinate btw 1 and n. Let the x-coordinate be m.

4) I select that ai, such that X(ai) = m.

My question is does this method simulate the experiment without any bias?

I'm not getting the required results, which is why i'm a bit skeptical.

Any help will be greatly appreciated! Thanks!

leppie
  • 115,091
  • 17
  • 196
  • 297
Adwaitvedant
  • 253
  • 3
  • 10
  • Perhaps see [this question](http://stackoverflow.com/questions/352670/weighted-random-selection-with-and-without-replacement), the answer to which cites [this page](http://prxq.wordpress.com/2006/04/17/the-alias-method/)? – Jason Morgan Jun 23 '12 at 01:36

1 Answers1

3

The logic sounds ok. Generally to sample an arbitrary distribution function Y(x) from a uniform distribution U(0,1), just lookup the uniform random value u in the Y vector and return the least value of x with Y(x) greater than or equal to u i.e. min{x:Y(x)>=u}.

You may want to add an x=0 observation for the base probability as in the example below.

x      P(x)    Y(x)
0      0       0
1      0.1     0.1
2      0.3     0.4
3      0.4     0.8
4      0.2     1

eg u = 0.3 -> x = 2, u = 0.81 -> x = 4, etc.

Clearly calculating relative frequencies over many trials will give unbiased estimates of P(x).

lori_m
  • 5,487
  • 1
  • 18
  • 29