0

I had come across the following code while reading up about RL. The probs vector contains the probabilities of each action to be taken. And I believe the given loop tries to choose an action randomly from the given distribution. Why/How does this work?

  a = 0
  rand_select = np.random.rand()
  while True:
      rand_select -= probs[a]
      if rand_select < 0 or a + 1 == n_actions:
          break
      a += 1
      actions = a

After going through similar code, I realised that "actions" contains the final action to be taken.

Chintan Rajvir
  • 689
  • 6
  • 20
  • 1
    If the CDF is invertible, you can sample uniformly from the inverted CDF. If not, you can use rejection sampling as usual. It's not that different to continuous distribution sampling. Formally you replace the integrals with sums. – Jan Christoph Terasa Jun 23 '20 at 07:48

1 Answers1

0

You can view the probabilities as a distribution of contiguous parts on the line from 0.0 to 1.0.

if we have A: 0.2, B: 0.3, C: 0.5 to the line could be

0.0 --A--> 0.2
0.2 --B--> 0.5
0.5 --C--> 1.0

And in total 1.0.

The algorithm is choosing a random location between 0.0->1.0 and finds out where it "landed" (A, B or C) by sequentially ruling out parts.

Suppose we draw 0.73, We can "visualize" it like this (selection marked with *):

0.0 ---------------------------> 1.0
                          *
0.0 --A--> 0.2 --B--> 0.5 --C--> 1.0

0.73 - 0.2 > 0 so we reduce 0.2 (=0.53) and are left with:

0.2 --B--> 0.5
0.5 --C--> 1.0

0.53 - 0.3 > 0 so we reduce 0.5 (=0.23) and are left with:

0.5 --C--> 1.0

0.23 - 0.5 < 0 so we know the part we drew was C.

The selection distributes the same as the probabilities and the algorithm is O(n) where n is the number of probabilities.

Reut Sharabani
  • 30,449
  • 6
  • 70
  • 88