I came across the following code while reading about reinforcement learning (RL). The probs vector contains the probability of taking each action, and I believe the loop samples an action at random from that distribution. Why and how does this work?
    a = 0
    rand_select = np.random.rand()
    while True:
        rand_select -= probs[a]
        if rand_select < 0 or a + 1 == n_actions:
            break
        a += 1
    actions = a
After going through similar code, I realised that "actions" holds the single action that was finally selected.
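To make the question concrete, here is a minimal sketch of how I read the loop, assuming probs is non-negative and sums to 1. The function name select_action, the parameter u (standing in for the np.random.rand() draw) and the example probabilities are my own, not from the original code. Subtracting probs[a] one at a time is the same as walking along the cumulative sums: action a is picked when the uniform draw lands in an interval of length probs[a], so it should come up with probability probs[a]. The a + 1 == n_actions check looks like a guard against rounding error pushing the index past the last action.

```python
import numpy as np

def select_action(probs, u):
    """Hypothetical wrapper around the loop above; u is a uniform draw in [0, 1)."""
    n_actions = len(probs)
    a = 0
    rand_select = u
    while True:
        # After this subtraction, rand_select equals u minus the
        # cumulative probability of actions 0..a.
        rand_select -= probs[a]
        # Stop when u falls inside action a's interval, or when we have
        # reached the last action (guards against rounding error).
        if rand_select < 0 or a + 1 == n_actions:
            break
        a += 1
    return a

# Made-up distribution, for illustration only.
probs = np.array([0.2, 0.5, 0.3])

# Draw many actions and compare empirical frequencies with probs.
rng = np.random.default_rng(0)
samples = [select_action(probs, rng.random()) for _ in range(20000)]
freqs = np.bincount(samples, minlength=len(probs)) / len(samples)
print(freqs)
```

If this reading is right, the printed frequencies should be close to probs, and the loop would be doing by hand roughly what np.random.choice(n_actions, p=probs) does.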