0

I am trying to figure out an efficient way to sample indices from an array of normalized frequencies. In essence, I have a massive amount of data and hence storing the data in a table with an element per occurrence (with repeated elements allowed) is not possible.

As a small example to illustrate what I am trying to do. Suppose I have the following array in Python:

freqs = [.2, .1, .1, .3, .3]

Now, what I want is to be able to basically produce an integer that indexes into the array above by sampling values between 0 and 4 that follows the distribution associated with each index position. That is, if 100 indices are sampled, I would like 20% of them to be 0 (on average).

ClownInTheMoon
  • 225
  • 1
  • 9

1 Answers1

0

you can check this answer for almost the same question.

you just need to define the numbers and their frequencies.

if you are using python >= 3.6 you can use out-of-the-box feature

from random import choices
indices = [1,2,3]
probs = [.3, .3, .4]
def get_rand_choice():
    return choices(indices, probs)
Omer Shacham
  • 618
  • 4
  • 11