I am trying to figure out an efficient way to sample indices from an array of normalized frequencies. In essence, I have a massive amount of data and hence storing the data in a table with an element per occurrence (with repeated elements allowed) is not possible.
As a small example to illustrate what I am trying to do. Suppose I have the following array in Python:
freqs = [.2, .1, .1, .3, .3]
Now, what I want is to be able to basically produce an integer that indexes into the array above by sampling values between 0 and 4 that follows the distribution associated with each index position. That is, if 100 indices are sampled, I would like 20% of them to be 0 (on average).