I would like to choose randomly from a list of 3 elements (HGA, CGA, SGA), but the probabilities are given as 3 lists, one per element.

My probabilities are given by (the lists all have the same length):

Probs = { 'HGA':prob['HGA'], 'CGA':prob['CGA'], 'SGA':prob['SGA'] }

with prob looking like this:

prob['HGA']=[0.5,0.2,0.4,0.6, ...]

Now I want to create another list, without using a loop, which should look something like this:

particles = ['HGA', 'CGA', 'CGA', 'CGA', 'SGA' ...]

'particles' should have the same length as the probability lists.

2 Answers

Assuming Probs gives the probability of selecting each key (with the values summing to 1), you can use numpy.random.choice:

Probs = {'HGA':0.1, 'CGA':0.2, 'SGA':0.7}

import numpy as np
particles = np.random.choice(list(Probs), p=list(Probs.values()), size=100)

output:

array(['SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'HGA', 'HGA',
       'HGA', 'SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'CGA',
       'SGA', 'CGA', 'SGA', 'SGA', 'SGA', 'CGA', 'SGA', 'SGA', 'SGA',
       'HGA', 'SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'SGA',
       'HGA', 'SGA', 'SGA', 'CGA', 'CGA', 'SGA', 'SGA', 'SGA', 'SGA',
       'SGA', 'CGA', 'CGA', 'CGA', 'SGA', 'CGA', 'SGA', 'CGA', 'CGA',
       'CGA', 'SGA', 'CGA', 'CGA', 'SGA', 'SGA', 'HGA', 'SGA', 'HGA',
       'SGA', 'SGA', 'SGA', 'SGA', 'SGA', 'HGA', 'CGA', 'CGA', 'CGA',
       'CGA', 'SGA', 'SGA', 'HGA', 'SGA', 'SGA', 'CGA', 'SGA', 'HGA',
       'SGA', 'SGA', 'SGA', 'SGA', 'CGA', 'SGA', 'CGA', 'CGA', 'SGA',
       'HGA', 'SGA', 'HGA', 'SGA', 'CGA', 'SGA', 'SGA', 'CGA', 'SGA',
       'SGA'], dtype='<U3')

For a list, use:

particles = (np.random.choice(list(Probs), p=list(Probs.values()), size=100)
               .tolist()
             )
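The same draw can also be written with NumPy's newer Generator interface, which makes the result reproducible when a seed is passed. This is only a sketch under the same assumption as above (one scalar probability per key, summing to 1); the seed value 42 is an arbitrary choice for illustration and is not part of the original answer.

```python
import numpy as np

Probs = {'HGA': 0.1, 'CGA': 0.2, 'SGA': 0.7}

# A seeded Generator gives the same sequence on every run (seed is arbitrary).
rng = np.random.default_rng(42)

# Draw 100 keys according to the probabilities and convert to a plain list.
particles = rng.choice(list(Probs), p=list(Probs.values()), size=100).tolist()

print(particles[:10])
```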
mozway

If I understood correctly, the i-th element of each probability list is the probability of sampling the corresponding item at the i-th step, so the i-th elements of all the lists should always sum to 1. If so, this should be what you are asking for. Here is a toy example:

import numpy as np

Probs = {'HGA':[0.2, 0.6, 0.2], 'CGA':[0.7, 0.1, 0.3], 'SGA':[0.1, 0.3, 0.5]}
values = list(Probs.keys())

particles = [np.random.choice(values, p=sample_probs) for sample_probs in zip(*Probs.values())]

# ['CGA', 'HGA', 'HGA']
print(particles)
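The assumption stated above, that the i-th entries of all the lists sum to 1, can be checked before sampling, since np.random.choice raises a ValueError when the probabilities it receives do not sum to 1. A minimal sketch using the toy data; the names p, column_sums, is_valid and p_normalized are illustrative only, and the renormalization step is an optional extra rather than something the question asks for:

```python
import numpy as np

Probs = {'HGA': [0.2, 0.6, 0.2], 'CGA': [0.7, 0.1, 0.3], 'SGA': [0.1, 0.3, 0.5]}

# Stack the lists into an array of shape (number of keys, number of steps).
p = np.array(list(Probs.values()), dtype=float)

# Sum over the keys: one total per step, each of which should be 1.
column_sums = p.sum(axis=0)
is_valid = np.allclose(column_sums, 1.0)
print(is_valid)

# Optional: rescale every step so that its probabilities sum to 1.
p_normalized = p / column_sums
```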

For a fast vectorized version, following this excellent answer (https://stackoverflow.com/a/57238866/2246849):

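def vectorized_choice(p, n, items=None):
    # p: array of shape (rows, k), each row a probability vector; n: draws per row
    s = p.cumsum(axis=1)                   # cumulative distribution of each row
    r = np.random.rand(p.shape[0], n, 1)   # uniform draws in [0, 1)
    q = np.expand_dims(s, 1) >= r          # True where the cumulative sum has reached the draw
    k = q.argmax(axis=-1)                  # index of the first True, shape (rows, n)
    if items is not None:
        k = np.asarray(items)[k]           # map indices to labels
    return k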

p = np.column_stack(tuple(Probs.values()))
n = 1
items = list(Probs.keys())

sample = vectorized_choice(p, n, items)  # shape (len(p), 1); sample[:, 0].tolist() gives a flat list
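For the case in the question (one draw per position, returned as a flat list), the same inverse-CDF idea can be condensed as below. This is a sketch, not the linked answer's code: the function name sample_per_step and the optional rng argument are hypothetical, and it adds two small guards that are my own assumptions about what is wanted, namely dividing by the last cumulative value so that rounding error cannot leave a row without a match, and using a strict comparison so that an item with probability 0 is never selected.

```python
import numpy as np

def sample_per_step(probs, rng=None):
    """Draw one key per position from a dict mapping key -> equal-length probability list."""
    rng = np.random.default_rng() if rng is None else rng
    names = list(probs)
    # Rows are positions, columns are keys: shape (N, K).
    p = np.column_stack([probs[name] for name in names]).astype(float)
    cdf = p.cumsum(axis=1)
    # Rescale so the last column is exactly 1 despite floating-point rounding.
    cdf = cdf / cdf[:, -1:]
    # One uniform draw in [0, 1) per position.
    r = rng.random((p.shape[0], 1))
    # The first column whose cumulative value exceeds the draw is the sampled key.
    idx = (cdf > r).argmax(axis=1)
    return np.asarray(names)[idx].tolist()

Probs = {'HGA': [0.2, 0.6, 0.2], 'CGA': [0.7, 0.1, 0.3], 'SGA': [0.1, 0.3, 0.5]}
particles = sample_per_step(Probs)
print(particles)
```

The result has one entry per position in the probability lists, and no Python-level loop runs over the positions.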
user2246849
  • Yes, but the probability lists have a lot more characters and I am looking for a solution without using a for loop... Else the code gets super slow because of the number of digits – Silmarilon Apr 30 '22 at 19:11
  • OK, if what I wrote is correct, I suggest you check out [this excellent answer](https://stackoverflow.com/a/57238866/2246849) for a fast vectorized implementation. In your case you have `p = np.asarray(list(zip(*Probs.values())))`, `n = 1` and `items = list(Probs.keys())`. – user2246849 Apr 30 '22 at 19:24
  • @MichiKovac I edited my answer to reflect this. – user2246849 Apr 30 '22 at 19:28