How to choose a sample according to a distribution?

Question

I have an array of elements [a_1, a_2, ..., a_n] and an array of probabilities associated with these elements [p_1, p_2, ..., p_n].

I want to choose "k" elements from [a_1,...a_n], k << n, according to probabilities [p_1,p_2,...,p_n].

How can I code it in Python? Thank you very much, I am not experienced at programming.

Possible duplicate of [Weighted random sample in python](https://stackoverflow.com/questions/13047806/weighted-random-sample-in-python) — Peter O., Oct 22 '19 at 09:22

Fakher Mokadem · Accepted Answer · 2019-10-22T12:20:08.040

1

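Use numpy.random.choice, passing the probabilities through its `p` argument.

Example:

import numpy as np
from numpy.random import choice

sample_space = np.array([a_1, a_2, ... a_n]) # substitute the a_i's
discrete_probability_distribution = np.array([p_1, p_2, ..., p_n]) # substitute the p_i's (they must sum to 1)

# picking N samples, one at a time (with replacement, so repeats are possible)
N = 10
for _ in range(N):
    # p= is required here: the second positional argument of choice is size, not the probabilities
    print(choice(sample_space, p=discrete_probability_distribution))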
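As a runnable illustration of the same idea, here is a minimal sketch that draws all k samples in a single call instead of a loop. The element values, the probabilities and the seed are made-up stand-ins (assumptions, not from the question), and it uses NumPy's newer `default_rng` generator rather than the `numpy.random.choice` function named above.

```python
import numpy as np

# Hypothetical stand-ins for the a_i's and p_i's
elements = np.array(["a", "b", "c", "d"])
probabilities = np.array([0.1, 0.2, 0.3, 0.4])  # must sum to 1

# Seed chosen only so the run is reproducible
rng = np.random.default_rng(seed=0)

k = 3
# size=k draws k samples at once; sampling is with replacement by default
sample = rng.choice(elements, size=k, p=probabilities)
print(sample)
```

Because the draw is with replacement, the same element may appear more than once in `sample`.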

edited Oct 22 '19 at 12:20

answered Oct 22 '19 at 09:34

Fakher Mokadem


You could get the same sample returned twice; I suppose a couple more lines would be required to take that into account? – Jonas Byström Oct 22 '19 at 11:47

The OP did not specify that the samples have to be unique, nor is it the norm that a random sample should have unique values. However, you can do that by applying `set` to the result and generating samples until the set size is equal to `N` – Fakher Mokadem Oct 22 '19 at 12:21
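If unique elements are wanted, one alternative to the `set` approach described in the comment is the `replace=False` argument of NumPy's `choice`. This is a different technique from the one suggested above, sketched here with made-up weights (an assumption for illustration). Note that when sampling without replacement, the chance that a given element ends up in the sample is generally no longer exactly p_i.

```python
import numpy as np

# Hypothetical data: ten elements, the first one five times as likely as the others
elements = np.arange(10)
weights = np.array([5, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=float)
probabilities = weights / weights.sum()  # normalise so the values sum to 1

rng = np.random.default_rng(seed=0)  # seed only for reproducibility

# replace=False guarantees that no element is drawn twice
unique_sample = rng.choice(elements, size=4, replace=False, p=probabilities)
print(unique_sample)
```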

score 0 · Answer 2 · answered Oct 22 '19 at 09:25


Perhaps you want something similar to this?

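import random

data = ['a', 'b', 'c', 'd']
# Each element is kept independently with its own probability,
# so these values need not sum to 1 and the result length varies.
probabilities = [0.5, 0.1, 0.9, 0.2]
for _ in range(10):
    print([d for d, p in zip(data, probabilities) if p > random.random()])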

The above would output something like:

['c']
['c']
['a', 'c']
['a', 'c']
['a', 'c']
[]
['a', 'c']
['c', 'd']
['a', 'c']
['d']
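The output above varies in length because every element is included or excluded independently. If a fixed number of picks is needed without NumPy, a possible sketch uses the standard library's `random.choices`, which treats the values as relative weights. The value of `k` is an assumption chosen for illustration.

```python
import random

data = ['a', 'b', 'c', 'd']
weights = [0.5, 0.1, 0.9, 0.2]  # relative weights; they need not sum to 1

k = 3  # hypothetical sample size
# random.choices draws k items with replacement, proportionally to the weights
picked = random.choices(data, weights=weights, k=k)
print(picked)
```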


Jonas Byström


My data has around 1,000,000 elements. I need to choose around 100. – Kate Oct 22 '19 at 09:31
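For the sizes mentioned in this comment, a rough sketch could sample indices and then look the elements up. The data and weights below are randomly generated placeholders (assumptions, since the real arrays are not shown), and `replace=False` is used on the assumption that 100 distinct elements are wanted.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed only for reproducibility

n, k = 1_000_000, 100
data = np.arange(n)                      # hypothetical stand-in for the real elements
weights = rng.random(n)                  # hypothetical non-negative weights
probabilities = weights / weights.sum()  # normalise so the values sum to 1

# Draw k distinct indices according to the probabilities, then index into the data
indices = rng.choice(n, size=k, replace=False, p=probabilities)
chosen = data[indices]
print(chosen.shape)
```

Sampling indices rather than the elements themselves also works when the elements are stored in a plain Python list or are not NumPy-friendly objects.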
