Create a Sample from Weighted Random Choice

Question

I want to create a sample of 3 choices from a given dictionary. The dictionary length can be variable.

What I have done in previous code is to create a dictionary of weighted values, in this case 12 values and keys.

However, I cannot retrieve a sample from it using random.choice.

I am using Python 3.

My dictionary is:

dictionary = {'Three': 14.4, 'Five': 11.2, 'Two': 14.4, 'Thirteen': 3.3, 'One': 17.6, 'Seven': 3.3, 'Nine': 3.3, 'Ten': 3.3, 'Twelve': 3.3, 'Eight': 3.3, 'Four': 12.0, 'Six': 10.4}

I try to retrieve a sample of 3 from a random choice of the dictionary:

my_sample = random.sample(random.choice(dictionary), 3)
print(my_sample)

But I get this error:

Traceback (most recent call last):
  File "c_weights.py", line 38, in <module>
    my_sample = random.sample(random.choice(dictionary), 3)
  File "/usr/lib64/python3.3/random.py", line 252, in choice
    return seq[i]
KeyError: 11

I am trying to get, for example:

my_sample = ('One', 'Four', 'Twelve')

Edit: Just to be clear, what I am working towards is:

('One', 'Four', 'Twelve')
('Two', 'One', 'Six')
('Four', 'Two', 'Five')
('One', 'Eight', 'Two')
('Thirteen', 'Three', 'Six')

So, unique sets built upon weighted probability from within the dictionary (or tuple, if that is better).

I don't get the weighted part of this. Do you want `"Three"` to be a member of the sample much more often than `"Thirteen"`? Neither `random.sample` nor `random.choice` will do this, but that's what people are usually after when they say "weighted random choice". – DSM, Nov 09 '13 at 22:39
I don't see the logic of your weighted randomness. There are other ways to do this. [Here](http://stackoverflow.com/questions/19871608/generating-weighted-random-numbers) is one way with numpy. I personally use [this way](http://stackoverflow.com/a/14992686/377366). – KobeJohn, Nov 09 '13 at 22:40
@DSM Yes, I want 'One' to be drawn into the sample proportionately more often than 'Thirteen', by the weightings I have provided. – sayth, Nov 09 '13 at 22:59
@kobejohn So instead of creating a dictionary I should be creating a tuple? I thought a dictionary was better, as the keys are the important part I wish to retrieve sets of. – sayth, Nov 09 '13 at 23:02

score 2 · Answer 1 · answered Nov 09 '13 at 22:42

You can't successfully apply random.choice() to a dictionary - it's a function for sequences, not for mappings.

Try:

random.sample(list(dictionary), 3)

That returns a list containing 3 random keys from the dict.
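
As a rough illustration of the point about sequences versus mappings: `random.choice` picks an integer index `i` and evaluates `seq[i]`, which is why the traceback in the question ends in `KeyError: 11` (the dict has no key `11`). The sketch below copies the keys into a list first, which supports integer indexing. The shortened dictionary and the variable names are only for illustration, and the weights are ignored here.

```python
import random

# Shortened copy of the question's dictionary, for illustration only.
dictionary = {'Three': 14.4, 'Five': 11.2, 'Two': 14.4, 'One': 17.6}

keys = list(dictionary)               # a list is a sequence, so seq[i] works
one_key = random.choice(keys)         # one key, every key equally likely
three_keys = random.sample(keys, 3)   # three distinct keys, weights ignored

print(one_key, three_keys)
```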

Tim Peters

But will it use the weightings I have included when pulling the sample? – sayth Nov 09 '13 at 22:56
@sayth, of course not. See the other comments on your question for approaches to that. But it remains unclear what you want to do. A sample of size *one* from a weighted population makes good sense. But that you're trying `random.sample()` at all implies you want no duplicates, and then it's clear as mud what you want for a sample of size 3 from a weighted population. – Tim Peters Nov 09 '13 at 22:58
Yes, I need the sets to be unique, as I want multiple sets returned from the same population, all unique but reflecting the weighted probability. – sayth Nov 09 '13 at 23:04
Suppose your dict were `{'Three': 10000000, 'Two': 1}`, and you ask for a sample of size 2. The only possible sample of that size is `['Three', 'Two']`, but that has nothing to do with the weights. That's why I say that what you want is "clear as mud". Please edit your original question to spell out *exactly* what you do want :-) – Tim Peters Nov 09 '13 at 23:07
Edited. The sample size will always be between 8 and 30, and the set size will be 3. – sayth Nov 09 '13 at 23:13
Suppose (to make the numbers simpler) your dict were {'One': 1, 'Two': 10, 'Three': 100} and you were taking a sample of size 2. Exactly what should the probability be of sampling `('One', 'Two')`? Of `('One', 'Three')`? Of `('Two', 'Three')`? – Karl Knechtel Nov 09 '13 at 23:24
@Karl Knechtel Assuming the total sum of the numbers in the array is 100 (for 100%), the chance of pulling 'One' out should reflect its weighting as a percentage over 100 runs. Note that my sample size would always be greater than 8. – sayth Nov 10 '13 at 00:02
@sayth, what you want is impossible. Say you have 3 items and the weights are 1 for `A`, 2 for `B` and 7 for `C` (so add to 10). Say you want samples of size 2. What *exactly* do you want for the probabilities of selecting `(A, B)`, `(A, C)` and `(B, C)`? No matter what you answer, the relative frequencies of `A`, `B` and `C` in the samples won't be in the ratio `1:2:7`. Think about it. The logical inconsistencies don't go away just because you're looking at larger populations and larger sample sizes - they just get harder to *see* then. – Tim Peters Nov 10 '13 at 00:38
@TimPeters It's possible my maths is wrong; it's been a long time. But take the example of a lottery of 40 numbers: statistically, in a six-number draw the probability is 1/40 * 1/39 * 1/38 * 1/37 * 1/36 * 1/35. Now say that we physically weighted the balls so that numbers 1 to 6 were known to be more likely to come out. Then realistically the chance of No. 1 coming out isn't 1/40, as the ball is weighted to beat its probability. That's the basis I am working on. – sayth Nov 10 '13 at 01:06
So in my example 'One' has about a 17% chance of being pulled out, more than any other number. If I pulled out 100 sets, I would expect 'One' to be the first number about 17 times. – sayth Nov 10 '13 at 01:07
So in the specific tiny example I gave you, you want `C` to be picked first 70% of the time. Given `C`, you want `A` 1/3rd of the time and `B` the rest ... etc. Add those all up, and the probabilities are 17/360 for `AB` (in either order), 14/45 for `AC` and 77/120 for `BC`. So we'll see `A` in the sample 17/360 + 14/45 = 43/120 ~= 35.8% of the time, a long way from 10%. Just so you know that's what you'll get ;-) – Tim Peters Nov 10 '13 at 01:29
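
The arithmetic in the comment above can be reproduced with exact fractions. This is only a sketch of that calculation, assuming each pick is weighted among the items still remaining; the helper name `ordered_draw_probability` is made up for illustration.

```python
from fractions import Fraction
from itertools import permutations

weights = {'A': 1, 'B': 2, 'C': 7}

def ordered_draw_probability(order, weights):
    """Probability of drawing the items in exactly this order, without replacement."""
    remaining = sum(weights.values())
    prob = Fraction(1)
    for item in order:
        prob *= Fraction(weights[item], remaining)
        remaining -= weights[item]   # the drawn item leaves the pool
    return prob

# Add up both orderings of each unordered pair.
pair_probability = {}
for order in permutations(weights, 2):
    pair = frozenset(order)
    pair_probability[pair] = (pair_probability.get(pair, Fraction(0))
                              + ordered_draw_probability(order, weights))

# Chance that each item appears anywhere in the sample of size 2.
inclusion = {
    item: sum(p for pair, p in pair_probability.items() if item in pair)
    for item in weights
}

for pair, p in pair_probability.items():
    print(sorted(pair), p)
print('A in sample:', inclusion['A'], float(inclusion['A']))
```

Under this scheme `A` is the first pick 10% of the time, but it appears somewhere in the sample far more often than that.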

KobeJohn · Accepted Answer · 2013-11-10T11:01:11.890

Okay this is probably full of bugs / statistical wrongness, but it's a starting point for you and I don't have more time for now. It's also very inefficient! That having been said, I hope it helps:

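import random

d = {'Three': 14.4, 'Five': 11.2, 'Two': 14.4, 'Thirteen': 3.3, 'One': 17.6, 'Seven': 3.3, 'Nine': 3.3, 'Ten': 3.3, 'Twelve': 3.3, 'Eight': 3.3, 'Four': 12.0, 'Six': 10.4}
total_weight = sum(d.values())
n_items = 3
random_sample = []
d_mod = dict(d)  # working copy; chosen items are removed from it

for _ in range(n_items):
    # Pick a random point along the cumulative weight of the remaining items.
    random_cumulative_weight = random.uniform(0, total_weight)
    this_sum = 0.0
    for item, weight in d_mod.items():
        this_sum += weight
        if this_sum >= random_cumulative_weight:
            break
    # item and weight now hold the chosen entry (the last entry if
    # floating-point rounding pushed the random point past the final sum).
    random_sample.append(item)
    del d_mod[item]
    total_weight -= weight  # subtract only the chosen item's weight

print(random_sample)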

yields ['Seven', 'Nine', 'Two'] etc.
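
For what it's worth, the same idea can be sketched more compactly with `random.choices`, which exists in Python 3.6 and later (so not in the 3.3 shown in the question's traceback). This is an alternative sketch, not the code from this answer: the function names `weighted_sample` and `unique_samples` are hypothetical, and "unique" is assumed to mean that no two returned sets contain the same three keys.

```python
import random

def weighted_sample(weights, k, rng=random):
    """Draw k distinct keys; each draw is weighted among the keys still left."""
    remaining = dict(weights)
    picked = []
    for _ in range(k):
        keys = list(remaining)
        key = rng.choices(keys, weights=[remaining[x] for x in keys], k=1)[0]
        picked.append(key)
        del remaining[key]   # no duplicates inside one sample
    return tuple(picked)

def unique_samples(weights, k, how_many, rng=random):
    """Collect how_many samples, skipping any that repeat an earlier set of keys."""
    seen = set()
    results = []
    while len(results) < how_many:
        sample = weighted_sample(weights, k, rng)
        if frozenset(sample) not in seen:
            seen.add(frozenset(sample))
            results.append(sample)
    return results

d = {'Three': 14.4, 'Five': 11.2, 'Two': 14.4, 'Thirteen': 3.3, 'One': 17.6,
     'Seven': 3.3, 'Nine': 3.3, 'Ten': 3.3, 'Twelve': 3.3, 'Eight': 3.3,
     'Four': 12.0, 'Six': 10.4}

for sample in unique_samples(d, 3, 5):
    print(sample)
```

Note that `k` must not exceed the number of keys, and `how_many` must not exceed the number of distinct k-key combinations, otherwise the loop in `unique_samples` never finishes. As the comments on the other answer point out, only the first pick follows the given weights exactly.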
