16

I have data like this:

d = (
  (701, 1, 0.2),
  (701, 2, 0.3),
  (701, 3, 0.5),
  (702, 1, 0.2),
  (702, 2, 0.3),
  (703, 3, 0.5)
)

Where (701, 1, 0.2) = (id1, id2, priority)

Is there a pretty way to choose id2 if I know id1, using priority?

Func(701) should return:
  1 - in 20% cases
  2 - 30%
  3 - 50%

Percent will be rough of course

Peter O.
  • 32,158
  • 14
  • 82
  • 96
  • 3
    what do you have so far? – SilentGhost Jan 15 '10 at 16:46
  • The priorities for 702 and 703 don't add up to 1. What happens to the other 50% of the time for 703 when we shouldn' return 3? What do we return? – MAK Jan 15 '10 at 16:53
  • 1
    Can you clarify: is what you call "priority" the probability with which you want to return id2 given id1, or is it an unnormalized weight? It seems to be the latter since your weights for 702 only add up to 0.5, but I am afraid that 703 is a typo for 702. – Imran Jan 15 '10 at 16:58
  • probabilities (priorities) is not normalized, sum of all priority not equal 1, so 703 is not typo –  Jan 17 '10 at 13:27

7 Answers7

7

Generate a Cumulative Distribution Function for each ID1 thus:

cdfs = defaultdict()
for id1,id2,val in d:
    prevtotal = cdfs[id1][-1][0]
    newtotal = prevtotal + val
    cdfs[id1].append( (newtotal,id2) )

So you will have

cdfs = { 701 : [ (0.2,1), (0.5,2), (1.0,3) ], 
         702 : [ (0.2,1), (0.5,2) ],
         703 : [ (0.5,3) ] }

Then generate a random number and search for it in the list.

def func(id1):
    max = cdfs[id1][-1][0]
    rand = random.random()*max
    for upper,id2 in cdfs[id1]:
        if upper>rand:
            return id2
    return None
Phil H
  • 19,928
  • 7
  • 68
  • 105
  • The last two lines -- `else: return None` should be removed. It will stop the iteration of the for loop unless the rand value is below the first item in the list. – Doug Harris Apr 12 '12 at 17:46
3

Realizing that my first answer was quite buggy in its math, I have produced a new idea. I believe the algorithm here is similar to that of several of the other answers, but this implementation seems to qualify for the "pretty" (if that equals simple) requirement of the question:

def func(id):
    rnd = random()
    sum = 0
    for row in d:
        if row[0] == id:
            sum = sum + row[2]
            if rnd < sum:
                return row[1]

With the example data from the OP it goes like this:

  • Pick a random number between 0 and 1.0
  • If the number is < 0.2 return the first element
  • Else if the number is < 0.5 return the second element
  • Else (if the number is < 1.0) return the third element
Jørn Schou-Rode
  • 37,718
  • 15
  • 88
  • 122
2

Use a discrete uniform distribution from the random module over a sufficient number of values, then partition it:

For example, for case 701 use a distribution over 10 values, for 2 values return 1, for another 3, return 2, and for the other 5, return 3.

You can build any distribution using enough uniform distributions :)

James
  • 24,676
  • 13
  • 84
  • 130
1

If your percent values will not be more precise than whole percent values, use a random number generator to generate a number 0-99.

Then in your function, use (programmatic) cases to choose the correct number. For example (clean this up):

if 701
  if random_num < 20
    return 1
  else if random number < 50   // ( 20 + 30 )
    return 2
  else if random number < 100  // ( 20 + 30 + 50 )
    return 3
  else
    // error
Aaron
  • 6,988
  • 4
  • 31
  • 48
1

A very quick hack:

import random

d = {
    701: [(1,0.2),(2,0.3),(3,0.5)],
    702: [(1,0.2),(2,0.3),(3,0.5)]
}

def func(value):
    possible_values=d[value]
    total=sum(p[-1] for p in possible_values)
    random_value=random.random()
    prob=possible_values[0][-1]/total
    index=1
    while index<len(possible_values) and prob<random_value:
        prob+=possible_values[index][-1]/total
        index+=1
    return possible_values[index-1][0]

if __name__=='__main__':
    testcases=1000
    cnt=[0,0,0]
    for case in xrange(testcases):
        answer=func(701)
        cnt[answer-1]+=1
    for i in xrange(3):
        print "Got %d %f%% of the time"%(i+1,float(cnt[i])/testcases*100)

It isn't pretty, but it is the first thing that came to mind, and appears to work as expected.

What this does is to get a random value in the interval [0,1) (using random.random()). It then uses whether the random value falls in the intervals [0,0.2),[0.2,0.5) or [0.5,1), to figure out which value to return.

MAK
  • 26,140
  • 11
  • 55
  • 86
0

Two ideas (allow me to illustrate it with separated options and ratios for the sake of clarity in the argument names, if they're packed in a tuple you can save the "zip"):

a) Denormalize the weights to get integer ratios, then put in a list as many copies as the ratio and use random.choice.

def choice_with_ratios(options, ratios):
    tmp = sum([[v]*n for v, n in zip(options, ratios)], [])
    return random.choice(tmp)

b) Use the normalized weights and start summing up until you reach a random generated uniform value

def choice_with_weights(options, weights):
    s = 0
    r = random.random()
    for v, w in zip(options, weights):
        s += w
        if s >= r: break
    return v

By the way, if the first field is used as a key, you should have it in a dictionary, like:

d = {
  701: ((1, 0.2), (2, 0.3), (3, 0.5),
  702: ((1, 0.3), (2, 0.2), (3, 0.5)
}
fortran
  • 74,053
  • 25
  • 135
  • 175
0

You can also create a 100-element list for each value, and then let random.choice do the selecting from a seeded list whose members are loaded in the weighting that you want:

import random
from collections import defaultdict

d = ( 
  (701, 1, 0.2), 
  (701, 2, 0.3), 
  (701, 3, 0.5), 
  (702, 1, 0.2), 
  (702, 2, 0.3), 
  (702, 3, 0.5) 
) 

class WeightedLookup(object):
    def __init__(self, valueTupleList):
        self.valdict = defaultdict(list)
        for key, val, prob in valueTupleList:
            self.valdict[key] += [val]*(int)(prob*100)

    def __getitem__(self,key):
        return random.choice(self.valdict[key])


lookup = WeightedLookup(d)

# test out our lookup distribution, sample it 100000 times
res = { 1:0, 2:0, 3:0 }
for i in range(100000):
    res[lookup[701]] += 1

# print how many times each value was returned
for k in (1,2,3):
    print k, res[k]

Prints:

1 20059
2 30084
3 49857
PaulMcG
  • 62,419
  • 16
  • 94
  • 130