0

I have a list of elements, for example L = [A, B, C]. Each element has an associated score for example S = [5, 1, 4].

I want to select an element from L according to its score in S, simply by generating a sort of cumulative probability distribution, where each element L[i] corresponds to an interval in (0,1] proportional to S[i]. Then, a random number drawn in (0,1] maps to the selected element.

For the example given before, we can have the scores of S represented as probabilities by normalizing on 5+1+4 so we get SS = [0.5, 0.1, 0.4], and we map the elements of L to intervals, such that:

B is mapped to (0, 0.1]
C is mapped to (0.1, 0.1+0.4]
A is mapped to (0.1+0.4, 0.5+0.5]

Now if I generate a random number r in (0,1] (e.g. r = random.random()), it will map to the corresponding element. For example if r = 0.03 we know that the element is B. And for instance, if r = 0.73 we know that the element is A ...

Is there a simple way in python to do this mapping between an element and an interval ?

I know that I can use numpy.cumsum to generate the cumulative sum of SS, but how to map elements to intervals obtained from this cumulative sum ?

shn
  • 5,116
  • 9
  • 34
  • 62
  • Just to clarify: You want to have a list such that when a random number is generated, a corresponding element of the list is returned? – wnnmaw Jan 29 '14 at 17:30
  • Let me check if I understand correctly. You want to generate arbitrary discrete random variates defined by a cdf from a uniform distribution? – Joel Cornett Jan 29 '14 at 17:47
  • @JoelCornett no that is not what I want to do. I just want to create a CDF based on a list of elements and a scores/probabilities. So that we can select randomly an element according to that CDF. – shn Jan 29 '14 at 19:48

2 Answers2

2

Assuming I understand you correctly, I would recommend a dict:

def branchFinder(L, S, r):
    S, L = map(list, zip(*sorted(zip(S, L))))
    SS = [0.] + map(lambda x: x/10., S)

    probs = {}
    for i in range(len(L)):
        probs[L[i]] = (sum(SS[:i+1]), SS[i+1] + sum(SS[:i+1]))        

    for key, value in probs.iteritems():
        if value[0] < r <= value[1]:
             return key
shn
  • 5,116
  • 9
  • 34
  • 62
wnnmaw
  • 5,444
  • 3
  • 38
  • 63
  • If the values are ordered, (as they would be in a CDF) there is no need to do the first less than comparison. – Joel Cornett Jan 29 '14 at 17:45
  • 1
    @JoelCornett the values might be ordered *into* the dictionary, but not necessarily out of it! – jonrsharpe Jan 29 '14 at 17:49
  • I'm not sure which value you're referring to. If you mean the values of ```value``` then you must do both comparisons because they're in a ```dict``` which is unordered by its nature – wnnmaw Jan 29 '14 at 17:49
  • @jonrsharpe: Which would imply that a `dict` might not be the best way to store the CDF ;) – Joel Cornett Jan 29 '14 at 17:51
  • How do you construct the dict `probs` from `L` and `SS` ? – shn Jan 29 '14 at 19:39
  • @shn, I'll update on how that's done, give me a minute – wnnmaw Jan 29 '14 at 19:40
  • @shn, I've updated it so the function now takes ```L``` and ```S``` as its arguments, then generates ```probs``` based on those. Let me know if this is what you want – wnnmaw Jan 29 '14 at 20:08
  • I get an error. for key, value in probs: ValueError: need more than 1 value to unpack – shn Jan 29 '14 at 21:05
  • 1
    It is ok, fixed using: `for key, value in probs.iteritems()`. Thanks. – shn Jan 29 '14 at 21:11
2

You could use a Counter:

from collections import Counter
counts = Counter(dict(zip(L, S)))

Giving, e.g.:

Counter({'A': 5, 'C': 4, 'B': 1})

Then use this answer to randomly select from it:

from random import randrange
from itertools import islice

index = randrange(sum(counts.values()))
next(islice(counts.elements(), index, None))

or just choose from the list (perhaps clearer what's happening, but less efficient with big lists):

from random import choice

choice(list(counts.elements()))

This doesn't answer your exact question (generating probabilities), but hopefully gets you the result you need (random selection based on score).

Community
  • 1
  • 1
jonrsharpe
  • 115,751
  • 26
  • 228
  • 437