3

Imagine you have an array of hashes representing competitor and their probability to win the prize (a float between 0 and 1). Like:

  [ {:name => "Adam" , :prob => 0.5}
    {:name => "Ben" , :prob => 1.0}
    {:name => "Chris" , :prob => 0.1}
    {:name => "Daniel" , :prob => 0.2}
    {:name => "Ed" , :prob => 0.7}
    {:name => "Frey" , :prob => 0.5}
    {:name => "Gilbert" , :prob => 0.3}
  ]

I would like to have an algorithm on which I can select three winners using random numbers but respecting the probability of each person.

The total probability of the sample is 3.3

A logical approach would be to calculate the random value like:

val = rand(33)/10.0

And scan the array until I get the person which reaches the random number.

This approach works but it implies a scan in the array.

I wonder if there would be a more straightforward solution. Any ideas?

PS: Imagine the array might have a great amount of elements.

Edu
  • 1,949
  • 1
  • 16
  • 18
  • 1
    I posted my solution below, however, considering you are dealing with probabilities, I think that the sum of all probabilites should be equal to 1. In your example, Ben has 100% of chance of being chosen. – Sam Felix Jan 31 '12 at 01:41
  • Yes, you are right. It is just that I actually work with the concept of quotas and in order to simplify the matter, I changed quota for a 0 to 1 number. – Edu Jan 31 '12 at 08:48

4 Answers4

2

Create a loop which runs till 3 winners are selected. In this loop,generate a particular random number using any random method available in the programming language of your choice. After this,start iterating through the users. If any user's probability is lesser than this random number,accept that user as a winner. If in any iteration of the loop a winner isn't selected,for example,in a case where the lowest probability in your list is 0.2 and the random number generated is 0.1,in that case,continue to the next iteration of the loop. Break out of the loop when you get 3 winners. A probable pseudo code for such could be as follows:

int count=0;
while(count<3){
    temp=GenerateRandomNumber()
    int userIndex= AcceptWinner(UserListProbability,temp)
    //here keep iterating through the users to check which user's probability is less than temp and returns the index of the winner in the List

    if(userIndex==-1)//No winner selected
        continue;
    else{
        count++;
        Print List(userIndex)
    }
}

Note:the list should be sorted

HackCode
  • 1,837
  • 6
  • 35
  • 66
1

I was thinking about this and I think my result makes sense:

  1. sort the vector according to the probability: [a=0.1,b=0.2,c=0.3,d=0.4]
  2. choose a random number (e.g. 0.5)
  3. iterate from the beginning and sum the probability values and stop when it is higher: answer = 0.1 + 0.2 + 0.3. So, 0.6 > 0.5, we value 'c'

my gist over this is that the values in the end of the vector should have a higher probability of being chosen. I have implemented in python:

values = [0.1,0.2,0.3,0.4]
count_values = len(values)*[0]
answer = len(values)*[0]

iterations = 10000 

for i in range(0,iterations):
    rand = float(random.randint(0,iterations))/iterations
    count = 0
    sum = 0
    while sum <= rand and count <= len(values):
        sum += values[count]
        count += 1
    count_values[count-1]+=1

for i in range(0,len(count_values)):
    answer[i] = float(count_values[i])/iterations

and running several times, I calculated the probability of all elements being chosen, that should match our initial probability:

[0.1043, 0.196, 0.307, 0.3927]
[0.1018, 0.2003, 0.2954, 0.4025]
[0.0965, 0.1997, 0.3039, 0.3999]
Sam Felix
  • 1,329
  • 1
  • 10
  • 23
  • I just don't get why sorting the vector? Without sorting, it is the approach I have mentioned at the question. Thank you. – Edu Jan 31 '12 at 08:56
  • because we sum the weights, if they are not sorted, for instance, the highest probability is in the beginning (like 0.6), the smaller ones (like 0.1) will never be chosen. Anyway, test 10 thousand times all the solutions posted hereto see if they are correct, the results must match the probabilities. Complexity here is also O(n) – Sam Felix Jan 31 '12 at 10:13
0

I'm assuming that in your example "probability" means "weight" (so folks with a 1.0 probability aren't guaranteed to win and the total probability won't sum to 1.0)

You could build a tree of nodes where the leaf nodes contained your individual entries:

leaf1 = {:name => "Adam" , :prob => 0.5}
leaf2 = {:name => "Ben" , :prob => 1.0}

and each node contained the sum of the nodes under it

node1 = { :prob_sum => 1.5 , :children=> [ leaf1, leaf2] }

The root node then contains the sum of the whole structure

root_node = { :prob_sum => 33 , :children => [ leaf9, leaf10] }

Then you pick a random number between zero and the sum contained in the root node.

my_random = rand( root_node.prob_sum )

Then traverse the tree. Each node contains the sum of all nodes under it, so if your random number is larger than the node you subtract that node's value and skip that branch.

def find_node( my_random ):
c = children.first()
while( c ):
     if ( c.prob_sum < my_random ):
         return c.find_node(my_random)
     my_random -= c.prob_sum
     c = c.next

Assuming you've built a balanced tree you should get results in O(log n)

Alternatively, you could get the same result by adding a running total field to your current data set and doing a binary search based on that running total. That would probably be easier to implement, but would only be applicable if your working set could fit in memory.

Vorlauf
  • 69
  • 5
  • I actually like a lot your second approach of adding the running total and doing the binary search because at the system, the "weight/probability" for every person changes over time and I would need to rebuild the tree. With the ordered table I just need to keep them in order. Thank you. – Edu Jan 31 '12 at 08:54
0

There is also the approach which is working today but has some problems.

Today I create an array and put the probability*100 entries for each person in that array.

Then it is possible to do a random directly to the array content.

First problem is that it is costly on every aspect (memory, processing, ...) and it does not scale.

The second problem I have when choosing the second and third person once either I take the first out or I do a loop with random until a different person is picked up.

Nevertheless, for small datasets (like I have so far but will increase along time), this solution works fine.

Edu
  • 1,949
  • 1
  • 16
  • 18