1

I have a list of weighted objects i.e.:

A->1 B->1 C->3 D->2 E->3

is there an efficient algorithm in C++ to pick random elements according to their weight?

For example The possibility that element A or B with a lower weighting is picked is higher (30%) than the possibility that the algorithm selects elements C E (10%) or D (20%)

ElPatzo
  • 545
  • 1
  • 6
  • 25
  • 1
    The answer to http://stackoverflow.com/questions/6052603 should solve this. You may need to normalise your weights by dividing each weight by the total sum of all weights. – Peter de Rivaz May 25 '13 at 12:35
  • 1
    You don't really need to normalize. just make the random range goes [0,total weight) instead of [0,1) – Apiwat Chantawibul May 25 '13 at 12:38
  • thanks! but if I understand aright, this algorithm works only if the weighting is "normal": pick a element with a higher possibility if it's weighting is high. – ElPatzo May 25 '13 at 12:52
  • If the weights were 1,1,100,7,100, would the probabilities be the same? (Meaning the actual values play no role, only the order.) Regardless, I think we need a bit more details into exact you want the probabilities to be distributed. – Bernhard Barker May 25 '13 at 13:10
  • I think it should work with this method if i simply invert the values: A->1/1 B->1/1 C->1/3 D->1/2 E->1/3 .. @Dukeling no: if the weights were 1,1,100,7,100 the probabilities would also change – ElPatzo May 25 '13 at 13:19
  • @user1086229 Summing those up gives 19/6. I assume you then normalize by dividing those inverted weights by the sum. That gives P(A)=P(B)=0.315, P(C)=P(E)=0.105 and P(D)=0.158. – Imre Kerr May 25 '13 at 14:28

1 Answers1

2

As @Dukeling said, we need more info. Like how you interpret and use the selection chance.

At least in the field of evolutionary algorithm, fitness scaling (or selection chance scaling) is a sizable topic.

Suppose you start with badness score

B[i] = how badly you don't want to select the i-th item

And the objective is to calculate fitness/selection score S[i] which I assume you are to use it in roulette wheel fashion.

As you say, one obvious way is to use multiplicative inverse:

S[i] = 1 / B[i]

However, there might be a little problem with that. The the same amount of change in B[i] with low value has so much more impact than the same amount of change when B[i] already has high value.

Ask yourself this:

Say
B[1] = 1     ->     S[1] = 1
B[2] = 2     ->     S[2] = 0.5
So item 1 is twice times as likely to be selected compared to item 2

But with the same amount of change
B[3] = 1000  ->     S[3] = 0.001
B[4] = 1001  ->     S[4] = 0.000999001
Item 3 is only 1.001 times as likely to be selected compared to item 4

I'll just throw one possible alternative scheme here for now.

S[i] = max(B) - B[i] + 1

The + 1 part helps so no item has zero chance to be selected.

This ends the part of calculating selection score.


Next, let's clear up how to use the selection score in roulette wheel fashion. Assume we decided to use the additive inverse scheme.

B[1] = 1     ->     S[1] = 1001
B[2] = 2     ->     S[2] = 1000
B[3] = 1000  ->     S[3] = 2
B[4] = 1001  ->     S[4] = 1

Then imagine each point in the score is correspond to a lottery ticket. Let's assign the ticket a running IDs.

| Item | Score = #ticket |   ticket ID  |         win chance       |
|   1  |      1001       | 0    to 1000 |  1001/2004 ~ 0.499500998 |
|   2  |      1000       | 1001 to 2000 |  1000/2004 ~ 0.499001996 |
|   3  |         2       | 2001 to 2002 |     2/2004 ~ 0.000998004 |
|   4  |         1       | 2003 to 2003 |     1/2004 ~ 0.000499002 |

There are 2004 tickets in total.

To do a selection, pick the winning ticket ID at random i.e. the random range is [0,2004). Binary search can be used to quickly look up which item owns the winning ticket as you have already seen in this question. What needs to be looked up with binary search are the boundary values of ticket ID which are 1001,2001,2003 rather than the score themselves.


For comparison, here is the selection chance in case the multiplicative inverse scheme is used.

| Item |                    win chance         |
|   1  |           1/1.501999001 ~ 0.665779404 |
|   2  |         0.5/1.501999001 ~ 0.332889702 |
|   3  |       0.001/1.501999001 ~ 0.000665779 |
|   4  | 0.000999001/1.501999001 ~ 0.000665114 |

You can notice that in the additive inverse scheme, 1 unit of badness consistently corresponds to around a difference of 0.0005 in selection chance.

Whereas in multiplicative inverse scheme, 1 unit of badness results in varying difference of selection chance.

Community
  • 1
  • 1
Apiwat Chantawibul
  • 1,271
  • 1
  • 10
  • 20
  • Hello! Thank you for your answer! I think that is exactly what I am seeking for. to continue with your example, we would have values like S[1] = 1001, S[2] = 1000, S[3] = 2, S[4] = 1 Now we use the algorithm shown in: http://stackoverflow.com/questions/6052603 And now i select a random number in the range [0,max(B)=1001). Isn't there the probabity much higher that B[2] is picked than B[1]? – ElPatzo May 27 '13 at 19:53
  • Sorry: the possibility that B[3] gets picked that B[4] is in this case higher (upper_bound) – ElPatzo May 27 '13 at 20:05
  • @user1086229 From your comment, I can see you have misunderstood/misapplied the selection score. Allow me to elaborate in edit. – Apiwat Chantawibul May 27 '13 at 21:57