4

So I've got a list of weighted items, and I'd like to pick 4 non-duplicate items from this list.

Item     Weight
Apple     5
Banana    7
Cherry    12
...
Orange    8
Pineapple 50

What is the most efficient way to do this? My initial foray was to just reroll for the subsequent picks if an already picked item came up... but for a small list this can result in a ton of rerolls.

Edit for clarification: For the above example, and ignoring fruits D through N, there is a total weight of 82. So the chances of being picked first are: A ~6% B ~8.5% C ~14.6% O ~9.8% P ~61% Once an items is picked the probabilities would (should!) change.

aslum
  • 11,774
  • 16
  • 49
  • 70
  • What do the weights represent? The probability with which the element must be picked? You don't say. It appears from your question that the weights are just irrelevant… are they? – ShreevatsaR Jun 23 '11 at 18:29
  • How does the weight affect your picking algorithm, exactly? – Mike Atlas Jun 23 '11 at 18:29
  • 1
    What means "non-duplicate"? I guesss items with different weights or names, or both. Yes? What 4 none-duplicate elements do you want to pick? Maybe elements with highest weight? – Mihran Hovsepyan Jun 23 '11 at 18:31
  • Yes, weight is probably of being picked. If A,B,C & O were weight 1, and P was weighted 4, then there would be a 50% chance of P being picked and 12.5% for each of the other four choices. The idea being I can add more items to the list, or increase/decrease individual weights on the fly without having to change any other entries. – aslum Jun 23 '11 at 18:32
  • @Mihran: I don't want to pick the same item twice. So in the above example at the end all but one of the items would be picked (if we're ignoring Dragonfruit through Nectarines). – aslum Jun 23 '11 at 18:35
  • @aslum - I don't think the weights are going to be able to do what you think they are doing - `The idea being I can add more items to the list, or increase/decrease individual weights on the fly without having to change any other entries.` What would happen if you added four more 4s? Does that mean that there is a 20% chance for each 4 and then 5% for each 1? – Nicole Jun 23 '11 at 18:38
  • @Renesis: Four items weighted 1, and five items weighted 4 would be total weight of 24, so the (W1) items would have a 1/24 chance of being selected*, whilst the (W4) items would have a 1/6 chance of being selected*. *Selected the first time. – aslum Jun 23 '11 at 18:45
  • @aslum Ok, got it. I actually meant to say *three* more 4s, but it seems we are on the same page as far as the meaning of the weights. – Nicole Jun 23 '11 at 19:10

4 Answers4

7

In your comment you say that unique means:

I don't want to pick the same item twice.

.. and that the weights determine a likelihood of being picked.

All you need to do to make sure that you don't pick duplicates, is simply remove the last picked item from the list before you pick the next one. Yes, this will change your weights slightly, but that is the correct statistical change to make if you do want unique results.


In addition, I'm not sure how you are using the weights to determine the candidates, but I came up with this algorithm that should do this with a minimal number of loops (and without the need to fill an array according to weights, which could result in extremely large arrays, requires int weights, etc.)

I've used JavaScript here, simply so it's easy to see the output in a browser without a server. It should be trivial to port to PHP since it's not doing anything complicated.

Constants

var FRUITS = [
    {name : "Apple", weight: 8 },
    {name : "Orange", weight: 4 },
    {name : "Banana", weight: 4 },
    {name : "Nectarine", weight: 3 },
    {name : "Kiwi", weight: 1 }
];

var PICKS = 3;

function getNewFruitsAvailable(fruits, removeFruit) {
    var newFruits = [];
    for (var idx in fruits) {
        if (fruits[idx].name != removeFruit) {
            newFruits.push(fruits[idx]);
        }
    }
    return newFruits;
}

Script

var results = [];
var candidateFruits = FRUITS;

for (var i=0; i < PICKS; i++) {
    // CALCULATE TOTAL WEIGHT OF AVAILABLE FRUITS
    var totalweight = 0;
    for (var idx in candidateFruits) {
        totalweight += candidateFruits[idx].weight;
    }
    console.log("Total weight: " + totalweight);

    var rand = Math.random();

    console.log("Random: " + rand);

    // ITERATE THROUGH FRUITS AND PICK THE ONE THAT MATCHES THE RANDOM
    var weightinc = 0;
    for (idx in candidateFruits) {
        // INCREMENT THE WEIGHT BY THE NEXT FRUIT'S WEIGHT
        var candidate = candidateFruits[idx];
        weightinc += candidate.weight;

        // IF rand IS BETWEEN LAST WEIGHT AND NEXT WEIGHT, PICK THIS FRUIT
        if (rand < weightinc/totalweight) {
            results.push(candidate.name);
            console.log("Pick: " + candidate.name);

            // GET NEXT SET OF FRUITS (REMOVING PICKED FRUIT)
            candidateFruits = getNewFruitsAvailable(candidateFruits, candidate.name);
            break;
        }
    }
    console.log("CandidateFruits: " + candidateFruits.length);
};

Output

for (var i=0; i < results.length; i++) {
    document.write(results[i] + "<br/>");
}

The basic strategy is to allocate each fruit a portion of the total range [0,1). In the first loop, you'd have this:

  • Apple — 8/20 = 0.0 up to 0.4
  • Orange — 4/20 = 0.4 up to 0.6
  • Banana — 4/20 = 0.6 up to 0.8
  • Nectarine — 3/20 = 0.8 up to 0.95
  • Kiwi — 8/20 = 0.95 up to 1.0

The script iterates over each item in the list, and progresses a weight counter. When it reaches the range that contains the first random, it picks that item, removes it from the list, then recalculates the ranges based on the new total weight and runs again.

Nicole
  • 32,841
  • 11
  • 75
  • 101
  • Well Frack. That is so obvious. Dur dur dur. I wear my shoes on my nose. – aslum Jun 23 '11 at 18:49
  • @aslum: See my answer as well, came up with a rand function that *i think* will do what you're looking for. – Brad Christie Jun 23 '11 at 18:57
  • @aslum - I've updated my answer with a rand script for this (using Javascript). Since you have the uniqueness requirement, I think it's important to avoid filling an array with duplicates just for the weights (since then you have to remove all those items, which could be costly). – Nicole Jun 23 '11 at 19:08
  • @Renesis: I contemplated uses a "Better suited" algorithm, but was pulled away as I envision pulling from the explicit array means (almost definitively) you're pulling from top-weight down. That is to say, though (in OPs example) pineapple is weighed heavily, I'd like to not always see it as the first selected just because it's weight is so high. Instead, I'd like to see it still retain "randomness". (Maybe I'm wrong on my conclusion). – Brad Christie Jun 23 '11 at 19:49
  • @Brad It does *seem* like there is some unfairness, but since the random is equally likely to appear anywhere in the range, it is still 100% random according to the proportional weight, which was the stated desired effect. That is to say, if the pineapple weight is 50% of the total weight, the exact likelihood of being first selected is 50%. – Nicole Jun 23 '11 at 20:26
  • @Brad, in addition, if you don't remove each item as you pick it (and re-roll instead) all you are doing is making it harder (take longer) to find the next match - not changing the statistical likelihood. – Nicole Jun 23 '11 at 20:28
2

Here I found the idea to following steps:

  1. Build the sum of the weights --> SUM
  2. Build a random number between 0 and SUM --> RAND_NUMBER
  3. Iterate through the list and subtract each element weight from RAND_NUMBER. If RAND_NUMBER gets negative, you have your first element.
  4. Remove the found element from the list and go back to step 1 until you have 4 elements.
Community
  • 1
  • 1
Christian Ammer
  • 7,464
  • 6
  • 51
  • 108
1

Update

function array_rand2($ary,$n = 1)
{
  // make sure we don't get in to an infinite loop
  // check we have enough options to select from
  $unique = count(array_unique(array_keys($ary)));
  if ($n > $unique) $n = count($unique);

  // First, explode the array and expand out all the weights
  // this means something with a weight of 5 will appear in
  // in the array 5 times
  $_ary = array();
  foreach ($ary as $item => $weight)
  {
    $_ary = array_merge($_ary, array_fill(0, $weight, $item));
  }

  // now look for $n unique entries
  $matches = array();
  while (count($matches) < $n)
  {
    $r = $_ary[array_rand($_ary)];
    if (!in_array($r,$matches))
    {
      $matches[] = $r;
    }
  }

  // and now grab those $n entries and return them
  $result = array();
  foreach ($matches as $match){
    $result[] = $match;
  }
  return $result;
}

See if that does a better job.

Brad Christie
  • 100,477
  • 16
  • 156
  • 200
0

Maybe instead of "rerolls" you could just increment the list element index you've randomly generated: list.elementAt(rand_index++ % size(list)) (something like that). I'd think you'd find the next random unique item pretty fast with logic like that.

I'm sure there are even better solutions, of course, there usually are.

Edit: Looks like Brad's already provided one.. :)

James T Snell
  • 1,588
  • 1
  • 15
  • 28