Pick a random weighted element, with sample, no replacement

Question

Given a structure representing a reward in a loot table, where a is the reward type, and 2 is an integer weighting that means that a is twice as likely to get pulled out then d.

Map{
  "a" -> 2
  "b" -> 2
  "c" -> 2
  "d" -> 1
  "e" -> 1
  "f" -> 1
}

How can I generate a sample for display purposes + a winner?

My current (pseudo) code:

list out;
foreach(entry:map){
  for(entry.value){
    out.add(a)
  }
}

Then to create a sample for display.

Collections.shuffle(out);
List display = out.stream()
  .distinct()
  .limit(8)
  .collect(Collectors.toList());

With this code, can I trust .distinct to not skew the odds, if I choose the winner by

winner = display.get(0);

I realize that getting the last element added will possibly skew the results, as after the distinct call happens, it will make it more likely to pick a number with a lower weighting.

but picking the first element of the stream should be trust worthy right? since it was chosen before .distinct had it's state inducing effect?

Martin Milichovsky · Accepted Answer · 2016-09-19T21:15:17.923

Look at Stochastic universal sampling and Fitness proportionate selection. The simple approach for taking one sample according to the weights can be explained by representing each element as an interval on with length proportionate to its weight. E.g:

Map{
  "a" -> 2 // weight 2
  "b" -> 2
  "c" -> 2
  "d" -> 1
  "e" -> 1
  "f" -> 1
}
=>
Map{
  "a" -> (0,2) // weight 2 -- is now length of the interval
  "b" -> (2,4) // ...
  "c" -> (4,6)
  "d" -> (6,7)
  "e" -> (7,8)
  "f" -> (8,9)
}

Then you pick random number from 0 to 9 9*Math.random() (as a pointer to the range) and check which interval it belongs to -- this is your random sample w.r.t the input weights. Repeat until you get the desired number of samples (and ignore duplicates, if you wish)...

Of course this is a bit idiomatic explanation, in the real code you would keep just the upper bound, since the lower is just the upperof previous element. And then you'd pick the first element that has the bound above the random pointer.

Update: Your original approach with repeating the elements is OK from the mathematical point of view (probability of picking elament with double weight is double), but it would be an issue when the weights are high: Map{"a"->1000 "b"->100000}. Also it wouldn't handle real-valued weights well.

Thanks for confirming that the math is right, I don't expect the tables to get very very large. My main problem is whether I can trust Java 8 Streams to emit the first picked response or not. I am tempted to convert over to your method just to provide some more flexibility with the chances. — Ryan Leach, Sep 20 '16 at 04:01

score 1 · Answer 2 · answered Sep 19 '16 at 21:33

I like Martin's answer, but I'll also post my own as a caveat/alternative based on the performance concerns he raised. A very similar implementation to his own can be achieved with a Map (I'll use HashMap since it's my favorite).

private final AtomicLong idxCounter = new AtomicLong(0);
private final Map<Long, Item> dropTable = new HashMap<>();
public void addDrop(Item item, long relativeFrequency) {
    while (relativeFrequency-- > 0) {
        Long nextIdx = idxCounter.getAndIncrement();
        dropTable.put(nextIdx, item);
    }
}

private static final Random rng = new Random(System.currentTimeMillis());
public Item getRandomDrop() {
    Long size = idxCounter.get();
    // randomValue will be something in the interval [0, size), which 
    // should cover the whole dropTable.
    // See http://stackoverflow.com/questions/2546078 for a fair
    // implementation of nextLong.
    Long randomValue = nextLong(rng, size); 
    return dropTable.get(randomValue); 
}

Getting a value by key from a HashMap is very fast. You could optimize it further by specifying the dropTable initial capacity and load factor (see the javadoc for HashMap) but that's up to your own judgement.

It's also thread-safe so long as nothing else is toying with the dropTable!

This suffers the same problem as the OPs solution: when `relativeFrequency` is huge, you will have to hold in memory huge Map. It also doesn't support real-valued weights. Better would be to keep a list of consecutive `[low, high)` bounds; there you can find the target sample using binary search. — Martin Milichovsky, Sep 19 '16 at 21:43

score 0 · Answer 3 · answered Sep 19 '16 at 20:24

Your data structure implementation seems a little bit odd. I would do something like this:

Map{
  0 -> "a"
  2 -> "b"
  4 -> "c"
  5 -> "d"
  6 -> "e"
  7 -> "f"
}

Then, to make things faster (or to allow for a very large loot table), I'd have a value like int maxValue = 7. Now, to get a loot item from the table, I can just call for a random integer lootDrop between 0 and maxValue (inclusive). Then I can iterate through my table to find the largest value less than or equal to lootdrop. If you needed to keep the map as string to integer mappings, and have control over the integer mappings, it's fairly trivial to do that as well.

If you don't want to go that far, you could simply get a random integer between 0 and 8 in your solution, which will still work.

Is there a reason you're insisting on this formulation?

The main reason is that it makes it easier for user configuration. What's posted in the question is also a simplification of the data structure, instead of a map, it's actually a Map of that have several properties that are defined directly in a configuration file. Where Reward.weight is one of the properties. — Ryan Leach, Sep 19 '16 at 20:52

Pick a random weighted element, with sample, no replacement

3 Answers3