2

(This may not be possible short of writing a decently sized helper method, but regardless, I'd like to figure it out)

Say I have a list:

{1,3,6}

I'd like to get a random item from that list, BUT, I'd like it to be weighted directly (probability-wise) with the item's value. So if you ran it 100,000 times, the 1 would be selected about 10,000 times, the 3 would be selected about 30,000 times, and the 6, 60,000 times.

I may just write a helper method by creating ranges like this:

{1,3,6}

Generate random number between 0 (inclusive) and 10 (exclusive) (sum of list)

if (number < 1)
{
    //1
}
else if (number < 4)
{
    //3
}
else
{
    //6
}

Though that particular example is fairly simple, I'm often working with large lists and they are always different, so it'd be a bit more complex. While I could do it, I'm curious whether there's an easier way.

Wilson
  • Check out alias tables, described [here][1]. [1]: http://stackoverflow.com/questions/17250568/randomly-choosing-from-a-list-with-weighted-probabilities – pjs Jul 28 '13 at 20:25

4 Answers

6

You've already got the basic idea - sum the weights (which happen to be the same as your values here) and take a random number in that range - although I'd use 0 as the lower bound, and the sum as the exclusive upper bound. Then you just need to work through the list to find out which value that corresponds to... begin at the start of the list, and keep checking whether the random number is less than the current item's weight: if it is, that's the selected item. If it isn't, subtract the weight from the random number, and move on.

That's an O(N) algorithm, admittedly. If you need to take a random number from the same list multiple times, you could build an accumulated weight list, and then do a binary search to find out which index corresponds with which random number... but I'd stick to the relatively simple approach first.

I haven't tested it, but it would be something like:

// Note: this will iterate over the sequence twice. It's expected not to change
// between iterations!
// The Random parameter is so that you can use a single instance multiple times.
// See http://csharpindepth.com/Articles/Chapter12/Random.aspx
int PickRandomWeightedElement(IEnumerable<int> sequence, Random random)
{
    int totalWeight = sequence.Sum();
    int weightedPick = random.Next(totalWeight);
    foreach (var item in sequence)
    {
        if (weightedPick < item)
        {
            return item;
        }
        weightedPick -= item;
    }
    throw new InvalidOperationException("List must have changed...");
}

If you need to separate the item from the weight, you could either take two parameters (one for the weight, one for the items) or a parameter of type IEnumerable<Tuple<T, int>> where each tuple is an item/weight pair.
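As a rough, untested sketch of that pair-based variant (the generic signature and class name here are my assumptions, not code from the answer):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class WeightedPicker
{
    // Hypothetical generic variant of the method above: each tuple pairs an
    // item with its (positive) weight. Same O(N) walk as before.
    public static T PickRandomWeightedElement<T>(
        IEnumerable<Tuple<T, int>> weightedItems, Random random)
    {
        int totalWeight = weightedItems.Sum(pair => pair.Item2);
        int weightedPick = random.Next(totalWeight);
        foreach (var pair in weightedItems)
        {
            if (weightedPick < pair.Item2)
            {
                return pair.Item1;
            }
            weightedPick -= pair.Item2;
        }
        throw new InvalidOperationException("Sequence must have changed...");
    }
}
```

You'd call it with something like `WeightedPicker.PickRandomWeightedElement(new[] { Tuple.Create("a", 1), Tuple.Create("b", 3), Tuple.Create("c", 6) }, random)`.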

Jon Skeet
  • Awesome answer, thanks. I didn't think to just increment down with weightedPick - that really simplifies it. When you say it iterates twice, is that once for `Sum()` and once for the actual `foreach` loop? Just curious, the sequences won't be large enough for that to be a performance concern. – Wilson Jul 28 '13 at 19:46
  • @Wilson: Exactly - the iteration in `Sum` is implicit. If the sequences are short, that shouldn't be a problem. – Jon Skeet Jul 28 '13 at 19:53
  • I don't think this would work right when using the number `0`. `0` Should have `0` chance of being chosen, but it could be chosen since the range of `Random.Next()` is `0` and up. – John Jul 28 '13 at 19:54
  • @John: How would `weightedPick` ever be *less* than `item`, which is the only way in which it could be returned? – Jon Skeet Jul 28 '13 at 19:56
  • @JonSkeet If the numbers are in reverse order `1, 0` and the weighted pick is `0`. The number returned is `0` when `0` should have never had a chance. – John Jul 28 '13 at 20:01
  • @JonSkeet nevermind, I took it out of its own function and changed it to set the value and continue, which allowed it to loop once more and set the zero. – John Jul 28 '13 at 20:04
2

I would let statistics and probability run their course by adding the same element multiple times, in proportion to its weight. That way you skew the selection, and over time you will have the distribution you're looking for:

{1,3,3,3,6,6,6,6,6,6}
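A minimal sketch of that expansion using LINQ, assuming (as in the question) the weights are the values themselves:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var values = new List<int> { 1, 3, 6 };

        // Repeat each value as many times as its own weight, then pick uniformly.
        List<int> expanded = values.SelectMany(v => Enumerable.Repeat(v, v)).ToList();
        // expanded is now {1, 3, 3, 3, 6, 6, 6, 6, 6, 6}

        var random = new Random();
        int pick = expanded[random.Next(expanded.Count)];
        Console.WriteLine(pick);
    }
}
```

This trades memory for simplicity: the expanded list grows with the sum of the weights.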
TGH
  • That's definitely a nice option for small numbers. It would be rather inefficient if the values happened to be in the millions though. – Jon Skeet Jul 28 '13 at 19:41
1

And one more try on it :)

public static object GetRandom(this IList list, List<int> weights)
{
    var sum = weights.Sum();
    // Random value in [0, sum); Next(1, sum) would skip the last unit of
    // weight and make a weight-1 item at index 0 unreachable.
    var r = new Random().Next(sum);
    var w = 0;
    var i = -1;
    while (w <= r)
    {
        i++;
        w += weights[i];
    }
    return list[i];
}

If you want to optimize speed and randomness you can pre-calculate the weights' sum and re-use a single Random instance, passing both in as parameters. Or even build a list of cumulative sums from the list of weights, to get rid of the arithmetic in the loop.
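That cumulative-sum idea might be sketched like this (my untested variant, with the Random passed in and positive weights assumed; the class and method names are made up):

```csharp
using System;
using System.Collections.Generic;

static class CumulativePicker
{
    // Build the running totals once: cumulative[i] = weights[0] + ... + weights[i].
    // Assumes all weights are positive, so the array is strictly increasing.
    public static int[] BuildCumulative(List<int> weights)
    {
        var cumulative = new int[weights.Count];
        int running = 0;
        for (int i = 0; i < weights.Count; i++)
        {
            running += weights[i];
            cumulative[i] = running;
        }
        return cumulative;
    }

    // Each pick is then O(log N): find the first index whose running total
    // exceeds r, i.e. the first element >= r + 1.
    public static int PickIndex(int[] cumulative, Random random)
    {
        int r = random.Next(cumulative[cumulative.Length - 1]);
        int index = Array.BinarySearch(cumulative, r + 1);
        return index < 0 ? ~index : index; // ~index is the insertion point
    }
}
```

With weights {1, 3, 6} the cumulative array is {1, 4, 10}, so r = 0 maps to index 0, r = 1..3 to index 1, and r = 4..9 to index 2.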

dmay
  • You don't want to create a new Random instance on each call. See http://csharpindepth.com/Articles/Chapter12/Random.aspx. – Jon Skeet Jul 28 '13 at 19:55
0

Here is a generic roulette-wheel selection algorithm.

private static Random random = new Random();
public static T GetRandomItem<T>(Dictionary<T, int> items)
{
    int sum = items.Values.Sum();
    int cumulatedProbability = random.Next(sum);

    foreach(var item in items)
        if((cumulatedProbability -= item.Value) < 0)
            return item.Key;

    throw new InvalidOperationException();
}

To use it:

Dictionary<string, int> items = new Dictionary<string, int> { { "Item 1", 10000 }, { "Item 2", 30000 }, { "Item 3", 60000 } };

var randomItem = GetRandomItem(items);
Cédric Bignon
  • I wouldn't use a `Dictionary` here - I'd use an `IEnumerable<Tuple<T, int>>` or something similar. `Dictionary` gives the impression that you want to look values up by the key - and it stops you from having the same key multiple times. – Jon Skeet Jul 28 '13 at 19:40
  • @JonSkeet Why? When writing the code I hesitated between `IEnumerable<KeyValuePair<T, int>>` and `Dictionary<T, int>` but I came to the conclusion that a same key shouldn't appear multiple times (otherwise the duplicated key has a probabilty to appear equal to the sum of probabilities). – Cédric Bignon Jul 28 '13 at 19:42
  • The duplicated key *should* have a probability to appear equal to the sum of the probabilities, in my view. And it isn't really a "key" - it's just an item. I think it's important to use types which give the right implication about your data - neither `Dictionary<T, int>` nor `IEnumerable<KeyValuePair<T, int>>` do that IMO. – Jon Skeet Jul 28 '13 at 19:54