11

Let's assume:

a `List<Element>` where `Element` is:

public class Element {
    public int Weight { get; set; }
}

What I want to achieve is to select an element randomly, weighted by its `Weight`. For example:

Element_1.Weight = 100;
Element_2.Weight = 50;
Element_3.Weight = 200;

So

  • the chance Element_1 gets selected is 100/(100+50+200) = 28.57%
  • the chance Element_2 gets selected is 50/(100+50+200) = 14.29%
  • the chance Element_3 gets selected is 200/(100+50+200) = 57.14%

I know I can write a loop, calculate the total, etc...

What I want to learn is: what's the best way to do this with LINQ in ONE line (or as short as possible)? Thanks.

UPDATE

I found my answer below. The first thing I learned is: LINQ is NOT magic; it's slower than a well-designed loop.

So my question becomes: how to find a random element by weight (without the "as short as possible" requirement :)

Eric Yin
    So you want short code, but you don't care about it being slow? – CodesInChaos Feb 04 '12 at 14:35
  • no no, I know a loop is slow, so I want to use LINQ, which should be faster. Not very short, just not as complex as looping and looping again. I can imagine what I can do is: 1: get the total, 2: random from total, 3: get the item in that range.. quite complex and slow, I guess – Eric Yin Feb 04 '12 at 14:36
  • Linq will be slower than loop-based code. If you want fast code, you need to precompute in `O(n)` so you can do an `O(1)` lookup. But the code for that will be relatively complex. – CodesInChaos Feb 04 '12 at 14:38
  • Where did you see "Linq will be slower than loop-based code"? I thought LINQ is faster than that, and I use plenty of LINQ in my code :( – Eric Yin Feb 04 '12 at 14:39
  • How do you think Linq (to objects) works? Magic? It just encapsulates the loop, and is typically slower by a factor of about 2-3 than hand-written loops. The main advantage of LINQ is shorter, clearer code. – CodesInChaos Feb 04 '12 at 14:40
  • LINQ is indeed slower than writing an efficient iterative algorithm yourself. The reason for LINQ is that it is much easier to read/use and far less likely to result in a bug. – roken Feb 04 '12 at 14:42
  • @CodeInChaos I don't know. I know a loop is the worst and most stupid way. So LINQ might be better, or at least it can do some short circuits in the loop. Anyway, I don't think my C# skills are good, so I would rather use LINQ, which is coded by professionals. – Eric Yin Feb 04 '12 at 14:42
  • @CodeInChaos - 2 to 3 times slower is a gross exaggeration. – roken Feb 04 '12 at 14:43
  • Does your array change between different runs? It is relatively simple to get `O(n)` precomputation with `O(log(n))` lookup. And how many elements are there in the collection? If it's only a handful, a linear search will still be fast. – CodesInChaos Feb 04 '12 at 14:44
  • @roken If the content of your delegates/lambdas is cheap, that factor should be correct. If the content of the lambdas is expensive, the difference obviously gets smaller. – CodesInChaos Feb 04 '12 at 14:45
  • @CodeInChaos Yes, it changes. It's a pick-one-from-a-`List`-by-weight function; the `List` changes all the time. I think I will just use LINQ to get the total, then do the random part myself. – Eric Yin Feb 04 '12 at 14:47
  • @CodeInChaos "If the content of the lambdas is expensive" how you decide? `String` is cheap? `Int` is cheap, `object/class` is expensive? is it? – Eric Yin Feb 04 '12 at 14:48
  • "Cheap" means "code which is executed in a short time". File operations and remote calculations, as well as complex algorithms, are expensive, meaning "taking a lot of time". – Nuffin Feb 06 '12 at 13:07

5 Answers

6

If you want a generic version (useful for use with a singleton randomizer helper; consider whether you need a constant seed or not):

usage:

randomizer.GetRandomItem(items, x => x.Weight)

code:

// _random is assumed to be a Random field on the containing class,
// e.g. private readonly Random _random = new Random();
public T GetRandomItem<T>(IEnumerable<T> itemsEnumerable, Func<T, int> weightKey)
{
    var items = itemsEnumerable.ToList();

    var totalWeight = items.Sum(x => weightKey(x));
    var randomWeightedIndex = _random.Next(totalWeight);
    var itemWeightedIndex = 0;
    foreach (var item in items)
    {
        // walk the cumulative weights until we pass the random index
        itemWeightedIndex += weightKey(item);
        if (randomWeightedIndex < itemWeightedIndex)
            return item;
    }
    throw new ArgumentException("Collection count and weights must be greater than 0");
}
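For context, here is a minimal self-contained sketch of how this helper might be hosted and called. The `Randomizer` class name and the `_random` field are illustrative assumptions, not part of the answer; the method body is the one above, and the elements are the ones from the question:

using System;
using System.Collections.Generic;
using System.Linq;

public class Randomizer
{
    // the answer assumes a Random field like this one
    private readonly Random _random = new Random();

    public T GetRandomItem<T>(IEnumerable<T> itemsEnumerable, Func<T, int> weightKey)
    {
        var items = itemsEnumerable.ToList();
        var totalWeight = items.Sum(x => weightKey(x));
        var randomWeightedIndex = _random.Next(totalWeight);
        var itemWeightedIndex = 0;
        foreach (var item in items)
        {
            itemWeightedIndex += weightKey(item);
            if (randomWeightedIndex < itemWeightedIndex)
                return item;
        }
        throw new ArgumentException("Collection count and weights must be greater than 0");
    }
}

// usage with the question's elements, e.g. inside Main:
var randomizer = new Randomizer();
var list = new List<Element>
{
    new Element { Weight = 100 },
    new Element { Weight = 50 },
    new Element { Weight = 200 },
};
var picked = randomizer.GetRandomItem(list, x => x.Weight);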
doblak
  • I love this solution, it looks very decent. In addition, I will edit it to allow `totalWeight=0` and then just randomly pick one (with equal chance, since the weights are the same) instead of throwing an exception. – Eric Yin Feb 04 '12 at 15:42
  • I think I found an issue here (a little issue): you should use `randomWeightedIndex = _random.Next(totalWeight)+1;` and then compare `if(randomWeightedIndex <= itemWeightedIndex)`. Otherwise the chance of being selected is slightly different from the requirement in my question. – Eric Yin Feb 04 '12 at 15:46
  • I am rather sure it doesn't make a difference. – doblak Feb 04 '12 at 15:52
  • It makes a difference. Let's assume `A=0, B=2`; the chance of A being selected is 0%, and B is 100%. `R=rand.Next(2)` will return 0-1. When R=1, A will be selected, which becomes 50% for A and B. If you use +1, R is 1 to 2, and neither will be <= 0 – Eric Yin Feb 04 '12 at 15:59
  • because the weight key, in common sense, should be >= 0, I use byte, so 0-255 are legal weight-key values – Eric Yin Feb 04 '12 at 16:00
  • At zero-based indexes it makes a difference, but I consider weight value 0 as "0% chance", so... it all depends on the definition. However, it's nice to see you find it useful. – doblak Feb 04 '12 at 20:41
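Since the comments above debate whether a `+1` adjustment is needed, a quick frequency check can be run against the sketch above (this reuses `randomizer` and `list` from that sketch and its usings); for the question's weights the counts should come out near 28.6% / 14.3% / 57.1% with the unmodified `Next(totalWeight)` version:

// rough Monte Carlo check (sketch, using the Randomizer/Element types above)
var counts = new Dictionary<int, int> { { 100, 0 }, { 50, 0 }, { 200, 0 } };
for (int i = 0; i < 100_000; i++)
{
    var e = randomizer.GetRandomItem(list, x => x.Weight);
    counts[e.Weight]++;
}
// expected roughly: 100 -> ~28.6%, 50 -> ~14.3%, 200 -> ~57.1% of the draws
foreach (var pair in counts)
    Console.WriteLine($"Weight {pair.Key}: {pair.Value / 1000.0:F1}%");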
5
// assuming rnd is an already instantiated instance of the Random class
var max = list.Sum(y => y.Weight);
var rand = rnd.Next(max);
// note: the lambda decrements the captured max as it scans the list
var res = list
    .FirstOrDefault(x => rand >= (max -= x.Weight));
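To make the one-liner concrete, here is a commented trace for the question's weights; this is only a sketch and assumes `list` holds Element_1 (100), Element_2 (50) and Element_3 (200) in that order:

var rnd = new Random();
var max = list.Sum(y => y.Weight);   // 350
var rand = rnd.Next(max);            // uniform in [0, 350)
var res = list.FirstOrDefault(x => rand >= (max -= x.Weight));
// 1st test: max becomes 250 -> Element_1 is returned when rand >= 250 (probability 100/350)
// 2nd test: max becomes 200 -> Element_2 is returned when rand >= 200 (probability  50/350)
// 3rd test: max becomes   0 -> Element_3 is returned when rand >=   0 (probability 200/350)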
Nuffin
  • Factor out `list.Max(...)`. Your current code is `O(n^2)`. If you factor it out, it's only `O(n*log(n))` – CodesInChaos Feb 04 '12 at 14:48
  • Hi @Tobias, thanks for the code. Is it random a number from max weight of all elements, then sort by random? If sort by random, then why you need `where`? And also, why generate a random number from max weight of all elements? – Eric Yin Feb 04 '12 at 14:53
  • If you need First(), and you already use Random, why would you order by Random? Also, I'd cache list.Max, no reason to calculate the max on each iteration. – Abel Feb 04 '12 at 14:54
  • Anyway, to all you guys: I think I'll just use a loop :( – Eric Yin Feb 04 '12 at 14:54
  • and.. why do all you guys talk about `Max`? I am not using `max` as the base for dividing the percentage, I use the `total` – Eric Yin Feb 04 '12 at 14:56
  • @EricYin: he uses Random on max weight, because you said that 200 should have a higher chance of being selected than 50. However, I'm uncertain whether it matches the sum(weight)-weighing you put in your question. The ordering seems indeed redundant (see my earlier comment). – Abel Feb 04 '12 at 14:57
  • The `minWeight` is to ensure the items are chosen more often as their weight increases. And the random order (@Abel) is for selecting _a random item_ from the result set, as just selecting the first would never yield the `Weight=50` item from the example, even if `minWeight` would be `< 50` – Nuffin Feb 04 '12 at 15:02
  • I spoke too fast, Tobias is of course right, you need to shuffle the deck. – Abel Feb 04 '12 at 15:08
  • This algorithm is incorrect. Consider the case where you have 2 items with weights 1 and 2. This algorithm will select them with equal probabilities, while it should be 1/3 and 2/3. Even if you fix it to say `x.Weight > minWeight` instead of `x.Weight >= minWeight`, the probabilities will be 1/4 and 3/4. – interjay Feb 05 '12 at 10:28
  • On the right track, but if you have 10 items with equal weights, won't this choose the first one with 90% probability? I think you need to start a counter at 0 and increase by `x.Weight` each time instead of comparing with `max`. Also, change `<=` to `<`. – interjay Feb 06 '12 at 14:42
  • @interjay: I start at `max` and decrease by `x.Weight` (with the "subtract and assign" operator `-=`), and mixed up `<` and `>`. Now it should produce exactly the same probabilities as the accepted answer. – Nuffin Feb 06 '12 at 22:28
  • It will be correct if you change `>` to `>=`. Preemptive +1 from me. – interjay Feb 07 '12 at 10:20
3

This is a fast solution with precomputation. The precomputation takes O(n), the search O(log(n)).

Precompute:

// lookup[i] holds the last random value that selects elements[i]
// (the cumulative weight up to and including i, minus 1)
int[] lookup = new int[elements.Length];
lookup[0] = elements[0].Weight - 1;
for (int i = 1; i < lookup.Length; i++)
{
  lookup[i] = lookup[i - 1] + elements[i].Weight;
}

To generate:

int total = lookup[lookup.Length - 1] + 1; // +1 because lookup stores cumulative weight - 1
int chosen = random.Next(total);           // assumes random is a System.Random instance
int index = Array.BinarySearch(lookup, chosen);
if (index < 0)
  index = ~index; // not found: BinarySearch returns the complement of the next larger index
return elements[index];

But if the list changes between each search, you can instead use a simple O(n) linear search:

int total = elements.Sum(e => e.Weight);
int chosen = random.Next(total); // assumes random is a System.Random instance
int runningSum = 0;
foreach (var element in elements)
{
  runningSum += element.Weight;
  if (chosen < runningSum)
    return element;
}
throw new InvalidOperationException("Weights must sum to more than 0");
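Putting the two snippets together, a reusable helper might look like the following sketch. The `WeightedPicker` name and the use of the question's `Element` type are illustrative assumptions; rebuild the picker whenever the list or its weights change:

using System;
using System.Collections.Generic;
using System.Linq;

// Sketch only: wraps the precompute + binary-search approach above.
// Assumes at least one element and a positive total weight.
public class WeightedPicker
{
    private readonly Random _random = new Random();
    private readonly List<Element> _elements;
    private readonly int[] _lookup; // cumulative weight up to each element, minus 1
    private readonly int _total;    // sum of all weights

    public WeightedPicker(IEnumerable<Element> elements)
    {
        _elements = elements.ToList();
        _lookup = new int[_elements.Count];
        _lookup[0] = _elements[0].Weight - 1;
        for (int i = 1; i < _lookup.Length; i++)
            _lookup[i] = _lookup[i - 1] + _elements[i].Weight;
        _total = _lookup[_lookup.Length - 1] + 1;
    }

    public Element Pick()
    {
        int chosen = _random.Next(_total);
        int index = Array.BinarySearch(_lookup, chosen);
        if (index < 0)
            index = ~index;
        return _elements[index];
    }
}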
CodesInChaos
  • Where's the LINQ from the question? ;) – Abel Feb 04 '12 at 15:11
  • `int total=elements.Sum(e=>e.Weight)` :P That's the only place where I found LINQ useful in this problem. I couldn't find a fast and clean way to perform the search itself with LINQ. – CodesInChaos Feb 04 '12 at 15:14
2

This could work:

// assuming random is an already instantiated instance of the Random class
int weightsSum = list.Sum(element => element.Weight);
int start = 1;
var partitions = list.Select(element =>
                 {
                   // side effect: advance the running start position
                   var oldStart = start;
                   start += element.Weight;
                   return new { Element = element, End = oldStart + element.Weight - 1 };
                 });

var randomWeight = random.Next(weightsSum);
var randomElement = partitions.First(partition => partition.End > randomWeight)
                              .Element;

Basically, for each element a partition is created with an end weight. In your example, Element1 would be associated with (1-->100), Element2 with (101-->150), and so on...

Then a random weight is drawn and we look for the range it falls into.

You could also compute the sum inside the same projection, but that would introduce another side effect...

Note that I'm not saying this is elegant or fast. But it does use LINQ (though not in one line...)
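To make those ranges concrete, the partitions for the question's weights can be materialized and printed. This is only an illustrative sketch: the `Start` member and the `ToList()` call are additions (the latter also avoids re-running the side-effecting lambda on a second enumeration), and it assumes the same usings and `list` as above:

int startPos = 1; // illustrative copy of the running counter
var parts = list.Select(element =>
{
    var oldStart = startPos;
    startPos += element.Weight;
    return new { element.Weight, Start = oldStart, End = oldStart + element.Weight - 1 };
}).ToList();
foreach (var p in parts)
    Console.WriteLine($"Weight {p.Weight}: {p.Start} --> {p.End}");
// Weight 100: 1 --> 100
// Weight 50: 101 --> 150
// Weight 200: 151 --> 350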

vc 74
  • thanks. I think I will leave out LINQ except for the sum part, since it creates too many new objects along the way – Eric Yin Feb 04 '12 at 15:44
0

.NET 6 introduced `MaxBy`, making this much easier.
This could now be simplified to the following one-liner:

list.MaxBy(x => rng.Next(x.Weight));

This works best if the weights are large or floating-point numbers; otherwise there will be collisions, which can be prevented by multiplying the weight by some factor.
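If exact proportions matter, one well-known alternative key (not part of this answer) is the Efraimidis-Spirakis trick: draw a uniform random number per item and take the item with the largest `U^(1/Weight)`; for strictly positive weights this selects each item with probability Weight divided by the total weight:

// sketch: assumes .NET 6+ for MaxBy, a System.Random instance rng, and strictly positive weights
var rng = new Random();
var picked = list.MaxBy(x => Math.Pow(rng.NextDouble(), 1.0 / x.Weight));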