6

Update: my problem has been solved, I updated the code source in my question to match with Jason's answer. Note that rikitikitik answer is solving the issue of picking cards from a sample with replacement.

I want to select x random elements from a weighted list. The sampling is without replacement. I found this answer: https://stackoverflow.com/a/2149533/57369 with an implementation in Python. I implemented it in C# and tested it. But the results (as described below) were not matching what I expected. I've no knowledge of Python so I'm quite sure I made a mistake while porting the code to C# but I can't see where as the code in Pythong was really well documented.

I picked one card 10000 times and this is the results I obtained (the result is consistent accross executions):

Card 1: 18.25 % (10.00 % expected)
Card 2: 26.85 % (30.00 % expected)
Card 3: 46.22 % (50.00 % expected)
Card 4: 8.68 % (10.00 % expected)

As you can see Card 1 and Card 4 have both a weigth of 1 but Card 1 is awlays picked way more often than card 4 (even if I pick 2 or 3 cards).

Test data:

var cards = new List<Card>
{
    new Card { Id = 1, AttributionRate = 1 }, // 10 %
    new Card { Id = 2, AttributionRate = 3 }, // 30 %
    new Card { Id = 3, AttributionRate = 5 }, // 50 %
    new Card { Id = 4, AttributionRate = 1 }, // 10 %
};

Here is my implementation in C#

public class CardAttributor : ICardsAttributor
{
    private static Random random = new Random();

    private List<Node> GenerateHeap(List<Card> cards)
    {
        List<Node> nodes = new List<Node>();
        nodes.Add(null);

        foreach (Card card in cards)
        {
            nodes.Add(new Node(card.AttributionRate, card, card.AttributionRate));
        }

        for (int i = nodes.Count - 1; i > 1; i--)
        {
            nodes[i>>1].TotalWeight += nodes[i].TotalWeight;
        }

        return nodes;
    }

    private Card PopFromHeap(List<Node> heap)
    {
        Card card = null;

        int gas = random.Next(heap[1].TotalWeight);
        int i = 1;

        while (gas >= heap[i].Weight)
        {
            gas -= heap[i].Weight;
            i <<= 1;

            if (gas >= heap[i].TotalWeight)
            {
                gas -= heap[i].TotalWeight;
                i += 1;
            }
        }

        int weight = heap[i].Weight;
        card = heap[i].Value;

        heap[i].Weight = 0;

        while (i > 0)
        {
            heap[i].TotalWeight -= weight;
            i >>= 1;
        }

        return card;
    }

    public List<Card> PickMultipleCards(List<Card> cards, int cardsToPickCount)
    {
        List<Card> pickedCards = new List<Card>();

        List<Node> heap = GenerateHeap(cards);

        for (int i = 0; i < cardsToPickCount; i++)
        {
            pickedCards.Add(PopFromHeap(heap));
        }

        return pickedCards;
    }
}

class Node
{
    public int Weight { get; set; }
    public Card Value { get; set; }
    public int TotalWeight { get; set; }

    public Node(int weight, Card value, int totalWeight)
    {
        Weight = weight;
        Value = value;
        TotalWeight = totalWeight;
    }
}

public class Card
{
    public int Id { get; set; }
    public int AttributionRate { get; set; }
}
Community
  • 1
  • 1
Gabriel
  • 1,572
  • 1
  • 17
  • 26
  • woho, i would do that with linq, `order by Guid.NewGuid()` and double/triple/... the amount of instances according to rate. easier to implement and easier to read - no word about performance though. –  Aug 02 '12 at 10:54
  • System.Random is NOT a good random number generator (and Guids aren't random generators at all). If you need a true random distribution you have to use something else. No choice. – Adriano Repetti Aug 02 '12 at 11:18
  • Note: even for a "perfect" RNG you won't get the same number of hits for both cards (even if they have the same weight)... – Adriano Repetti Aug 02 '12 at 11:24
  • If you want to extract cards EXACTLY in that proportion then simply generate them according with that and then MIX them. – Adriano Repetti Aug 02 '12 at 11:26
  • 7
    System.Random is a perfectly good random number generator for this purpose. Of course it is only a pseudorandom number generator, but that is not an issue in this case. – Ruud Aug 02 '12 at 11:33
  • Yes this solution (http://stackoverflow.com/questions/56692/random-weighted-choice) works perfectly when you select only one card (Card 1: 24.96 % (25.00 % expected), Card 2: 24.85 % (25.00 % expected), Card 3: 25.05 % (25.00 % expected), Card 4: 25.14 % (25.00 % expected)). The problem is not with the Random which is good enough in this case. – Gabriel Aug 02 '12 at 11:39
  • @RuudvA and Gabriel: a random number generator cannot be checked for a SMALL amount of samples but with a LARGE distribution. A weighted random number generator is as good as the underlying generator. System.Random is enough for many purposes but it's not GOOD in a statistical sense. – Adriano Repetti Aug 02 '12 at 11:42
  • 2
    @Adriano did you read my previous comment? With another algorithm I was able to get the expected distribution when picking a single card 10 000 times. The pseudo random generator of .NET is NOT the issue here. – Gabriel Aug 03 '12 at 01:58
  • @Gabriel I guess we're speaking about **something different**! Naive way: pick a random integer number [0..100), assign first 10 slots to card 1, second 30 slots to card 2 (and so on). You'll get a naive "weighted" random number generator. What's **wrong** is the distribution and predictability but... **I agree**, in this case System.Random will fit his needings!!! – Adriano Repetti Aug 03 '12 at 19:53
  • When you say "with no replacement", do you mean that when you pick the second card, the first card has been removed and the probabilities should be updated accordingly? In that case, your attribution rate should really be called a card count. – Mathias Aug 05 '12 at 21:21
  • @Mathias Yes once a card has been picked it can't be picked again. I agree that the naming could be better, I'll refactor once I get it working :) – Gabriel Aug 05 '12 at 23:48

4 Answers4

2

As some people have mentioned in the comments, create a list of the cards in the exact proportion that you want:

var deck = new List<Card>();

cards.ForEach(c => 
{
    for(int i = 0; i < c.AttributionRate; i++)
    {
         deck.Add(c);
    }
}

Shuffle:

deck = deck.OrderBy(c => Guid.NewGuid()).ToList();

And pick x cards:

var hand = deck.Take(x)

Of course this only works if AttributionRate is an int. Otherwise, you would have to tinker with the deck generation a bit.

I get the following results for 10,000 runs taking 5 at a time:

Card 1: 9.932% 
Card 2: 30.15% 
Card 3: 49.854% 
Card 4: 10.064% 

Another result:

Card 1: 10.024%
Card 2: 30.034%
Card 3: 50.034% 
Card 4: 9.908% 

EDIT:

I've braved the bitwise operations and I have taken a look at your code. After adding a generous amount of barbecue sauce on my fried brain, I noticed a few things:

First, Random.Next(min,max) will include min in the random pool, but not max. This is the reason for the higher than expected probability for Card 1.

After doing that change, I implemented your code and it appears to be working when you draw 1 card.

Card 1: 10.4%  
Card 2: 32.2% 
Card 3: 48.4% 
Card 4: 9.0% 

Card 1: 7.5%
Card 2: 28.1%
Card 3: 50.0% 
Card 4: 14.4% 

HOWEVER, your code will not work when you draw more than 1 card because of this statement:

heap[i].Weight = 0;

That line, and the recalculation loop after that, essentially removes all instances of the drawn card from the heap. If you happen to draw four cards, then the percentage becomes 25% for all cards since you're basically drawing all 4 cards. The algorithm, as it is, is not completely applicable to your case.

I suspect you would have to recreate the heap every time you draw a card, but I doubt it would still perform as well. If I were to work on this though, I would just generate 4 distinct random numbers from 1 to heap[1].TotalWeight and get the 4 corresponding cards from there, although the random number generation in this case might become unpredictable (rerolling) and thus inefficient.

rikitikitik
  • 2,414
  • 2
  • 26
  • 37
  • your code is working so I'm considering accepting this is as an answer. But it's 6 times slower than the code I posted and I'm quite sure the difference will even be bigger once I start to work with real data. – Gabriel Aug 06 '12 at 04:35
  • I didn't know performance is a consideration. The `Guid.NewGuid()` part _may_ be the culprit and you _may_ get a better result generating random decimals here instead. I'm not 100% sure of that, though. – rikitikitik Aug 06 '12 at 04:52
  • Yes it did, reduced the computing time by a bit more than 40%, still way slower than the original solution though. The answer I copied my code from was upvoted 28 times so I suppose it's working as advertised. I don't understand how my code can be so wrong. – Gabriel Aug 06 '12 at 05:01
  • 1
    I tried to see where your code went wrong, but bitwise operations fry my brain to a nice crisp ;) – rikitikitik Aug 06 '12 at 05:11
  • I'm afraid I worded my question poorly: once a card is picked you can't pick it again (this is what I meant by sample without replacement). Even though your original answer respected completely the weight it was also getting duplicate cards which doesn't match with my requirements. – Gabriel Aug 07 '12 at 04:23
2

There are two minor bugs in the program. First, the range of the random number should be exactly equal to the total weight of all the items:

int gas = random.Next(heap[1].TotalWeight);

Second, change both places where it says gas > to say gas >=.

(The original Python code is OK because gas is a floating-point number, so the difference between > and >= is negligible. That code was written to accept either integer or floating-point weights.)

Update: OK, you made the recommended changes in your code. I think that code is correct now!

Jason Orendorff
  • 42,793
  • 6
  • 62
  • 96
  • In fact I spoke too fast, it's working flawlessly when I pick one card only. As soon as I pick multiple cards (3 for example with the given set) I get the following result: Card 1: 18.30 % (10.00 % expected), Card 2: 30.20 % (30.00 % expected), Card 3: 32.25 % (50.00 % expected), Card 4: 19.25 % (10.00 % expected) – Gabriel Aug 07 '12 at 03:52
  • @Gabriel I don't think your expectations are right for picking multiple cards. In each trial you're picking 3 cards without replacement, right? So it is impossible for card 3 to account for 50% of the picks! – Jason Orendorff Aug 07 '12 at 23:50
  • 1
    When you pick multiple cards without replacement, the probabilities change as you go. Once you remove the first card, the probability of picking that card again goes to 0, and the probability of picking the remaining cards goes up. If you pick 3 of those 4 cards, without replacement, I would expect you to get Card 3 about 96.6% of the time. But since it is only one of the three cards you picked, it only makes up 32.2% of your total picks. Note that this is very close to what you observed! – Jason Orendorff Aug 08 '12 at 00:15
  • Thank you it makes complete sense. I tried with a few different samples and pick count and the result seem satisfying :) – Gabriel Aug 08 '12 at 04:04
1

If you want to pick x elements from a weighted set without replacement such that elements are chosen with a probability proportional to their weights, then your algorithm is wrong.

Consider the following weighted list:
'a': weight 1
'b': weight 2
'c': weight 3
and x = 2

In this example, your function should always return 'c' in the result set. This is the only way for 'c' to be chosen 3x as often as 'a' and 1.5x as often as 'b'. But it is trivial to see that your algorithm does not always yield 'c' in the result.

One algorithm which accomplishes this is to line up the items along the number line from 0 to 1 such that they occupy a segment sized proportionally to their weight, then randomly pick a number "start" between 0 and 1/x, then find all the points "start + n/x" (for all integers n such that the point is between 0 and 1) and yield the set containing the items marked by those points.

In other words, something like:

a.) optionally shuffle the list of elements (if you need random combinations of elements in addition to respecting the weights)  
b.) create a list of cumulative weights, if you will, called borders, such that borders[0] = items[0].weight and borders[i] = borders[i - 1] + items[i].weight  
c.) calculate the sum of all the weights => total_weight  
d.) step_size = total_weight / x  
e.) next_stop = pick a random number between [0, step_size)  
f.) current_item = 0  
g.) while next_stop < total_weight:
h.)   while borders[current_item] < next_stop:  
i.)     current_item += 1  
j.)   append items[current_item] to the output  
k.)   next_stop += step_size

Note: this only works where the biggest weight <= step_size. If one of the elements has a weight greater than the total weight / x, then this problem isn't possible: you'd have to pick an element more than once in order to respect the weights.

ech
  • 118
  • 2
  • 5
0

You could do this:

Card GetCard(List<Card> cards)
{
  int total = 0;
  foreach (Card c in cards)
  {
    total += AttributionRate;
  }

  int index = Random.Next(0, total - 1);
  foreach(Card c in cards)
  {
    index -= c.AttributionRate;
    if (index < 0)
    {
      return c;
    }
  }
}

Card PopCard(List<Card> cards)
{
  Card c = GetCard(cards);
  cards.Remove(c);
}

In theory this should work.

  • I didn't check his code but I guess the big big problem is not HOW you "extract" the card but the way you generate the pseudo-random number. Built-in generator is far from optimal. – Adriano Repetti Aug 02 '12 at 11:23
  • Here are the results I get with your solution: Card 1: 0.00 % (10.00 % expected), Card 2: 0.00 % (30.00 % expected), Card 3: 0.00 % (50.00 % expected), Card 4: 100.00 % (10.00 % expected). This problem is not as trivial as it seems, please refer to the link question (http://stackoverflow.com/a/2149533/57369) to gain more insight. – Gabriel Aug 03 '12 at 02:20