
Suppose I have a list such as this:

  • White British, 85.67
  • White (other), 5.27
  • White Irish, 1.2
  • Mixed race, 1.2
  • Indian, 1.8
  • Pakistani, 1.3
  • Bangladeshi, 0.5
  • Other Asian (non-Chinese), 0.4
  • Black Caribbean, 1
  • Black African, 0.8
  • Black (others), 0.2
  • Chinese, 0.4
  • Other, 0.4

I want to select, for example, 10,000 values from this list, but I want the selected values to match the weighting associated with them. So ~85% of the selected values should be 'White British'.
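A common way to do this is cumulative-weight sampling: draw a uniform number between 0 and the sum of the weights, then walk the list until the running total passes it. Each value is then chosen with probability proportional to its weight. The sketch below is only an illustration under some assumptions: `PickWeighted` is a hypothetical helper, the list is modelled as `(Name, Weight)` tuples rather than your own item type, and it is written as C# 9+ top-level statements. Because it divides by the actual total, the weights do not need to add up to exactly 100 (the ones in the question add up to 100.14).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not from the question): draws one name with probability
// proportional to its weight by walking the running (cumulative) total of the weights.
static string PickWeighted(IReadOnlyList<(string Name, double Weight)> items, Random random)
{
    double total = items.Sum(x => x.Weight);   // weights need not add up to exactly 100
    double p = random.NextDouble() * total;    // uniform in [0, total)
    double cumulative = 0;
    foreach (var item in items)
    {
        cumulative += item.Weight;
        if (p < cumulative)
            return item.Name;
    }
    return items[items.Count - 1].Name;        // guards against floating-point rounding at the top end
}

// Usage with the weights from the question
var weights = new List<(string Name, double Weight)>
{
    ("White British", 85.67), ("White (other)", 5.27), ("White Irish", 1.2),
    ("Mixed race", 1.2), ("Indian", 1.8), ("Pakistani", 1.3), ("Bangladeshi", 0.5),
    ("Other Asian (non-Chinese)", 0.4), ("Black Caribbean", 1.0), ("Black African", 0.8),
    ("Black (others)", 0.2), ("Chinese", 0.4), ("Other", 0.4)
};

var rnd = new Random(42);
var sample = Enumerable.Range(0, 10_000).Select(_ => PickWeighted(weights, rnd)).ToList();
double share = sample.Count(s => s == "White British") / (double)sample.Count;
Console.WriteLine($"White British share: {share:P1}");
```

The share printed should be close to 85.67 / 100.14 ≈ 85.6%, with the exact figure depending on the seed. Each draw is independent, so the counts only approximate the weights; they are not guaranteed to match them exactly.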

I've been attempting this with LINQ but have had no luck.

    var items = from dataItem in listOfItems
                where (dataItem.uses / listOfItems.Count) <= dataItem.weighting
                select dataItem;

Here `uses` is how many times that value has been selected, and `listOfItems.Count` is how many values have been selected overall so far.
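The idea in the query above (only pick a value while its share so far is at or below its weighting) can also be made to work deterministically, without any randomness: at each step, pick the value whose count lags furthest behind its target. This is a sketch, not the asker's code. `SelectByQuota` is a hypothetical name, the items are assumed to be `(Name, Weight)` tuples, and it uses C# 9+ top-level statements. All the ratios are computed in floating point, so nothing is truncated to 0.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: at every step pick the item whose count lags furthest
// behind its target share; all arithmetic is floating point.
static List<string> SelectByQuota(IReadOnlyList<(string Name, double Weight)> items, int count)
{
    double totalWeight = items.Sum(x => x.Weight);
    var uses = new int[items.Count];
    var result = new List<string>(count);

    for (int selected = 0; selected < count; selected++)
    {
        int best = 0;
        double bestDeficit = double.NegativeInfinity;
        for (int i = 0; i < items.Count; i++)
        {
            // Expected picks of item i after this step, minus how often it has been picked
            double deficit = items[i].Weight / totalWeight * (selected + 1) - uses[i];
            if (deficit > bestDeficit)
            {
                bestDeficit = deficit;
                best = i;
            }
        }
        uses[best]++;
        result.Add(items[best].Name);
    }
    return result;
}

// Usage with a few of the weights from the question
var weights = new List<(string Name, double Weight)>
{
    ("White British", 85.67), ("White (other)", 5.27), ("Black Caribbean", 1.0)
};

var picks = SelectByQuota(weights, 1000);
foreach (var g in picks.GroupBy(x => x))
    Console.WriteLine($"{g.Key}: {g.Count()}");
```

Every pick drives the chosen item's deficit down by less than 1, so no item is ever picked a full time more than its share. The resulting counts therefore stay very close to the weights, and the values come out interleaved rather than grouped. The trade-off is that the sequence is fully predictable, which may or may not be acceptable for your use.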

Thanks

Uwe Keim
Eddie
  • possible duplicate of [Random weighted choice](http://stackoverflow.com/questions/56692/random-weighted-choice) – Jon Skeet Aug 22 '11 at 09:36
  • Is `dataItem.uses` an integer? If so, `where (dataItem.uses / listOfItems.Count) <= dataItem.weighting` will be doing integer arithmetic, so you won't get the results you expect. You will need to convert to a floating-point type: `where ((double)(dataItem.uses) / (double)(listOfItems.Count)) <= dataItem.weighting`. It probably won't solve your underlying problem, though. – ChrisF Aug 22 '11 at 10:05
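The integer-division pitfall from the comment above is easy to see in isolation. The values below are hypothetical: a value picked 3 times out of 10 selections so far, with a weighting of 0.2. Casting just one operand to `double` is enough, because the other operand is then promoted automatically.

```csharp
using System;

// Hypothetical values: an item picked 3 times out of 10 selections so far
int uses = 3;
int count = 10;
double weighting = 0.2;

int integerRatio = uses / count;              // integer division truncates 0.3 to 0
double realRatio = (double)uses / count;      // 0.3: casting one operand is enough

Console.WriteLine(integerRatio <= weighting); // True: the item wrongly keeps qualifying
Console.WriteLine(realRatio <= weighting);    // False
```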



I guess you are trying to create 10,000 values from "White British", "White", ..., and the resulting set should have a distribution close to (ideally equal to) the percentages you have given.

Here is my attempt at a solution:


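    using System;
    using System.Collections.Generic;
    using System.Linq;

    struct Info
    {
        public string Name { get; set; }
        public float Percent { get; set; }
    }

    class Statistics
    {
        public IEnumerable<string> CreateSampleSet(int sampleSize, params Info[] infos)
        {
            var rnd = new Random();
            var result = new List<string>();
            // Largest first, so the linear search below usually stops early
            infos = infos.OrderByDescending(x => x.Percent).ToArray();

            // Divide by the actual total, so the percentages need not add up to exactly 100
            double total = infos.Sum(x => (double)x.Percent);
            if (infos.Length == 0 || total <= 0)
                throw new ArgumentException("At least one positive percentage is required.", nameof(infos));

            // Deterministic part: add the floor of each value's expected count
            foreach (var info in infos)
            {
                var count = (int)(info.Percent / total * sampleSize);
                for (var i = 0; i < count; i++)
                    result.Add(info.Name);
            }

            // Random part: fill the remaining slots with a weighted draw over the running total
            while (result.Count < sampleSize)
            {
                var p = rnd.NextDouble() * total;
                var value = infos[infos.Length - 1]; // fallback in case rounding leaves p at the top end
                var cumulative = 0.0;
                foreach (var info in infos)
                {
                    cumulative += info.Percent;
                    if (p < cumulative)
                    {
                        value = info;
                        break;
                    }
                }
                result.Add(value.Name);
            }

            return result;
        }
    }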

This uses the given percentages to add the desired number of copies of each value (more precisely, the floor of that number) to the result, and finally adds random values until the desired sample size is reached.

Note: the remaining random values are drawn according to the given distribution.
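To check how close a generated set (for example, the output of `CreateSampleSet`) comes to the requested percentages, you could count each distinct value with LINQ. This is a minimal sketch. `Percentages` is a hypothetical helper, the demo sample is made up, and the code uses C# 9+ top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: percentage of the sample taken by each distinct value
static Dictionary<string, double> Percentages(IReadOnlyCollection<string> sample) =>
    sample.GroupBy(name => name)
          .ToDictionary(grp => grp.Key, grp => 100.0 * grp.Count() / sample.Count);

// Made-up sample: 8 "White British" and 2 "Indian"
var demo = Enumerable.Repeat("White British", 8)
                     .Concat(Enumerable.Repeat("Indian", 2))
                     .ToList();

foreach (var kv in Percentages(demo))
    Console.WriteLine($"{kv.Key}: {kv.Value:F1}%");
```

Comparing these figures against the input percentages shows how much the random part of the sample has drifted.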

Random Dev
  • Wow that looks pretty much perfect. However what happens if I have 1000 pieces of info and only want a sample size of 10? – Eddie Aug 22 '11 at 10:00
  • Then the code will fail as it is (see last comment). In this case you have to add the last random elements according to the distribution: sort by percentage descending (call it l), choose a random value from 0 to 100 exclusive (and call it p), then search the values (l) for the last value <= p and add this value ... will change my code – Random Dev Aug 22 '11 at 10:03