3

Given a list of integers, e.g. 1, 2, 3, 4, I know how to select items based on their weight. The example items would have probabilities of 10%, 20%, 30%, and 40%, respectively.

Is there an equally simple method of selecting items based on the inverse of their weight? With this method, the example list would be equal to a weighted list of 1, 1/2, 1/3, 1/4 (48%, 24%, 16%, 12%), but I want to avoid the conversion and use of floating-point arithmetic. (Assume all of the integers are positive and non-zero.)

dlras2
  • 8,416
  • 7
  • 51
  • 90

2 Answers2

3

You could divide the numbers' least common multiple by each number and get integral proportions.

For [1, 2, 3, 4], this is 12. Your weights are 12/1=12, 12/2=6, 12/3=4, 12/4=3.

You could also multiply them all together and not bother with the LCM as well. The numbers will be higher but the proportions will be the same: 24/1=24, 24/2=12, 24/3=8, 24/4=6.

Platinum Azure
  • 45,269
  • 12
  • 110
  • 134
  • I *just* came up with this solution. All it took was writing it down again and thinking for a few minutes. – dlras2 Oct 14 '11 at 02:59
  • Don't worry I will. I've gotta wait 4 more minutes tho. =P – dlras2 Oct 14 '11 at 03:01
  • The problem with this method is that for large `n`, you will run into integer overflow issues. Even for a 64-bit long, this will fail for `n` on the order of 25. – tskuzzy Oct 14 '11 at 03:02
  • Well, the assumption is that for bigger numbers, one would use data types which can avoid that. Most languages have bignum support at least in a library. – Platinum Azure Oct 14 '11 at 03:06
0

First get the sum of the weights, call it S (e.g. 1 + 1/2 + 1/3 + 1/4 = 2.083). Then to find the probability of weight w_i, you divide w_i by S (e.g. 1/2.083 = 48%.

I don't think there's a nice, closed-form formula for this expression for general sequences of numbers.

The sum of the weights are harmonic numbers. For large n, the sum converges to ln(n)+gamma where gamma is the Euler–Mascheroni constant (~0.577). So for large n, you could use this formula to approximate the sum.

EDIT: There are ways to reduce floating point errors. One such way is to calculate the sum from the smallest term up to the largest term (e.g. 1/n + 1/(n-1) + ... + 1). This allows the intermediate calculations to maximize the number of bits of precision. By doing this, rounding issues should not be a problem.

tskuzzy
  • 35,812
  • 14
  • 73
  • 140