2

Let's say I have a collection of n objects, and that each object has a ranking associated with it, and these rankings correspond to the integer values 1 through n.

Now suppose that I want to select an object at random from the collection. But I don't want to just pick a number from 1 to n at random; instead I want to make it so that I am more likely to pick a number higher up the list (with a ranking closer to 1).

Proposed solution: Instead of picking from 1 to n, pick from 1 to m, where m is some number significantly larger than n; then use some mapping function f : [1,m] → [1,n] that maps more numbers to the higher rankings than to the lower rankings. For instance, f(1), f(2), f(3) might all return 1, while f(m) is the only one which maps to n, so it is three times more likely to get 1 than n. Hopefully that makes sense.

So my question is: If this seems like a reasonable algorithm, what's a reasonable function f that accomplishes this, and what ratio m/n would be large enough that integer rounding doesn't prevent numbers from never getting picked?

[In my particular scenario, n can be pretty large (in the thousands), so solutions like the one presented here aren't very practical for this situation. Also, the selection is "with replacement"; i.e. I pick an object once and then return; I don't care if I pick it again immediately the next time.]

Community
  • 1
  • 1
Andrew
  • 4,953
  • 15
  • 40
  • 58

5 Answers5

2

You can do something like the following:

double bias = 1.5; 
int index = (int)(n * (bias - sqrt(bias*bias -4.0*(bias-1.0)* random()))
                  / 2.0 / (bias-1));

Changing the bias parameter lets you control how strongly you favor the higher ranked individuals.

Edit: Here's some python code for it.

def pick(n, bias):
    return int(n * (bias - sqrt(bias*bias -4.0*(bias-1.0)*random())) / 2.0 / (bias-1))

vals = [0]*10
for i in xrange(1000):
    vals[pick(10,1.5)] += 1
print vals
[153, 151, 115, 116, 97, 96, 87, 69, 66, 50]
deong
  • 3,820
  • 21
  • 18
  • 1
    Out of curiosity, how did you come up with this formula? I don't doubt it works, but it's not immediately obvious to me as to *why* it works. – Andrew May 03 '11 at 16:19
  • 1
    I didn't. My advisor in grad school had developed an algorithm that needed to do this years ago (1989 to be exact, and the algorithm is a GA called Genitor). I recognized the same requirements and pulled up the code to that algorithm. :) – deong May 03 '11 at 16:25
1

I'd try using the normal random approach (random.uniform(0, 1)), but squaring the probability.

Since P{x} ranges from 0 -> 1, P{x^2}also ranges from0 -> 1`.

But the weight is uneven, as a small number squared is still small, and a larger number squared becomes small.

Just a thought.

Blender
  • 289,723
  • 53
  • 439
  • 496
1

I think what you really want is function f : [1,n] → N (the natural numbers 0, 1, 2, ...). This would assign a weighting to each rank. Then you want to pick rank k with probability f(*k*) / (Σ f(*i*)), in other words, that rank's weight over the sum of the weights of all ranks. To do that, you can simply pick an integer uniformly at random over the interval [1, Σ f(*i*)] and figure out what rank you are in based on your position; if you are in 1 ... f(1), pick 1, if you are in f(1)+1 ... f(1)+f(2), pick 2, and so forth.

One possible choice for f which weights small ranks higher than big ranks is f(*i*) = n - i + 1. There are many other possible choices.

Martin Hock
  • 944
  • 4
  • 5
1

Generate K random numbers from the interval 1..n (K > 1) and choose the minimum!

This has the properties you want, check out a demonstration of the distributions at http://www.sjsu.edu/faculty/watkins/samplemin.htm

To make this work with fractional values of K (1 < K < 2) you can do it like this:

int m = random_int(1..n)
if (random_double(0..1) <= K - 1):
     m = min(m, random_int(1..n))

So when K approaches 1 from above the distribution becomes more and more flat.

Antti Huima
  • 25,136
  • 3
  • 52
  • 71
0

Normalize by rank, then build a binary tree. Select a random double and find the corresponding value.

dave
  • 12,406
  • 10
  • 42
  • 59