-1

I've run into a problem: I have a list or array (IList) of elements that have a field (float Fitness). I need to efficiently choose N random unique elements depending on this variable: the bigger - the more likely it is to be chosen.

I searched on the internet, but the algorithms I found were rather unreliable.

The answer stated here seems to have a bigger probability at the beginning which I need to make sure to avoid.

-Edit-

For example I need to choose from objects with the values [-5, -3, 0, 1, 2.5] (negative values included).

Emil Terman
  • 526
  • 4
  • 22
  • 1
    The basic algorithm is to sum the values, and then draw a point from 0-sum(values) and an order for the items, and see which one it "intersects". – Reut Sharabani Feb 07 '18 at 10:31
  • How long is list (compared with N)? – MBo Feb 07 '18 at 10:57
  • @MBo up to 25%. – Emil Terman Feb 07 '18 at 11:00
  • The answer to the question you link to is absolutely correct; it has no bias. However, it does not apply to your question since you want the selection to be biased. What is missing from your question is a specification of whatvthe bias is: in your example, what is the desired probability of choosing each value? "More likely" doesn't say much. – rici Feb 07 '18 at 13:44

2 Answers2

0

The basic algorithm is to sum the values, and then draw a point from 0-sum(values) and an order for the items, and see which one it "intersects".

For the values [0.1, 0.2, 0.3] the "windows" [0-0.1, 0.1-0.3, 0.3-0.6] will look like this:

  1 23 456
 |-|--|---|
 |-*--*---|

And you draw a point [0-0.6] and see what window it hit on the axis.

Pseudo-python for this:

original_values = {val1, val2, ... valn}

# list is to order them, order doesn't matter outside this context.
values = list(original_values)

# limit
limit = sum(values)
draw = random() * limit

while true:
    candidate = values.pop()
    if candidate > draw:
         return candidate
    draw -= candidate
Reut Sharabani
  • 30,449
  • 6
  • 70
  • 88
  • Thanks for the quick reply, but the question is for N elements. Using the code above N times seems to be rather slower than what I imagined. The values can be negative. – Emil Terman Feb 07 '18 at 10:38
  • 1
    The code is pretty quick IMO... It is linear to the input. – Reut Sharabani Feb 07 '18 at 10:39
  • How do I apply this algorithm on negative values? I thought maybe I should add the |minimum value| to each element, but then the input is a list of constant objects... Do you see a more elegant solution? – Emil Terman Feb 07 '18 at 10:46
  • Negative values have no chance of being elected based on value. If you want you can use their **absolute value** :) – Reut Sharabani Feb 07 '18 at 10:48
  • What do you mean by using the absolute value? These objects can have negative, as well as positive values and each element has a chance to be chosen. – Emil Terman Feb 07 '18 at 10:52
  • My answer stands. A negative value has a negative chance of being elected (meaning no chance). It could be that you're after **absolute values** (if -5 has the same chance as 5). – Reut Sharabani Feb 07 '18 at 10:54
  • Perhaps it is possible to convert all values into positive with some function. The simplest: `valueforsumming = value - minvalue + someconst` – MBo Feb 07 '18 at 11:32
  • [This](https://paste.ubuntu.com/26535239/) is what I came up with, so far, but it looks very dirty and memory consuming, idk – Emil Terman Feb 07 '18 at 11:38
  • @EmilTerman Do it in less iterations. Draw N points on the axis, then move along it and get all windows that intersect. If multiple intersections happen (likely), run again with less remaining selections to be made.. – Reut Sharabani Feb 07 '18 at 11:43
0

So what shall those numbers represent?

Does 2.5 mean, that the probability to be chosen is twice as high than 1.25? Well - the negative values don't fit into that scheme.

I guess fitness means something like -5: very ill, 2.5: very fit. We have a range of 7.5 and could randomly pick an element, if we know how many candidates there are and if we have access by index.

Then, take a random number between -5 and 2.5 and see, if our number is lower than or equal to the candidates fitness. If so, the candidate is picked, else we repeat with step 1. I would say, that we then generate a new threshold to survive, because if we got an 2.5, but no candidate with that fitness remains, we would search infinitely.

The range of fitnesses has to be known for this, too.

fitnesses [-5, -3,  0,  1, 2.5]
rand -5     x   x   x   x   x 
     -2.5   -   -   x   x   x  
      0     -   -   x   x   x  
      2.5   -   -   -   -   x 

If every candidate shall be testet every round, and the -5 guy shall have a chance to survive, you have to stretch the interval of random numbers a bit, to give him a chance, for instance, from -6 to 3.

user unknown
  • 35,537
  • 11
  • 75
  • 121