
I am implementing a selection algorithm that picks an object with probability proportional to its score, so higher-scoring objects are more likely to be chosen.

My implementation is as follows:

// Build a pool of 100 objects with random integer scores in [0, 99]
// (`<< 0` truncates the float to an integer)
var pool = [];
for (var i = 0; i < 100; i++)
    pool.push({ Id: i, Score: Math.random() * 100 << 0 });

// Benchmark: run the selection NUM_RUNS times and report the average cost per call
const NUM_RUNS = 100000;
var start = new Date();
for( var i = 0; i < NUM_RUNS; i++ )
  rouletteSelection(pool);
var end = new Date();

var runningTime = (end.getTime() - start.getTime()) / 1000;
var avgExecutionTime = ( runningTime / NUM_RUNS ) * Math.pow(10,9);
console.log('Running Time: ' + runningTime + ' seconds');
console.log('Avg. Execution Time: ' + avgExecutionTime + ' nanoseconds');


function rouletteSelection(pool) {
    // Sum all scores and find the lowest score, so negative scores
    // can be shifted to a minimum of 0
    var sumScore = 0, lowestScore = 0;
    pool.forEach((obj) => {
        sumScore += obj.Score;
        if (obj.Score < lowestScore)
            lowestScore = obj.Score;
    });
    sumScore += Math.abs(lowestScore * pool.length);

    var rouletteSum = 0;
    var random = Math.random() * sumScore << 0;
    for (var i in pool) {
        // shift each score up by |lowestScore| (lowestScore is always <= 0)
        rouletteSum += pool[i].Score - lowestScore;
        if (random < rouletteSum)
            return pool[i];
    }

    // Failsafe
    console.warn('Could not choose roulette winner, selecting random');
    return pool[Math.random() * pool.length];
}

When run, the performance isn't bad: on my machine, each call to rouletteSelection() takes about 2500-3200 nanoseconds. However, before using this in production, I want to shave off as much overhead as I can, since this function could be called tens of millions of times in extreme cases.

An obvious optimization would be to somehow merge everything into a single loop so the object array is only iterated over once. The problem is, in order to normalize the scores (negative scores are shifted to 0), I need to know the lowest score to begin with.

Does anyone have any idea as to how to get around this?

w0f
  • What is `.Fitness`, did you mean `.Score`? – Bergi Aug 02 '18 at 16:39
  • Also, [don't use `for…in` enumerations on arrays](https://stackoverflow.com/q/500504/1048572)! – Bergi Aug 02 '18 at 16:41
  • "*normalize by shifting range to minimum of 0*" - honestly, I would just throw an exception on negative values. They don't make sense. And shifting all values a) makes a real hassle b) distorts the proportions of the other objects – Bergi Aug 02 '18 at 16:45
  • Side note: your failsafe doesn't work: you need to round/truncate the random index. – jmrk Aug 02 '18 at 20:47
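
To make the distortion point concrete (an illustrative example, not part of the original comments): scores {-1, 1, 2} get shifted to {0, 2, 3}, so the two positive-scoring objects go from a 1:2 ratio of selection likelihood to 2:3 — shifting by a constant changes the proportions between them.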

1 Answer


At the risk of stating the obvious: just don't do the normalisation in every call to `rouletteSelection`. Do it once, after you've constructed the pool.
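
A minimal sketch of that idea, assuming the pool is fixed after startup (the `NormScore` property and the `precomputeNormalization` helper are illustrative names, not from the original post):

// Shift and sum the scores once, after the pool is built
function precomputeNormalization(pool) {
    var lowestScore = 0;
    pool.forEach((obj) => {
        if (obj.Score < lowestScore)
            lowestScore = obj.Score;
    });
    var sumScore = 0;
    pool.forEach((obj) => {
        obj.NormScore = obj.Score - lowestScore; // shifted so the minimum is 0
        sumScore += obj.NormScore;
    });
    return sumScore; // cache this alongside the pool
}

// Each selection is now a single pass with no per-call normalization
function rouletteSelection(pool, sumScore) {
    var random = Math.random() * sumScore;
    var rouletteSum = 0;
    for (var i = 0; i < pool.length; i++) {
        rouletteSum += pool[i].NormScore;
        if (random < rouletteSum)
            return pool[i];
    }
    return pool[pool.length - 1]; // guards against floating-point edge cases
}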

Bergi
  • For my application, negative score values are necessary. I was, however, surprised that using range-based for loops cut execution time down by a factor of 5. On top of that, I shaved a few additional nanoseconds off with some clever use of averaging and estimating – w0f Aug 02 '18 at 17:46
  • Are your scores integers only? If so, you could properly optimise to a constant-time algorithm. And even when not, if you have a reasonable pool size, it might make sense to accumulate over the pool and then do a binary search for the interval that contains `random`. – Bergi Aug 02 '18 at 17:53
  • I'm testing on integers, but in production it will need to be able to handle floating point numbers too. The pool size will be constant after being initialized at startup. Not sure how to go about a binary search though. The roulette selection gives each object in the pool a probability of `obj.Score/sum_of_scores`. Do you mean creating a BST with the running sum as the key? – w0f Aug 02 '18 at 18:08
  • @w0f Yes, exactly that - accumulate the scores as the search key. With an [interpolation search](https://en.wikipedia.org/wiki/Interpolation_search) this should be quite fast - whether it's faster than the linear but simple approach depends on the pool size. – Bergi Aug 02 '18 at 18:19
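
A rough sketch of the binary-search idea discussed in this thread, again assuming the pool is built once at startup (`buildCumulative` and `rouletteSelectBinary` are illustrative names):

// Accumulate the shifted scores once at startup
function buildCumulative(pool) {
    var lowestScore = 0;
    for (var i = 0; i < pool.length; i++)
        if (pool[i].Score < lowestScore)
            lowestScore = pool[i].Score;
    var sums = new Float64Array(pool.length);
    var running = 0;
    for (var j = 0; j < pool.length; j++) {
        running += pool[j].Score - lowestScore; // shift so the minimum is 0
        sums[j] = running;
    }
    return sums;
}

// O(log n) per call: binary-search for the first cumulative sum
// that exceeds the random target
function rouletteSelectBinary(pool, sums) {
    var random = Math.random() * sums[sums.length - 1];
    var lo = 0, hi = sums.length - 1;
    while (lo < hi) {
        var mid = (lo + hi) >>> 1;
        if (sums[mid] <= random)
            lo = mid + 1; // winner lies above mid
        else
            hi = mid;
    }
    return pool[lo];
}

Whether this beats the simple linear scan depends on the pool size, as noted above; for a small pool the linear pass may well win on cache behaviour alone.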