Best ways to get random items from an array in javascript

Question

Considering the performance, what's the best way to get random subset from an array?

Say we get an array with 90000 items, I wanna get 10000 random items from it.

One approach I'm thinking about is to get a random index from 0 to array.length and then remove the selected one from the original array by using Array.prototype.splice. Then get the next random item from the rest.

But the splice method will rearrange the index of all the items after the one we just selected and move them forward on step. Doesn't it affect the performance?

Items may duplicates, but what we select should not. Say we've selected index 0, then we should only look up the rest 1~89999.

No, I mean do you want to get a truly random selection from the items or a subset from a shuffled list. — Wolph, May 01 '14 at 10:06
Do you need this more than once? I.e. is the subset created in a loop, or for example in an eventhandler? — Teemu, May 01 '14 at 10:26
@Teemu make a helper function will be better, so we can use it in many scenarios not just in the callback of an event. — 牛さん, May 01 '14 at 10:30

score 3 · Answer 1 · answered May 01 '14 at 10:19

3

If you want a subset of the shuffled array, you do not need to shuffle the whole array. You can stop the classic fisher-yates shuffle when you have drawn your 10000 items, leaving the other 80000 indices untouched.

answered May 01 '14 at 10:19

Bergi

630,263
148
957
1,375

Nice! This sounds more reasonable. – 牛さん May 01 '14 at 10:21
(Of course, shuffling means mutating the array, and to prevent that the cloning would be `O(n)`) – Bergi May 01 '14 at 10:23

score 1 · Answer 2 · edited May 23 '17 at 11:53

1

I would first randomize the whole array then splice of a 10000 items.

How to randomize (shuffle) a JavaScript array? Explains a good way to randomize a array in javascript

edited May 23 '17 at 11:53

Community

1
1

answered May 01 '14 at 10:06

Marco de Jongh

5,270
3
17
29

1

This sounds good, but if the amount of the items are so large, shuffling the whole array also costs time. – 牛さん May 01 '14 at 10:12

Samuel Edwin Ward · Answer 3 · 2014-05-24T16:34:41.420

A reservoir sampling algorithm can do this.

Here's an attempt at implementing Knuth's "Algorithm S" from TAOCP Volume 2 Section 3.4.2:

function sample(source, size) {
    var chosen = 0,
        srcLen = source.length,
        result = new Array(size);

    for (var seen = 0; chosen < size; seen++) {
        var remainingInput = srcLen - seen,
            remainingOutput = size - chosen;

        if (remainingInput*Math.random() < remainingOutput) {
            result[chosen++] = source[seen];
        }
    }

    return result;
}

Basically it makes one pass over the input array, choosing or skipping items based on a function of a random number, the number of items remaining in the input, and the number of items remaining to be required in the output.

There are three potential problems with this code: 1. I may have mucked it up, 2. Knuth calls for a random number "between zero and one" and I'm not sure if this means the [0, 1) interval JavaScript provides or the fully closed or fully open interval, 3. it's vulnerable to PRNG bias.

The performance characteristics should be very good. It's O(srcLen). Most of the time we finish before going through the entire input. The input is accessed in order, which is a good thing if you are running your code on a computer that has a cache. We don't even waste any time reading or writing elements that don't ultimately end up in the output.

This version doesn't modify the input array. It is possible to write an in-place version, which might save some memory, but it probably wouldn't be much faster.

Best ways to get random items from an array in javascript

3 Answers3