Exhaust list of elements randomly without sorting them randomly first

Question

If I have a list of 10K elements, and I want to randomly iterate through all of them, is there an algorithm that lets me access each element randomly, without just sorting them randomly first?

In other words, this would not be ideal:

const sorted = list
              .map(v => [math.random(), v])
              .sort((a,b) => a[0]- b[0]);

It would be nice to avoid the sort call and the mapping call. My only idea would be to store everything in a hashmap and access the hash keys randomly somehow? Although that's just coming back to the same problem, afaict.

Sorting algorithms worst case is O(n^2), trying to avoid that if list is long, that's all — Alexander Mills, Nov 08 '22 at 01:52
I think the term you're after is the "Fisher-Yates shuffle", e.g. https://stackoverflow.com/a/2450976/1358308. this is only O(n) so should be faster — Sam Mason, Nov 08 '22 at 11:55

Sam Mason · Answer 1 · 2022-11-09T11:52:36.687

1

Just been having a play with this and realised that the Fisher-Yates shuffle works well "on-line". For example, if you've got a large list you don't need to spend the time to shuffle the whole thing before you start iterating over items, or, equivalently, you might only need a few items out of a large list.

I didn't see a language tag in the question, so I'll pick Python.

from random import randint

def iterrand(a):
    """Iterate over items of a list in a random order.
    Additional items can be .append()ed arbitrarily at runtime."""
    for i, ai in enumerate(a):
        j = randint(i, len(a)-1)
        a[i], a[j] = a[j], ai
        yield a[i]

This is O(n) in the length of the list and by allowing .append()s (O(1) in Python) the list can be built in the background.

An example use would be:

l = [0, 1, 2]

for i, v in enumerate(iterrand(l)):
    print(f"{i:3}: {v:<5} {l}")
    if v < 4:
        l.append(randint(1, 9))

which might produce output like:

  0: 2     [2, 1, 0]
  1: 3     [2, 3, 0, 1]
  2: 1     [2, 3, 1, 1, 0]
  3: 0     [2, 3, 1, 0, 1, 3]
  4: 1     [2, 3, 1, 0, 1, 3, 7]
  5: 7     [2, 3, 1, 0, 1, 7, 7, 3]
  6: 7     [2, 3, 1, 0, 1, 7, 7, 3]
  7: 3     [2, 3, 1, 0, 1, 7, 7, 3]
  8: 2     [2, 3, 1, 0, 1, 7, 7, 3, 2]
  9: 3     [2, 3, 1, 0, 1, 7, 7, 3, 2, 3]
 10: 2     [2, 3, 1, 0, 1, 7, 7, 3, 2, 3, 2]
 11: 7     [2, 3, 1, 0, 1, 7, 7, 3, 2, 3, 2, 7]

Update: To test correctness, I'd do something like:

# trivial tests
assert list(iterrand([])) == []
assert list(iterrand([1])) == [1]

# bigger uniformity test

from collections import Counter

# tally 1M draws
c = Counter()
for _ in range(10**6):
    c[tuple(iterrand([1, 2, 3, 4, 5]))] += 1

# ensure it's uniform
assert all(7945 < v < 8728 for v in c.values())
# above constants calculated in R via:
#   k<-120;p<-0.001/k;qbinom(c(p,1-p), 1e6, 1/k))

edited Nov 09 '22 at 11:52

answered Nov 08 '22 at 13:48

Sam Mason

15,216
1
41
60

fisher yates is o(n) hmmm link? – Alexander Mills Nov 08 '22 at 18:17
2

@AlexanderMills see the wikipedia link, i.e. the first one. it's also kind of obvious from the code no? single loop doing constant work. or am I missing something obvious? – Sam Mason Nov 08 '22 at 18:20
1

@SamMason Why not just use python’s own `shuffle()` method, which should be faster than a native python implementation of FY? (Since native functions are implemented in C). – pjs Nov 08 '22 at 20:11
I wonder how they can prove that Fisher-Yates is non-biased or what not? curious about that – Alexander Mills Nov 09 '22 at 05:22
@pjs [CPython's 3.11 implementation](https://github.com/python/cpython/blob/cf5dbb47a2aab02f65c5b25ee1b4886e1b3dfd16/Lib/random.py#L373) is very similar and also just in Python. OP didn't specify a language, so I just picked Python due to my familiarity. An on-line version doesn't seem that relevant for only 10k items, but played for a bit and thought it was neat. – Sam Mason Nov 09 '22 at 11:22
1

@AlexanderMills searching for "Yates-Fisher proof" returns lots of relevant things on the correctness of the algorithm, e.g. [here](https://cs.stackexchange.com/q/2152/113260). when testing the implementation, I'd do some trivial unit tests and a simple uniformity test – Sam Mason Nov 09 '22 at 11:45
1

@AlexanderMills Unbiasedness of Fisher-Yates is a straightforward application of conditional probability. It’s the same argument as to why [drawing straws is fair](https://math.stackexchange.com/q/1475654/143176). – pjs Nov 09 '22 at 16:37
well it doesn't let you draw the same straw/card twice, isn't that so? maybe that's definitional..however that seems biased. but our goal here is not to draw the same item again, so we def meet our goal. it's biased against previously drawn cards, but that's what sampling without replacement is. so we are good. – Alexander Mills Nov 09 '22 at 16:43
@SamMason Thanks for the link to CPython’s `shuffle`, I learned something. – pjs Nov 09 '22 at 16:43
wait are you sure this is unbiased? one reason it wouldn't be: would the *first* element in in the unshuffled list ever appear *first* in the shuffled list? I don't see why it would. Or perhaps it could be swapped with *itself* on the very first iteration, so nvm :) – Alexander Mills Nov 12 '22 at 01:07

Alexander Mills · Accepted Answer · 2022-11-09T18:08:15.917

Fisher-Yates should do the trick as good as any, this article is really good: https://medium.com/@oldwestaction/randomness-is-hard-e085decbcbb2

the relevant JS code is very short and sweet:

const fisherYatesShuffle = (deck) => {
  for (let i = deck.length - 1; i >= 0; i--) {
    const swapIndex = Math.floor(Math.random() * (i + 1));
    [deck[i], deck[swapIndex]] = [deck[swapIndex], deck[i]];
  }
  return deck
}

to yield results as you go, so you don't have to iterate through the list twice, use generator function like so:

const fisherYatesShuffle = function* (deck) {
    for (let i = deck.length - 1; i >= 0; i--) {
        const swapIndex = Math.floor(Math.random() * (i + 1)); // * use ;
        [deck[i], deck[swapIndex]] = [deck[swapIndex], deck[i]];
        yield deck[i];
    }
};

(note don't forget some of those semi-colons, when the next line is bracket notation).

I found [Mike Bostock's animation](https://bost.ocks.org/mike/shuffle/) to be a nice visual explanation, via my originally linked answer — Sam Mason, Nov 09 '22 at 11:55

Exhaust list of elements randomly without sorting them randomly first

2 Answers2