0


I am learning Knuth shuffle and it says that the random number should be between 0 and i, not N-1 and I can not understand why.
I saw similar questions asked before but cannot find a clear explanation.
thanks for your attention.

MrSmith42
  • 9,961
  • 6
  • 38
  • 49
Ali Abbasifard
  • 398
  • 4
  • 22
  • 1
    It's originally known as [the Fisher-Yates shuffle](https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle). Does reading more about it (and searching for "Fisher-Yates shuffle") give you more information? – Some programmer dude Aug 04 '22 at 08:36
  • 1
    And what resource are you currently reading or using? What does it say about `i` and `N`? What is `i` supposed to be? What is `N` supposed to be? Different texts and implementations can have different names, so knowing exactly what you're reading might help us to understand your problem better. – Some programmer dude Aug 04 '22 at 08:40
  • Yes, I read about it and can't find any explanation for my question. Resources only show how it works and tell nothing about the random number generated for swap. The resource which I am using is an algorithm course and online searches. In all of them, ```i``` refer to the index of the current item in the array and ```N``` is the length of the array. what I want to know is, why ```r``` should be a random number between 0 and ```i``` (index) and not ```N-1```(last index of the array)? @some-programmer-dude – Ali Abbasifard Aug 04 '22 at 09:19
  • 1
    Good question, although it's off topic here, better suited to cs.stackexchange.com. That said, I think the effect of choosing from 0 to N - 1 is that swaps can be undone from one step to the next, so the effect is to leave the original sequence unchanged. You could try it with a small number of items and generate many shuffled sequences. A shuffle algorithm should have all sequences having equal probability. If you try a lot of random swaps by the 0 to N - 1 method, do you find all sequences are equally probable? Are some more probable than others? These are questions for you to consider. – Robert Dodier Aug 04 '22 at 19:31

1 Answers1

2

The "true" Fisher-Yates shuffle looks like this:

for i from 0 to n - 1:
    pick index j uniformly from i to n - 1  # [i, n-1]
    swap array[i] and array[j]

You're proposing this variation:

for i from 0 to n - 1:
    pick index j uniformly from 0 to n - 1  # [0, n-1]
    swap array[i] and array[j]

There are a couple of ways to see why this modification will not produce a uniformly random permutation of the elements.

The first is a clever mathematical argument. The loop in the incorrect version of Fisher-Yates has n iterations. On each iteration, we pick one of n indices to swap with. This means that there are n choices for the index on the first iteration, n choices for the index on the second, ..., and n choices for the index on the last. Overall, that means that the number of different ways the algorithm can execute is n × n × ... × n = nn.

However, the number of different permutations of n elements is n!. In order for it to be possible for the broken shuffle to produce each of the n! permutations with equal probability, each permutation must be reachable by the same number of paths through the broken implementation (say, k ways). That would mean that k · n! = nn, meaning that nn must be a multiple of n!. However, for any n > 2, this isn't the case, and so the distribution cannot be uniform.

You can see this empirically. Here's some Python code that uses the broken shuffle 1,000,000 times on a three-element array and looks at the resulting frequencies of the different permutations:

from random import randint
from collections import defaultdict

def incorrect_shuffle(arr):
    for i in range(len(arr)):
        j = randint(0, len(arr) - 1)
        arr[i], arr[j] = arr[j], arr[i]

freq = defaultdict(int)
for i in range(1000000):
    sequence = ['a', 'b', 'c']
    incorrect_shuffle(sequence)
    freq["".join(sequence)] += 1

print(freq)

When I ran this, I got the following frequencies:

  • abc: 148455
  • acb: 185328
  • bac: 184830
  • bca: 185235
  • cab: 148017
  • cba: 148135

Notice that the permutations acb, bac, and bca are all much more likely than the other three.

For more about the sorts of distributions that arise from the broken shuffle, check out this earlier Stack Overflow question.

chepner
  • 497,756
  • 71
  • 530
  • 681
templatetypedef
  • 362,284
  • 104
  • 897
  • 1,065