
My coworker and I are arguing about why the shuffle algorithm in this list of JS tips & tricks doesn't produce biased results the way the naive shuffle Jeff Atwood describes does.

The array shuffle code in the tips is:

list.sort(function() { return Math.random() - 0.5; });

Jeff's naive shuffle code is:


for (int i = 0; i < cards.Length; i++)
{
  int n = rand.Next(cards.Length);
  Swap(ref cards[i], ref cards[n]);
}
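
For reference, Jeff's loop ports directly to JS (a sketch; the function name is mine):

function naiveShuffle(cards) {
    for (var i = 0; i < cards.length; i++) {
        // n can be ANY index, including positions that were already visited
        var n = Math.floor(Math.random() * cards.length);
        var tmp = cards[i];
        cards[i] = cards[n];
        cards[n] = tmp;
    }
    return cards;
}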

I wrote this JS to test the shuffle:


var list = [1,2,3];
var result = {123:0,132:0,321:0,213:0,231:0,312:0};
function shuffle() { return Math.random() - 0.5; }
for (var i=0; i<60000000; i++) {
    result[ list.sort(shuffle).join('') ]++;
}

For which I get results (from Firefox 5) like:

Order    Count       Diff from true avg
123       9997461    -0.0002539
132      10003451     0.0003451
213      10001507     0.0001507
231       9997563    -0.0002437
312       9995658    -0.0004342
321      10004360     0.0004360

Presumably Array.sort is walking the list array and performing swaps of (adjacent) elements similar to Jeff's example. So why don't the results look biased?

Rob
  • I'm guessing because there's a 1/3 chance that `n == i` in your first algorithm, whereas there is a 1/2^32 chance (I think? whatever the precision of a float is in JS) that there won't be a swap using `.sort()` –  Jul 07 '11 at 21:16
  • Specifically, my argument is based on the assumption that Array.sort() works by choosing two elements and then deciding whether to swap them or not. If that assumption is correct, then there are 2^n (where n is the number of comparisons made) possible outcomes of the sort operation, which cannot be mapped evenly to a 3-element array's 6 possible arrangements (or, indeed, any array with a number of arrangements that has a prime factor not equal to 2). There must always be at least two arrangements for which one is 1/2^n more likely than the other. – Asmor Jul 08 '11 at 15:16
  • See [Is it correct to use JavaScript Array `.sort()` method for shuffling?](https://stackoverflow.com/q/962802/1048572) – Bergi Jul 13 '22 at 22:53

3 Answers


I found the reason it appears unbiased.

Array.sort() not only returns the array, it sorts the array in place. That means each iteration of the test re-shuffles the output of the previous one instead of starting from [1,2,3], and averaging over all those different starting orders hides the bias. If we re-initialize the array for each loop, we get results like:

123    14941
132     7530
321     7377
213    15189
231     7455
312     7508

Which shows a very significant bias.

For those interested, here's the modified code:

var result = {123:0,132:0,321:0,213:0,231:0,312:0};
var iterations = 60000;
var comparisons = 0; // was undeclared, which throws a ReferenceError
function shuffle() { 
    comparisons++;
    return Math.random() - 0.5;
}
for (var i=0; i<iterations; i++) {
    var list = [1,2,3]; // re-initialize, so every sort starts from the same order
    result[ list.sort(shuffle).join('') ]++;
}
console.log(result);
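
The shape of that bias matches a simple model, too. If we assume the engine uses a plain insertion sort for a 3-element array (an assumption; this is engine-dependent), we can exhaustively explore both outcomes of every random comparison. Every branch probability is a power of 1/2, as argued in the comments above, and the totals come out to exactly 1/4 for 123 and 213 and 1/8 for the other four orders, which is what the counts above show (about 15000 vs. about 7500 out of 60000):

// Explore both outcomes of every random comparison in a plain insertion
// sort of [1,2,3], carrying each branch's probability. Pass j is busy
// sifting the element currently at position k toward the front.
function explore(a, j, k, prob, tally) {
    if (j >= a.length) {                     // this branch finished sorting
        var key = a.join('');
        tally[key] = (tally[key] || 0) + prob;
        return;
    }
    if (k === 0) {                           // sifted to the front; next pass
        explore(a, j + 1, j + 1, prob, tally);
        return;
    }
    // comparator says "in order" (probability 1/2): stop, start next pass
    explore(a.slice(), j + 1, j + 1, prob / 2, tally);
    // comparator says "out of order" (probability 1/2): swap, keep sifting
    var b = a.slice();
    var tmp = b[k]; b[k] = b[k - 1]; b[k - 1] = tmp;
    explore(b, j, k - 1, prob / 2, tally);
}

var tally = {};
explore([1, 2, 3], 1, 1, 1, tally);
console.log(tally); // 123 and 213 get 0.25 each; the other four get 0.125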
Asmor

The problem with the naive shuffle is that a value might have already been swapped, and you might swap it again later. Say you have three cards and you pick one truly at random for the first position. If you can later swap that card with a later one, you are taking away from the randomness of that first selection.
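
To make that concrete: for three cards the naive loop draws one of 3^3 = 27 equally likely index sequences, and since 27 is not divisible by 6, the six orders cannot all come out equally often. A small sketch (hypothetical code, not from Jeff's post) that tallies all 27:

// Tally every one of the 3^3 = 27 equally likely index sequences the
// naive loop can draw for a 3-card array.
var counts = {};
for (var a = 0; a < 3; a++)
for (var b = 0; b < 3; b++)
for (var c = 0; c < 3; c++) {
    var cards = [1, 2, 3];
    var picks = [a, b, c];
    for (var i = 0; i < 3; i++) {
        var n = picks[i];
        var tmp = cards[i]; cards[i] = cards[n]; cards[n] = tmp;
    }
    var key = cards.join('');
    counts[key] = (counts[key] || 0) + 1;
}
console.log(counts); // 132, 213 and 231 appear 5 times each; 123, 312 and 321 only 4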

If the sort is quicksort, it continually splits the list into two groups; with a random comparator, each element lands on one side or the other at random. The next pass splits each of those groups into two groups randomly, and this keeps going until you are down to single cards, which you then combine back together. The difference is that you never take a card from the second randomly selected group and move it back to the first group.

The Knuth-Fisher-Yates shuffle is different from the naive shuffle because you only pick each card once. If you were picking random cards from a deck, would you put a card back and pick again? No, you take random cards one at a time. This is the first I've heard of it, but I did something similar back in high school, going from index 0 up. KFY is probably faster, because my version has an extra addition in the random statement.

for (int i = 0; i < cards.Length - 1; i++)
{
  int n = rand.Next(cards.Length - i) + i; // (i to cards.Length - 1)
  Swap(ref cards[i], ref cards[n]);
}

Don't think of it as swapping, think of it as selecting random cards from a deck. For each element in the array (except the last, because there is only one card left) you pick a random card out of all the remaining cards and lay it down, forming a new stack of cards that is randomly shuffled. It doesn't matter that your remaining cards are no longer in the original order if you've done any swapping already; you are still picking one random card from all the remaining cards.
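
In JavaScript, that forward selection looks like this (a sketch; the function name is mine):

// Forward Knuth-Fisher-Yates variant: position i gets one card chosen
// uniformly from the cards that haven't been placed yet (i..end).
function forwardShuffle(cards) {
    for (var i = 0; i < cards.length - 1; i++) {
        var n = i + Math.floor(Math.random() * (cards.length - i));
        var tmp = cards[i];
        cards[i] = cards[n];
        cards[n] = tmp;
    }
    return cards;
}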

The random quicksort is like taking a stack of cards and randomly dividing them into two groups, then taking each group and randomly dividing it into two smaller groups, and on and on until you have individual cards then putting them back together.
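
One way to see why that recursive splitting is unbiased: each card's run of coin flips forms the binary expansion of an independent uniform random key, so the whole process is equivalent to tagging every card with a random key and sorting by it. A sketch of that equivalent shuffle (the function name is mine; ties between keys are astronomically unlikely but would need a re-draw for perfect uniformity):

// Give every card an independent uniform key, then sort by the key.
// Each coin flip in the recursive splitting contributes one more bit.
function shuffleByRandomKeys(cards) {
    return cards
        .map(function (card) { return { card: card, key: Math.random() }; })
        .sort(function (x, y) { return x.key - y.key; })
        .map(function (entry) { return entry.card; });
}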

Jason Goemaat

Actually, that doesn't implement his naïve random sort. Jeff's algorithm transposes array elements manually, while sort actually sorts the list.

sort uses quicksort or insertion sort (thanks to cwolves for pointing that out -- see comments) to do this, though the exact algorithm varies by implementation:

  1. Is A bigger or smaller than B? Smaller? Decrement.
  2. Is A bigger or smaller than C? Smaller? Decrement.
  3. Is A bigger or smaller than D? Smaller? Insert A after D
  4. Is B bigger or smaller than C? Smaller? Decrement.
  5. Is B bigger or smaller than D? Smaller? Insert B after D and before A...

This means that the big O for the average case is O(n log n) and the big O for the worst case is O(n^2), for each loop iteration.

Meanwhile the Atwood naïve random sort is a simple:

  1. Start at A. Find random value. Swap.
  2. Go to B. Find random value. Swap.
  3. Go to C. Find random value. Swap.

(Knuth-Fisher-Yates is almost the same, only backwards)

So his has a big O for the worst case of O(n) and a big O for the average case of O(n).
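
If you want to see those costs directly, here is a rough sketch (array size and names are arbitrary) that counts comparator invocations for one sort-based "shuffle":

// Count how many times the random comparator runs for a single
// sort-based "shuffle" of a 1000-element array (engine-dependent).
var comparisons = 0;
var arr = [];
for (var i = 0; i < 1000; i++) arr.push(i);
arr.sort(function () { comparisons++; return Math.random() - 0.5; });
console.log(comparisons); // on the order of n log n calls, vs n - 1 swaps for KFY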

cwallenpoole