17

Collections.shuffle() goes through each index of a Collection backwards and then swaps it with a random index including or before it. I was wondering why, so I tried doing the same thing but swapping with any random index in the Collection.

Here is the shuffling part of the Collections.shuffle() code:

for (int i=size; i>1; i--)
    swap(arr, i-1, rnd.nextInt(i));

And here is my algorithm:

Random r = new Random();
for (int i = 0; i < a.size(); i++) {
    int index = r.nextInt(a.size());
    int temp = a.get(i);
    a.set(i, a.get(index));
    a.set(index, temp);
}

I found that Collections.shuffle() was much more evenly distributed than my code when I ran both on the same ArrayList one million times. Also, when running my code on:

[0, 1, 2, 3, 4]

it seems that the following permutations consistently occur most frequently:

[1, 0, 3, 4, 2]
[1, 2, 3, 4, 0]
[1, 2, 0, 4, 3]
[0, 2, 3, 4, 1]
[1, 2, 3, 0, 4]

Could someone please explain why?

Raedwald
  • 46,613
  • 43
  • 151
  • 237
czhao
  • 305
  • 2
  • 7
  • 1
    by the way your post was edited on purpose to remove salutations like 'Thanks so much!' to fit the Question and Answer format, so it be good to remove it. Reason being it clutters the question up. See this: http://meta.stackexchange.com/questions/2950/should-hi-thanks-taglines-and-salutations-be-removed-from-posts – matrixanomaly Apr 06 '15 at 14:42
  • This has been asked dozens of times before, even the 'related' sidebar shows several duplicates. See for example http://stackoverflow.com/questions/6890356 http://stackoverflow.com/questions/29173181 http://stackoverflow.com/questions/6637472 http://stackoverflow.com/questions/12102877 – BlueRaja - Danny Pflughoeft Apr 06 '15 at 18:29

2 Answers2

38

Collections.Shuffle() does a Fisher-Yates shuffle. It's a more evenly distributed form of shuffling, and does not reshuffle what might have previously been shuffled already, as opposed to your algorithm.

What your algorithm does (also the known as the naive implementation) is that it will randomly pick any array index and shuffle it over, meaning there's a high chance of it picking the same index that has previously been shuffled already.

The Fisher-Yates shuffle (also known as the Donald Knuth Shuffle) is an unbiased algorithm that shuffles items in the array in an equally likely probability. It avoids the chance if it 'moving' the same objects twice.

Here's a good explanation of the Fisher Yates Shuffle by our own Jeff Atwood at Coding Horror, he discusses the naive implementation of the random shuffle versus the Fisher Yates shuffle.

See also this SO question about Java's implementation. It mentions what you asked. You can also look at the source code if you'd like, as mentioned there. I found it by Googling Collections.shuffle() in the top 5 links.

To further discuss this, it's always a good idea to use the Fisher-Yates shuffle compared to the naive implementations, especially in implementations that require a higher level of randomness (such as shuffle poker cards) to avoid introducing odds and unfair play. It wouldn't be a good thing if games of chance where implemented based on our naive implementation, as the bias leads to what you've observed, where the same permutation seems to show up more often than the others.

Lastly, as user @jmruc mentioned, here's a very nice tutorial on visualizing algorithms, it contains the Fisher-Yates shuffle, as well as other algorithms, all beautifully presented. Might help you wrap your head around the concepts if you're more of a visualizer: Visualizing Algorithms by Mike Bostock

matrixanomaly
  • 6,627
  • 2
  • 35
  • 58
  • 10
    Here is a [visualization](http://bost.ocks.org/mike/algorithms/#shuffling) of the algorithm, together with explanation of why it's better than "random" shuffle. – jmruc Apr 06 '15 at 14:45
  • @jmruc was just going to look for that site I stumbled upon a long time ago. Can I add it to my answer, or should I leave it as a comment? – matrixanomaly Apr 06 '15 at 14:46
3

This is another explanation of Fisher-Yates.

Consider the following method:

  1. There are two lists, A and B. Initially, all the elements are on list A so list B is empty.
  2. At each step:

    Pick with uniform probability from the elements currently on list A.

    Permute list A to make the chosen element the last element.

    Remove the last element from list A and append it to list B.

  3. When list A is empty, return list B.

I find this algorithm easy to understand and visualize.

The probability of a given item being chosen on the first step is 1/n. The probability of a given item being chosen on the second step is its probability of not being chosen on the first step, (n-1)/n, times its probability of being chosen on the second step given it is still on list A, 1/(n-1). That product is 1/n.

Similarly, it has probability ((n-1)/n)*((n-2)/(n-1)) = (n-2)/n of still being on list A after two items have been moved, and therefore a 1/n probability of being the third item chosen.

In general, the probability of still being on list A after k items have been chosen is ((n-1)/n)*((n-2)/(n-1))*...*((n-k)/(n-k+1)) = (n-k)/n. The probability of being chosen on the next step, given the item is still on list A, is 1/(n-k), so the unconditional probability being chosen on the step when list A has (n-k) items is ((n-k)/n)*(1/(n-k)) = 1/n.

Fisher-Yates is just this algorithm with the two lists, whose total length is always n, concatenated in a single array. At each step, it selects an element from list A with uniform probability, permutes list A to put that element adjacent to list B, and then moves the boundary so that it changes from being a list A element to being the most recently added element of list B.

Patricia Shanahan
  • 25,849
  • 4
  • 38
  • 75