Is there any difference between the following algorithms for shuffling arrays?

Question

My Java textbook says that you can use the following code to randomly shuffle any given array:

for(int i = myList.length-1; i >=0; i--)
{

    int j = (int)( Math.random() * (i+1) );
    double temp = myList[i];
    myList[i] = myList[j];
    myList[j] = temp;

}

Would the following code that I wrote would be equally efficient or valid?

for(int i = 0; i < myList.length; i++)
{

    int j = (int)( Math.random() * (myList.length) );
    double temp = myList[i];
    myList[i] = myList[j];
    myList[j] = temp;

}

I tested my code and it does shuffle the elements properly. Is there any reason to use the textbook's algorithm over this one?

And I'm not sure why people are downvoting the answers. Sorry to those who have posted. Is there any reason why they were downvoted? — Aleksandr Hovhannisyan, Sep 30 '16 at 19:39
Your code is wrong, even though it will appear correct (it does shuffle, just not uniformly), it suffers from the classic [bad-shuffle bias](http://stackoverflow.com/q/859253/555045) — harold, Sep 30 '16 at 19:59
@AleksandrH The one with the most votes is incorrect. (sigh) The most correct is harold's comment. — Peter Lawrey, Sep 30 '16 at 20:10
So if I understand correctly, because of the "unrandom" nature of pseudorandom number generators, there's a bias towards certain indices being picked? Is the book's algorithm preferred because it reduces the chances of this bias occurring by limiting the range of indices between 0 and i and then decrementing i? — Aleksandr Hovhannisyan, Sep 30 '16 at 20:13
@harold Thanks for the link, looks like a duplicate candidate with it. — Gassa, Sep 30 '16 at 20:34

score 4 · Accepted Answer · answered Sep 30 '16 at 20:22

Yes, they are in fact different.

The first algorithm is a variant of the classic Knuth Shuffle. For this algorithm, we can prove (e.g., by induction) that, if our random number generator (Math.random()) was an ideal one, it would generate every one of the n! (n factorial) possible permutations with equal probability.

The second algorithm does not have this property. For example, when n = 3, there are 3³ = 27 possible outcomes, and that does not divide evenly by 3! = 6, the number of possible permutations. Indeed, here are the probabilities of outcomes (programs generating statistics: 1 2):

[0, 1, 2] 4/27
[0, 2, 1] 5/27
[1, 0, 2] 5/27
[1, 2, 0] 5/27
[2, 0, 1] 4/27
[2, 1, 0] 4/27

For n = 4, the results are even more uneven, for example (programs generating statistics: 3 4):

[1, 0, 3, 2] has probability 15/256
[3, 0, 1, 2] has probability  8/256

As you can imagine, this is an undesired property if your permutation is supposed to be uniformly random.

Lastly, the fact that we usually use a pseudorandom number generator instead of a true random source does not invalidate any of the above. The defects of our random number generator, if any, are obviously unable to repair the damage at the later step - if we choose a non-uniform algorithm, that is.

For a more detailed explanation http://stackoverflow.com/questions/859253/why-does-this-simple-shuffle-algorithm-produce-biased-results-what-is-a-simple +1 all the same. — Peter Lawrey, Sep 30 '16 at 20:40

Peter Lawrey · Answer 2 · 2016-09-30T20:35:31.907

3

The first example is preferred as it ensures fairness in how the elements are randomly arranged. In the second example, the elements are random but not equally random.

The first example is based on an optimised version which doesn't use Math.random.

Random rand = ...
for(int i = myList.length-1; i > 0; i--) {
    int j = rand.nextInt(i+1);
    double temp = myList[i];
    myList[i] = myList[j];
    myList[j] = temp;
}

From Collections.shuffle

        for (int i=size; i>1; i--)
            swap(list, i-1, rnd.nextInt(i));

which is the same thing.

This can be quite a bit faster as it doesn't have to produce as much "randomness" which means less calculations. Generating a double is far more expensive than a small number say between 0 and 9.

However, the first example doesn't take advantage of this and calls Math.random() any way. It only matters if you use nextInt(n)

edited Sep 30 '16 at 20:35

answered Sep 30 '16 at 19:32

Peter Lawrey

525,659
79
751
1,130

So if I'm reading this correctly, in the first example, it's impossible for an element to be "shuffled" to the same index in the array. In the second example, it is possible for the element to be in it's original index. Doesn't this change the results drastically? – Preston Martin Sep 30 '16 at 19:43
1

@PrestonM `j < i + 1` so `j` can be `i` and no swap occurs. – Peter Lawrey Sep 30 '16 at 19:54
No idea why someone was going on a -1 spree before. Guess that's just SO being SO. – Aleksandr Hovhannisyan Sep 30 '16 at 20:41
2

@Gassa: As an aside, unless the answer is actively unhelpful consider **not** voting down. [Down-votes are not a substitute for communication and editing.](http://stackoverflow.com/help/privileges/vote-down). Your vote is your vote anyways. – displayName Sep 30 '16 at 20:53
@displayName I don't mind downvotes, but without a comment it's like saying I could help you improve your answer but I can't be bothered. +1 – Peter Lawrey Sep 30 '16 at 20:55
@displayName Thanks for the guidelines link. I downvoted exactly the answers which (wrongly IMO) stated "no, there is no difference", or did not, in my perception, answer the yes/no question in any way. By the guidelines, they are what I percieved as "an answer that is clearly and perhaps dangerously incorrect", so I still consider that the right way to go. In this particular case, I retracted the downvote after the edit (see the current first paragraph). – Gassa Sep 30 '16 at 20:59
@Gassa I appreciated your comment. – Peter Lawrey Sep 30 '16 at 21:03
@PeterLawrey Honestly, I opened the question, and there were like 3 of 4 "no, everything is fine" answers. And your answer which didn't address the problem as I saw it. So I panicked a bit. Perhaps a better action for a clear mind would be to find a duplicate as @ harold did, then comment on every answer with it. – Gassa Sep 30 '16 at 21:10
Hope y'all don't mind, but I'm not going to select an answer. I feel like all of you contributed a lot here, so it would be unfair to award points to one person over another. Thanks for clarifying all this, though! Who would've thought such a small block of code could have so many implications? :) – Aleksandr Hovhannisyan Sep 30 '16 at 21:46
@AleksandrH I suggest commenting/upvoting all the helpful answers – Peter Lawrey Oct 01 '16 at 08:55
One more question: if I wanted to shuffle a deck of cards, for example, then my algorithm would be acceptable because its okay for some cards to be shuffled more than others in real life, right? Our textbook gives an example algorithm for a deck of cards class and it uses my algorithm to shuffle that deck. – Aleksandr Hovhannisyan Oct 01 '16 at 15:21
@AleksandrH card shuffling isn't perfect or entirely random in real life. Your solution could be good enough but why not use the fair solution which can also be faster. – Peter Lawrey Oct 01 '16 at 15:23
@PeterLawrey Yep, makes sense. Thanks again – Aleksandr Hovhannisyan Oct 01 '16 at 18:16
1

@AleksandrH The duplicate link's answers cite Jeff Atwood's [The danger of Naivete](https://blog.codinghorror.com/the-danger-of-naivete/), and shuffling cards is the example there, too. As stated there, _if you shipped a real card game with a naïve shuffle, you'd have some serious exploits to deal with._ For example, if one implements naïve shuffle for an online casino, they should be prepared to lose money with it, not win, once the most cunning players discover the unevenness. – Gassa Oct 02 '16 at 16:37

displayName · Answer 3 · 2016-09-30T20:43:48.257

1

No, you cannot use your algorithm in place of the book's algorithm.

Explanation: Your book's algorithm starts at a current element, beginning from the last one, and then picks another element from all elements within [0, current] and then swaps them. This way a higher index element can never be touched again, but it may still end up getting swapped with itself (which is normal).

However, in your algorithm you are generating the random index to swap with from all possible indices between 0 and i - 1. Thus, a higher index element can be swapped back to its original location during the shuffle.

The following code is not the equivalent of your book's algorithm. It will not leave any element in place, which is possible in your book's algorithm's case:

for (int i = myList.length - 1; i > 0; i--) {
    int j = (int)(Math.random() * i);
    swap(myList, i, j);
}

private void swap(double[] myList, int i, int j) {
    double temp = myList[i];
    myList[i] = myList[j];
    myList[j] = temp;
}

edited Sep 30 '16 at 20:43

answered Sep 30 '16 at 20:00

displayName

13,888
8
60
75

3

This is not correct, the first algo can leave the element in the same place as does Collections.shuffle() which does the same thing. – Peter Lawrey Sep 30 '16 at 20:02
2

@PeterLawrey: You are right - an element can be left in place. But the first algorithm will not touch the shuffled element again, unlike the OP's algorithm. – displayName Sep 30 '16 at 20:05
3

Ah, I see what you're saying. So the important thing is not whether an element gets shuffled with itself, but rather that my algorithm may be unfair or skewed because some elements *get shuffled more than others* instead of there being a *uniform* shuffle? – Aleksandr Hovhannisyan Sep 30 '16 at 20:06
1

@AleksandrH yes, you want the shuffle to be fair ideally rather than just kinda random. – Peter Lawrey Sep 30 '16 at 20:09
@AleksandrH: I have updated the answer and added the code that I think should be correct. You understood rightly. You can replace book's algorithm with yours if they both behave the same. But as you see, they don't. And as you see, your algorithm is unfair. – displayName Sep 30 '16 at 20:13
This has the problem you were referring to `(int)(Math.random() * i);` will not produce `i` so elements will not be left in place. It's not as random as the code in the OP's question. In short, it is less correct. – Peter Lawrey Sep 30 '16 at 20:37
@PeterLawrey: Not having any element left in place is exactly what I'm attempting and that doesn't make the answer less correct. I'm not trying to give the algorithm that's same as book, but an algorithm that will truly shuffle the array. – displayName Sep 30 '16 at 20:41
@displayName it depends on whether you want to arrange them randomly and fairly or you want to bias the result to avoid not moving some elements. Note: you have a chance you won't move element 0. Collections.shuffle can leave elements in place. – Peter Lawrey Sep 30 '16 at 20:44
@PeterLawrey: Element 0 will definitely be moved and as I said, I'm not imitating any algorithm in my suggestion. – displayName Sep 30 '16 at 20:44

score 0 · Answer 4 · answered Sep 30 '16 at 19:32

0

Is there any reason to use the textbook's algorithm over this one?

No. Your code and the text book's both have no rocket science based on the value of the multiplier. The core part is the Math.random() function. The multiplier in your book is i+1 that has mathematically lower probability in getting a repeated j value whereas your code just has a bit higher probability of getting a repeated j value, but frankly speaking, it doesn't matter.

Although, your code would be slightly, really slightly, in fact negligibly faster as each time an addition operation is not performed.

answered Sep 30 '16 at 19:32

Eric B.

4,622
2
18
33

1

So the only difference is that the book's version limits the index of j to somewhere between 0 and the current i (what I'm assuming you meant by lower probability of getting repeated indices), whereas mine confines it to the entire array from 0 to length-1 (and hence may get repeated values)? Shouldn't a truly random shuffle use "my" algorithm, since each index should (in an ideally "random" shuffle) have an equal probability of getting picked? Edit: hmm, maybe not. Thoughts are appreciated. – Aleksandr Hovhannisyan Sep 30 '16 at 19:42
Yes, you are correct about the understanding of my answer. About a truly "random" shuffle, well a random number generating function is not random at all. Interesting read [here](http://programmers.stackexchange.com/questions/124233/why-is-it-impossible-to-produce-truly-random-numbers). – Eric B. Sep 30 '16 at 19:46

Is there any difference between the following algorithms for shuffling arrays?

4 Answers4