73

The famous Fisher-Yates shuffle algorithm can be used to randomly permute an array A of length N:

For k = 1 to N
    Pick a random integer j from k to N
    Swap A[k] and A[j]

A common mistake that I've been told over and over again not to make is this:

For k = 1 to N
    Pick a random integer j from 1 to N
    Swap A[k] and A[j]

That is, instead of picking a random integer from k to N, you pick a random integer from 1 to N.
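For concreteness, here is a minimal sketch of both variants in Mathematica (the language used in the top answer below); the function names are mine, and indices are 1-based as in the pseudocode:

correctShuffle[list_] := Module[{a = list, n = Length[list], j},
  Do[
   j = RandomInteger[{k, n}]; (* pick j from k..n *)
   a[[{k, j}]] = a[[{j, k}]], (* swap A[k] and A[j] *)
   {k, n}];
  a]

brokenShuffle[list_] := Module[{a = list, n = Length[list], j},
  Do[
   j = RandomInteger[{1, n}]; (* the mistake: pick j from 1..n *)
   a[[{k, j}]] = a[[{j, k}]],
   {k, n}];
  a]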

What happens if you make this mistake? I know that the resulting permutation isn't uniformly distributed, but I don't know what guarantees there are on what the resulting distribution will be. In particular, does anyone have an expression for the probability distributions over the final positions of the elements?

GEOCHET
templatetypedef
  • Do you really want 1-based indices? – Svante Feb 27 '11 at 10:37
  • This sounds familiar. Was this discussed on SO within the past two months or was it on programmers.SE? – oosterwal Mar 15 '11 at 22:16
  • @oosterwal- I asked this question about three weeks ago and didn't get a good answer, so I put a Large Bounty on it to help spur some interest in it. Hopefully someone will be able to enlighten all of us! – templatetypedef Mar 15 '11 at 22:17
  • I don't have an answer (yet), but one thing I've noticed is that each card is most likely to be found in the position just behind where it started. Also, both the *first card* and the *last position* are evenly distributed - that is, the first card has equal probability to end up in any position, and every card has equal probability to end up in the last position. Any correct solution must have these characteristics. – BlueRaja - Danny Pflughoeft Mar 16 '11 at 22:38
  • @Svante: why not? Lots of languages, starting with Pascal (which was often used to describe algorithms) and including Lua, have indices starting at 1. IIRC, Pascal allows array indices to start at any number, but defaults to 1. – PhiLho Jul 21 '11 at 04:57

10 Answers

56

An Empirical Approach.

Let's implement the erroneous algorithm in Mathematica:

p = 10;  (* array length *)
s = {};  (* will hold one permuted array per iteration *)
For[l = 1, l <= 30000, l++, (* iterations *)
   a = Range[p];
   For[k = 1, k <= p, k++,
     i = RandomInteger[{1, p}]; (* the mistake: picks from 1..p instead of k..p *)
     temp = a[[k]];
     a[[k]] = a[[i]];
     a[[i]] = temp
   ];
   AppendTo[s, a];
]

Now get the number of times each integer is in each position:

r = SortBy[#, #[[1]] &] & /@ Tally /@ Transpose[s]  
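Equivalently, here is a short sketch (using s and p from the code above) that builds the empirical probability matrix directly; this is the matrix that the density plots further down visualize:

(* probs[[pos, v]] estimates the probability that value v ends up in position pos *)
probs = Table[Count[s[[All, pos]], v], {pos, p}, {v, p}]/30000.;
MatrixPlot[probs]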

Let's take three positions in the resulting arrays and plot the frequency distribution for each integer in that position:

For position 1 the freq distribution is:

[plot: frequency distribution of each integer in position 1]

For position 5 (middle)

[plot: frequency distribution of each integer in position 5]

And for position 10 (last):

[plot: frequency distribution of each integer in position 10]

And here is the distribution for all positions plotted together:

[plot: frequency distributions for all positions, overlaid]

Here are better statistics, over 8 positions:

[plot: frequency distributions over 8 positions]

Some observations:

  • For all positions, the probability of "1" is the same (1/n).
  • The probability matrix is symmetric with respect to the big anti-diagonal.
  • Consequently, the probability of any number in the last position is also uniform (1/n).

You can see the first and third properties in the plots: all curves start from the same point (first property), and the last curve is horizontal (third property).

The second property can be seen from the following matrix representation example, where the rows are the positions, the columns are the occupant number, and the color represents the experimental probability:

[matrix plot: experimental probability, rows = positions, columns = occupant numbers]

For a 100x100 matrix:

[matrix plot: the same, for a 100x100 matrix]

Edit

Just for fun, I calculated the exact formula for the second diagonal element (the first is 1/n). The rest can be done, but it's a lot of work.

h[n_] := (n-1)/n^2 + (n-1)^(n-2) n^(-n)

Values verified from n=3 to 6 ( {8/27, 57/256, 564/3125, 7105/46656} )
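A quick check of the formula against those values:

Table[h[n], {n, 3, 6}]
(* {8/27, 57/256, 564/3125, 7105/46656} *)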

Edit

Working out the general explicit calculation in @wnoise's answer a little further, we can get some more info.

Replacing 1/n by p[n], so that the calculations are held unevaluated, we get, for example, the first part of the matrix for n=7 (click to see a bigger image):

[matrix: first part of the symbolic probability matrix for n=7]

Comparing this with results for other values of n lets us identify some known integer sequences in the matrix:

{{  1/n,    1/n      , ...},
 {... .., A007318, ....},
 {... .., ... ..., ..},
 ... ....,
 {A129687, ... ... ... ... ... ... ..},
 {A131084, A028326 ... ... ... ... ..},
 {A028326, A131084 , A129687 ... ....}}

You may find those sequences (in some cases with different signs) in the wonderful http://oeis.org/

Solving the general problem is more difficult, but I hope this is a start.

Dr. belisarius
32

The "common mistake" you mention is shuffling by random transpositions. This problem was studied in full detail by Diaconis and Shahshahani in Generating a random permutation with random transpositions (1981). They do a complete analysis of stopping times and convergence to uniformity. If you cannot get a link to the paper, then please send me an e-mail and I can forward you a copy. It's actually a fun read (as are most of Persi Diaconis's papers).

If the array has repeated entries, then the problem is slightly different. As a shameless plug, this more general problem is addressed by myself, Diaconis and Soundararajan in Appendix B of A Rule of Thumb for Riffle Shuffling (2011).

Glorfindel
PengOne
  • Persi Diaconis is awesome... though when he lectures he never looks at the people he's talking to. :-) – templatetypedef Mar 16 '11 at 03:04
  • Does the 1981 paper actually address this particular situation? I thought the problem as stated was looking at the distribution of permutations of the form (1 a_1)(2 a_2)...(n a_n), where each a_i is chosen uniformly from 1..n. – mhum Mar 16 '11 at 03:56
  • Could you summarize the result for those of us who can't access the paper? – BlueRaja - Danny Pflughoeft Mar 17 '11 at 16:48
  • @mhum: I believe you are correct that it doesn't quite. While I don't have immediate access to the 1981 paper, the corresponding results in "Group Representations in Probability and Statistics" cover uniformly random transpositions, not these ones where the transpositions involve fixed elements. (They generalize nicely to uniformly random over any conjugacy class, but I can't see how to get them to directly apply here.) – wnoise Mar 18 '11 at 20:56
  • It's unfortunate that this got the automatic bounty, as it doesn't really answer the question... – BlueRaja - Danny Pflughoeft Mar 24 '11 at 16:12
  • I don't know how it did considering belisarius had a (deservedly) higher rated answer. – PengOne Mar 24 '11 at 20:51
  • @Peng Because I posted my answer before the bounty was started – Dr. belisarius May 08 '11 at 13:39
  • This doesn't answer the given question. – Markus Klyver May 26 '23 at 13:00
15

Let's say

  • a = 1/N
  • b = 1-a
  • B_i(k) is the probability (row) vector for the k-th element after i swaps, i.e., the answer to the question "where is k after i swaps?". For example, B_0(3) = (0 0 1 0 ... 0) and B_1(3) = (a 0 b 0 ... 0). What you want is B_N(k) for every k.
  • K_i is an N×N matrix with 1s in the i-th column and i-th row and zeroes everywhere else, e.g. for i=2:

K_2 = ( 0 1 0 ... 0
        1 1 1 ... 1
        0 1 0 ... 0
        . . .
        0 1 0 ... 0 )

  • I_i is the identity matrix, but with the element x=y=i zeroed. E.g. for i=2:

I_2 = ( 1 0 0 ... 0
        0 0 0 ... 0
        0 0 1 ... 0
        . . .
        0 0 0 ... 1 )

  • A_i is

A_i = b I_i + a K_i

Then,

B_N(k) = B_0(k) · A_1 A_2 ... A_N

But because the initial vectors B_0(k), k = 1..N, stacked as rows form the identity matrix, the probability that any given element i will at the end be at position j is given by the matrix element (i,j) of the matrix:

P = A_1 A_2 ... A_N

For example, for N=4:

P = 1/256 ( 64 64 64 64
            75 57 60 64
            63 72 57 64
            54 63 75 64 )
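A minimal Mathematica sketch of this construction (the helper names are mine); it builds the A_i matrices and multiplies them, reproducing the N=4 matrix above. (It is the same product as in wnoise's answer below.)

kmat[i_, n_] := Table[Boole[r == i || c == i], {r, n}, {c, n}]; (* K_i *)
imat[i_, n_] := ReplacePart[IdentityMatrix[n], {i, i} -> 0];    (* I_i *)
amat[i_, n_] := (1 - 1/n) imat[i, n] + (1/n) kmat[i, n];        (* A_i = b I_i + a K_i *)
prob[n_] := Dot @@ Table[amat[i, n], {i, n}];                   (* A_1 A_2 ... A_N *)
prob[4] // MatrixForm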

As a diagram for N = 500 (color levels are 100*probability):

[density plot of the N = 500 probability matrix]

The pattern is the same for all N>2:

  • The most probable ending position for k-th element is k-1.
  • The least probable ending position is k for k < N*ln(2), and position 1 otherwise.
Eelvex
  • It's easy to calculate analytic results even for large Ns, but the expressions are too "messy" to include here. – Eelvex Mar 22 '11 at 00:12
  • This appears to be correct, but.. how did you come up with this? Is this the same as [wnoise's answer](http://stackoverflow.com/questions/5131341/5132404#5132404)? (sorry, I'm afraid I don't understand stochastic matrices..) – BlueRaja - Danny Pflughoeft Mar 22 '11 at 17:45
  • @Eelvex I would like to know how you calculated this. – Mike Bailey Jul 29 '11 at 02:43
13

I knew I had seen this question before...

" why does this simple shuffle algorithm produce biased results? what is a simple reason? " has a lot of good stuff in the answers, especially a link to a blog by Jeff Atwood on Coding Horror.

As you may have already guessed, based on the answer by @belisarius, the exact distribution is highly dependent on the number of elements to be shuffled. Here's Atwood's plot for a 6-element deck:

[plot: Atwood's distribution of outcomes for a 6-element deck]

oosterwal
  • Thanks for the link/picture, but all that this confirms is that you get something non-uniform. I was hoping more for an analytic solution to what the actual distribution is, though. – templatetypedef Mar 15 '11 at 22:55
  • Upvoted for sharing the Jeff Atwood link, which also describes a way of deriving the distribution - the broken shuffle has n^n equally likely choices of random numbers, mapping to n! outputs. I don't think you're going to get an analytical solution; just a numerical one for small values of n. – Chris Nash Mar 18 '11 at 20:27
9

What a lovely question! I wish I had a full answer.

Fisher-Yates is nice to analyze because once it decides on the first element, it leaves it alone. The biased one can repeatedly swap an element in and out of any place.

We can analyze this the same way we would a Markov chain, by describing the actions as stochastic transition matrices acting linearly on probability distributions. Most elements get left alone, so the diagonal is usually (n-1)/n. On pass k, when elements don't get left alone, they get swapped with element k (or with a random element, if they are element k). This is 1/n in either row or column k, and the element in both row and column k is also 1/n (matching the code below). It's easy enough to multiply these matrices together for k going from 1 to n.

We do know that the element in last place will be equally likely to have originally been anywhere, because the last pass swaps the last place equally likely with any other. Similarly, the first element will be equally likely to be placed anywhere. This symmetry is because the transpose reverses the order of matrix multiplication. In fact, the matrix is symmetric in the sense that row i is the same as column (n+1-i). Beyond that, the numbers don't show much apparent pattern. These exact solutions do show agreement with the simulations run by belisarius: in slot i, the probability of getting j decreases as j rises toward i, reaching its lowest value at i-1, then jumps up to its highest value at i, and decreases until j reaches n.

In Mathematica I generated each step with

step[k_, n_] := Normal[SparseArray[{{k, i_} -> 1/n,
                     {j_, k} -> 1/n, {i_, i_} -> (n - 1)/n}, {n, n}]]

(I haven't found it documented anywhere, but the first matching rule is used.) The final transition matrix can be calculated with:

Fold[Dot, IdentityMatrix[n], Table[step[m, n], {m, n}]]

ListDensityPlot is a useful visualization tool.

Edit (by belisarius)

Just a confirmation. The following code gives the same matrix as in @Eelvex's answer:

step[k_, n_] := Normal[SparseArray[{{k, i_} -> (1/n), 
                      {j_, k} -> (1/n), {i_, i_} -> ((n - 1)/n)}, {n, n}]];
r[n_, s_] := Fold[Dot, IdentityMatrix[n], Table[step[m, n], {m, s}]];
Last@Table[r[4, i], {i, 1, 4}] // MatrixForm
Dr. belisarius
wnoise
  • Sounds interesting, but I didn't get what your probability distributions are *on* -- seems to me that each state in the Markov chain you're describing needs to specify the order of the entire n elements (i.e. an n-element problem requires an (n!)-state Markov chain). Is that what you mean? Also not sure about your reasoning that the last element is equally likely to have come from anywhere -- that is true iff all n elements are uniformly randomly distributed after the 1st n-1 elements have been processed, and I don't believe that's the case (or at least I'd like to see a proof). – j_random_hacker Feb 28 '11 at 00:10
  • The states are the n slots. Entry i,j in a transition matrix is the chance of going from slot i to slot j. Turning a transition matrix into a distribution on "where element i ended up" is just picking out row i. The distribution for "where element j came from" is just picking out column j. This indeed doesn't give counts for the permutations, just for where elements end up. – wnoise Feb 28 '11 at 00:37
  • @j_random_hacker: The last operation swaps the last element with any element with equal probability. No matter the distribution prior to this, the last element is chosen at random from all of them. – wnoise Feb 28 '11 at 00:49
  • Thanks, after doing some algebra I understand your last point now. Regarding the Markov states: so you mean you are tracking the movement (= probabilities of being in each slot) of a *particular* element? (E.g. suppose initially the ith element was i. Then we could say that the column vector transpose([0, 0, 1, 0, ..., 0]) represents the initial probability distribution of the location of element 3, and that premultiplying this by the transition matrix corresponding to the 1st swap would give the probability distribution of the location of element 3 after this step... – j_random_hacker Feb 28 '11 at 01:12
  • @j_random_hacker: that's exactly right. The transition matrix tracks all of these probabilities simultaneously for all elements, but does not track correlations (which could be done by tracking the permutations directly). – wnoise Feb 28 '11 at 01:18
  • Ah good. I was halfway through writing another comment but I think I'm on the right page now. Basically the shuffle is uniformly random iff, for any element i, the result of multiplying together the n transition matrices followed by a column vector with 1 in row i and 0 elsewhere equals [1/n, 1/n, ..., 1/n]. That's equivalent to requiring that each column in the product of transition matrices equals that, which is equivalent to requiring that every single entry in the product matrix is 1/n. – j_random_hacker Feb 28 '11 at 01:53
3

Wikipedia's page on the Fisher-Yates shuffle has a description and example of exactly what will happen in that case.

Jeremiah Willcock
  • Thanks for the link, but part of the reason I asked this question is that the Wikipedia article just states that you aren't going to get a uniform distribution, not what that non-uniform distribution looks like mathematically. That is, there's no discussion of the probability of a particular element ending up in a particular place. – templatetypedef Feb 27 '11 at 04:04
  • @templatetypedef: There is a figure for that for a simple case (I believe 6 or 7 elements). I know it is not a fully general answer, though. – Jeremiah Willcock Feb 27 '11 at 04:06
3

You can compute the distribution using stochastic matrices. Let the matrix A(i,j) describe the probability of the card originally at position i ending up at position j. Then the k-th swap has a matrix A_k given by A_k(i,j) = 1/N if i == k or j == k (the card at position k can end up anywhere, and any card can end up at position k with equal probability), A_k(i,i) = (N-1)/N for all i != k (every other card stays in the same place with probability (N-1)/N), and all other elements zero.

The result of the complete shuffle is then given by the product of the matrices A_N ... A_1.

I expect you're looking for an algebraic description of the probabilities; you can get one by expanding out the above matrix product, but I imagine it will be fairly complex!

UPDATE: I just spotted wnoise's equivalent answer above! oops...

daoudc
3

I've looked into this further, and it turns out that this distribution has been studied at length. The reason it's of interest is that this "broken" algorithm is (or was) used in the RSA chip system.

In Shuffling by semi-random transpositions, Elchanan Mossel, Yuval Peres, and Alistair Sinclair study this and a more general class of shuffles. The upshot of that paper appears to be that it takes log(n) broken shuffles to achieve near random distribution.

In The bias of three pseudorandom shuffles (Aequationes Mathematicae, 22, 1981, 268-292), Ethan Bolker and David Robbins analyze this shuffle and determine that the total variation distance to uniformity after a single pass is 1, indicating that it is not very random at all. They give asymptotic analyses as well.

Finally, Laurent Saloff-Coste and Jessica Zuniga found a nice upper bound in their study of inhomogeneous Markov chains.

Ethan Bolker
PengOne
2

This question is begging for an interactive visual matrix diagram analysis of the broken shuffle mentioned. Such a tool is on the page Will It Shuffle? - Why random comparators are bad by Mike Bostock.

Bostock has put together an excellent tool that analyzes random comparators. In the dropdown on that page, choose naïve swap (random ↦ random) to see the broken algorithm and the pattern it produces.

His page is informative as it allows one to see the immediate effects a change in logic has on the shuffled data. For example:

This matrix diagram of a non-uniform, heavily biased shuffle is produced by a naïve swap (we pick from "1 to N") with code like this:

function shuffle(array) {
    var n = array.length, i = -1, j, t;
    while (++i < n) {
        // mistake: j is drawn from the whole range 0..n-1 on every pass
        j = Math.floor(Math.random() * n);
        t = array[j];
        array[j] = array[i];
        array[i] = t;
    }
}

[matrix diagram: biased naïve swap]

But if we implement a non-biased shuffle, where we pick from "k to N", we should see a diagram like this:

[matrix diagram: unbiased Fisher-Yates shuffle]

where the distribution is uniform, and is produced from code such as:

function FisherYatesDurstenfeldKnuthshuffle( array ) {
    var pickIndex, arrayPosition = array.length;
    while( --arrayPosition ) {
        pickIndex = Math.floor( Math.random() * ( arrayPosition + 1 ) );
        // one-liner that swaps array[pickIndex] and array[arrayPosition]
        array[ pickIndex ] = [ array[ arrayPosition ], array[ arrayPosition ] = array[ pickIndex ] ][ 0 ];
    }
}
Mac
  • This would be a much better answer if you would include more information here and not hide it behind a link. – Teepeemm Oct 21 '15 at 14:39
  • I disagree. I saw no need to attempt to repeat the excellent replies already given by **daoudc**, **wnoise**, **Eelvex**, and especially **belisarius**. All that was missing from the replies on this page was some kind of interactive model. The link provides it. – Mac Oct 22 '15 at 15:25
1

The excellent answers given so far concentrate on the distribution, but you also asked "What happens if you make this mistake?", which I haven't seen answered yet, so here's an explanation:

The Knuth-Fisher-Yates shuffle algorithm picks 1 out of n elements, then 1 out of n-1 remaining elements and so forth.

You can implement it with two arrays a1 and a2, where you remove one element from a1 and insert it into a2, but the algorithm does it in place (meaning that it needs only one array), as is explained very well here (Google: "Shuffling Algorithms Fisher-Yates DataGenetics").

If you don't remove the elements, they can be randomly chosen again, which produces the biased randomness. This is exactly what the second example you describe does. The first example, the Knuth-Fisher-Yates algorithm, picks the swap partner from k to N only, so the cursor k effectively remembers which elements have already been placed, thereby avoiding picking elements more than once.
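For illustration, here is a minimal Mathematica sketch of that two-array formulation (the names are mine): repeatedly extract a random element from the source list and append it to the output, so nothing can be picked twice:

shuffleTwoArrays[list_] := Module[{a = list, out = {}, j},
  While[Length[a] > 0,
   j = RandomInteger[{1, Length[a]}]; (* pick any remaining element *)
   AppendTo[out, a[[j]]];             (* place it in the output *)
   a = Delete[a, j]];                 (* remove it so it can't be picked again *)
  out]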

Matt
  • Do you think you could replace the "here" with something more googleable? – Wolf Mar 09 '15 at 11:28
  • Done, I have added a google search hint - however, "here" was a link already. – Matt Mar 09 '15 at 12:35
  • That's the problem with *here* links: the intent may be obvious to the writer, but not to the reader (before following it). It's like pointing into a landscape saying *look there!* The more problematic thing is that sometimes web pages vanish, or entire sites are shut down (hopefully archived before): that's the time when a simple *here* becomes pointless. Nevertheless thanks for taking my suggestion into account. – Wolf Mar 09 '15 at 13:52
  • @Wolf: Good point, I didn't think about that before. You are right, if the content moves, the google search might still be helpful. Thank you for bringing this to my attention! – Matt Mar 09 '15 at 14:09