32

The Fisher-Yates shuffle gives a nice algorithm to shuffle an array A of length n in a single pass:

For k = 1 to n
    Pick a random integer j from k to n
    Swap A[k] and A[j]

After a single pass through this algorithm, the entries of A occur uniformly at random.

A common way to botch this algorithm is to do the following:

For k = 1 to n
    Pick a random integer j from 1 to n
    Swap A[k] and A[j]

The resulting distribution from a single pass through this algorithm is not uniformly random, and there is a nice discussion of what it is at this post: What distribution do you get from this broken random shuffle?

I recently read a delightful article by Diaconis, Fulman and Holmes entitled Analysis of Casino Shelf Shuffling Machines where the authors describe a physical machine that does the following batch shuffle:

For k = 1 to n
    Pick a random integer j from 1 to 10
    Randomly choose to place card k on the top or bottom of stack j

The question the authors address is whether or not this gives a reasonably random ordering after a single pass. The answer is decidedly not. One way to see the flaw in this shuffle is to start with a deck of cards that has n/2 red cards atop of n/2 black cards. The resulting deck after a single pass will have at most 10 clumps of red cards! For n = 52*6, this isn't terribly random. The authors also show that an optimal "guess the next card" strategy for the once shuffled will, on average, correctly guess 9.5 cards, whereas an optimal strategy for a random deck will average only 4.5 cards correctly guessed.

Are there any other interesting single-pass shuffles that achieve near-randomness and/or interesting distributions? I'm especially interested in shuffles similar to the latter that work with batches of entries.

Community
  • 1
  • 1
PengOne
  • 48,188
  • 17
  • 130
  • 149
  • 2
    incredibly interesting question, but I don't think its a good fit for SO – Mitch Wheat Jul 21 '11 at 04:30
  • @Mitch: Maybe math.stackexchange.com? Go ahead and close it if you think it's not a good fit. I was just hoping the algorithms gurus here might have seen some cool stuff. – PengOne Jul 21 '11 at 04:34
  • Yeah, I was thinking maybe math.stackexchange.com or cs.stackexchange.com – Mitch Wheat Jul 21 '11 at 04:37
  • @Peng I've some trouble trying to download your paper cited in your answer to the previous related question. Could you check if it is still available? Thanks! (http://www-stat.stanford.edu/~cgates/PERSI/redblack.pdf) – Dr. belisarius Jul 21 '11 at 04:39
  • 18
    @Mitch C'mon ... let's keep some interesting questions here! – Dr. belisarius Jul 21 '11 at 04:40
  • 1
    @PengOne: some of the shuffling theory also appears in "Proofs from THE BOOK" – Mitch Wheat Jul 21 '11 at 04:46
  • 1
    @Mitch Being eclectic was considered a virtue once :) – Dr. belisarius Jul 21 '11 at 04:56
  • 6
    Fisher-Yates is straightforward, probably correct, and once you understand it, pretty much the most intuitive way to do a shuffle. Do we really need more? :) In the physical world (and hence OT for SO), I do wonder what the fastest shuffle you can implement that gives a good distribution is, though. – Nick Johnson Jul 21 '11 at 11:39
  • 1
    Fisher-Yates might be hard to beat in terms of memory used and the theoretical O(n) performance will not get better :). In practice, however, it is not cache friendly and shuffling elements of e.g. a 80MByte integer array can take a while. – Andras Vass Jul 21 '11 at 22:09
  • 1
    @Andras: I'm not looking to beat Fisher-Yates. Quite the opposite, in fact. I'm looking for algorithms that get *close* to random by sorting in batches, similar to the last example I gave. – PengOne Jul 21 '11 at 22:11
  • @PengOne: I assumed that tackling the practical problem of F-Y efficiency might produce an interesting algorithm (although, as you have said it would be a random distribution, not just close to it). – Andras Vass Jul 22 '11 at 08:15
  • This _really_ borders on "not constructive", however you have scoped the question quite well. I'm CW'ing this now. Please keep in mind, the community is free to disagree with me. – Tim Post Jul 22 '11 at 15:47
  • 2
    @TimPost: I don't think this warrants CW as it's not a shopping list question. It is unfortunate that it _looks_ subjective because of "good" in the title, but "what's a good algorithm" is orders of magnitude different from "what's a good monitor". In this case, it can be quantified in terms of Big-O or Theta. – abcd Jul 22 '11 at 16:44
  • 1
    @yoda, it still solicits many answers to the same question, all of which might be technically correct ... and then it becomes an election of sorts. While answers can be _quantified_ in terms of Big-O notation, there really isn't a single correct answer in terms of what fits any given implementation. I did think for quite some time about the CW status before I converted this post. I didn't apply it due to the title. I'm not married to the decision, but for now, let's see how the answers go? :) – Tim Post Jul 22 '11 at 16:51
  • @TimPost: Alright. I was going to defend the question further, but then I saw the last line: _"Are there any other interesting single-pass shuffles... "_ I had missed that earlier. So yeah, you're right in marking it CW :) – abcd Jul 22 '11 at 16:56
  • I've voted to close too; not because I think programming-related math problems aren't a fit for stackoverflow, but because "What are some other cool XXX" questions are not a good fit. – BlueRaja - Danny Pflughoeft Jul 22 '11 at 21:52

1 Answers1

1

If you have a shuffled desk, into which you wish to shuffle a batch of new cards (and you know that none of the cards are duplicates), then I think the following is valid.

ForEach card in batch:
    gap = random(deck.size() + 1)  # choose a gap between cards, before first, or after last.
    deck.insertAt(gap,card)

Distribution

The distribution of random is uniform, and the order of the deck is unchanged, so still uniform. I think the result should be uniform. (My stats is too rusty to be sure).

Time

Assuming that insertAt is O(1) not O(N) - which depends upon the implementeation of deck - the whole routine is O(batch size) - which is the best you can hope for becuase you have to handle each card.

Ian
  • 1,941
  • 2
  • 20
  • 35