17

Possible Duplicate:
Randomize a List<T> in C#

I have a list which contains many thousands of FilePath's to locations of audio files, and was wondering which would be the most efficient way to "shuffle" a List?

Any help is greatly appreciated :)

Thank you

Community
  • 1
  • 1
  • 1
    Do you actually want the *most efficient possible solution* or do you want an *acceptably efficient solution*? Because there are algorithms that are even more efficient that Fischer-Yates provided you are willing to abandon certain nice properties, like lack of bias. (Not that Fischer-Yates as implemented below is unbiased; it is deeply biased.) – Eric Lippert Feb 20 '10 at 16:22
  • @Eric: Fischer-Yates _is_ unbiased. The implementation given below is incorrect, as you noted. Of course there are more efficient implementations if you are willing to have bias. For instance, do _nothing_ at all. I really don't get your point. The OP hasn't specified anything, and it is reasonable (IMO) to assume they are looking for a uniform shuffle. –  Feb 20 '10 at 19:06
  • Is that *really* reasonable? The shuffle algorithm in question is for media files. One might want to bias the shuffle towards more frequently repeating more highly rated songs. – Eric Lippert Feb 21 '10 at 00:59
  • @Eric: What is reasonable or not, completely depends on j-t-s's context (which we don't have), but, given the information in the question, yes, I would say it is reasonable to assume uniform. –  Feb 21 '10 at 01:30
  • @Eric: What I am after is the most efficient solution, although an acceptably-efficient solution would be good, too. Currently, one of my users has a library of 500, 000 audio files on their 8-year-old computer. And I figure if there's one of them, then there's likely to be more out there and I would like things to be as fast as possible. –  Feb 21 '10 at 03:05
  • 3
    Then I would solve your problem by solving a different problem. Why do you have to produce a shuffle of half a million files? The user is never, ever going to make it to the last file in the shuffle even if they sit there hitting "next" all day every day for months. That is, why pre-compute the entire shuffled order *at all*? Choose a few hundred at random (without replacement) and call it good. That's got to be not only faster but way, way more memory efficient than allocating an array of five hundred thousand file names and shuffling the entire array. – Eric Lippert Feb 21 '10 at 06:17
  • 2
    @j-t-s: Agree with Eric: Trying to shuffle the files before hand for such a huge size seems pointless (and very inefficient). A suggestion, different from Eric's: you could try keeping a list of say 10(or 50) last played files. For the next file, you can generate a random number/file (from 1 to 1/2 million) and if it is among the last 10(or 50) played, try getting a random number again. This should be sufficient for all practical purposes. –  Feb 21 '10 at 06:27
  • See this post: [http://stackoverflow.com/questions/273313/randomize-a-listt-in-c](http://stackoverflow.com/questions/273313/randomize-a-listt-in-c) – dugas Feb 20 '10 at 04:22

4 Answers4

13

Fisher-Yates Shuffle or as it is also known as, Knuth shuffle.

Adam
  • 15,537
  • 2
  • 42
  • 63
  • 2
    ...which is O(n), so you can't get any better than that. – Guffa Feb 20 '10 at 04:22
  • 3
    ... I just upvoted you because I liked your answer :) Dunno why it was down-voted? –  Feb 20 '10 at 04:47
  • 3
    btw, for faster shuffling, I would suggest you shuffle a list/array of integers (using whatever method you choose), and use that shuffled list/array as an index into the list of filenames. Swapping filenames could turn out be a bottleneck. –  Feb 20 '10 at 05:14
8

Here is a simple (yet effective) implementation of the Fischer-Yates/Knuth shuffle:

Random rnd = new Random();
for (int i = files.Length; i > 1; i--) {
  int pos = rnd.Next(i);
  var x = files[i - 1];
  files[i - 1] = files[pos];
  files[pos] = x;
}

Or a slight variation:

Random rnd = new Random();
for (int i = 1; i < files.Length; i++) {
  int pos = rnd.Next(i + 1);
  var x = files[i];
  files[i] = files[pos];
  files[pos] = x;
}

As this is an O(n) operation, it's the most efficient way of shuffling a list. As all items in the list has to have chance to be moved, it's not possible to shuffle a list more efficiently than O(n).

I made a small performance test by shuffling a million items a thousand times each using this method and the currently accepted answer (LINQ OrderBy), and this is about 15 times (!) faster.

Guffa
  • 687,336
  • 108
  • 737
  • 1,005
  • You are confusing the asymptotic bound with efficiency. Efficiency is defined as resources consumed per unit of work done. Asymptotic bound describes how resources consumed increase as the problem size increases. The "find a substring of length m in a string of length n" algorithm in the System.String class is O(nm) but it is *far* more *efficient* on *typical* problems than the O(n+m) algorithms that we could have implemented. To work out efficiency you have to consider the *likely cases*, not the asymptotic bounds. – Eric Lippert Feb 20 '10 at 16:25
  • I also note that your implementation of Fischer-Yates has a bias; it does not produce all possible shuffles with equal likelihood. This is probably not a problem for a music shuffling algorithm, but it is a problem if you were using this to shuffle a deck of cards for a game of poker; given my hand, I could quickly determine what everyone else had in their hand. – Eric Lippert Feb 20 '10 at 16:29
  • @Eric: Why do you think that the implementation has a bias? It gives each item the same chance to end up in each postion in the list. Also I have verified the implementation by doing millions of shuffles, and there is no noticable bias. – Guffa Feb 20 '10 at 17:19
  • Let x be the number of possible random orders produced by Random. Let y be the number of possible orderings of the shuffle. I think you'll find that if you work out what x and y are, x is way, way, way smaller than y in your implementation, and therefore the shuffle is biased. – Eric Lippert Feb 20 '10 at 17:38
  • @Eric: For a list of n items x = n! and y = n!, so I don't see how x can be way, way, way smaller than y. – Guffa Feb 20 '10 at 19:01
  • @Guffa: Eric is right, the implementation you have is not Fisher Yates, even though it looks like it. You need to start from the other end. –  Feb 20 '10 at 19:03
  • @Moron: You are right that it's actually not the Fisher Yates implementation, but it works just fine. In what sense do you mean that Eric is right? (I added the Fisher Yates algorithm also.) – Guffa Feb 20 '10 at 21:49
  • The bias doesn't come from the algorithm. The bias comes from the source of "randomness". x is actually 2^32: the number of possible seeds. And since some of those seeds are way more likely than others -- since the seed is based on the clock, and when people choose to run software is not evenly distributed over time -- that's even more bias. – Eric Lippert Feb 21 '10 at 01:02
  • @Eric: Oh, I see what you mean, the Random class has an upper limit of 2^32 different random sequences. Well, at least it's less biased than the sorting method, as that has the same limit in the Random generator and also has a bias for duplicate random values... – Guffa Feb 21 '10 at 01:20
  • 1
    @Guffa: Sorry I misread Eric's comment. All I meant to say was this is not Fischer Yates. As for the bias issue: as you had implemented it, we can _prove_ that if the random generator is uniform, then the shuffle you produce is uniform (or 'unbiased'). Sorry for causing confusion. –  Feb 21 '10 at 01:53
5

myList.OrderBy(Guid.NewGuid())

Esteban Araya
  • 29,284
  • 24
  • 107
  • 141
  • 2
    Some GUID generation algorithms generate monotonic values, so this may not give random results. However, something similar using another source of randomness (Probably random) would work. – heneryville Oct 03 '12 at 22:16
0

I added Jon Skeet's solution from this question to my Extensions library. I implemented methods that both take an external random number generator and create one using a default implementation (Random).

Community
  • 1
  • 1
tvanfosson
  • 524,688
  • 99
  • 697
  • 795