177

I have read an article about various shuffle algorithms over at Coding Horror. I have seen that somewhere people have done this to shuffle a list:

var r = new Random();
var shuffled = ordered.OrderBy(x => r.Next());

Is this a good shuffle algorithm? How does it work exactly? Is it an acceptable way of doing this?

Rodrigo Guedes
  • 1,169
  • 2
  • 13
  • 27
Svish
  • 152,914
  • 173
  • 462
  • 620

13 Answers13

222

It's not a way of shuffling that I like, mostly on the grounds that it's O(n log n) for no good reason when it's easy to implement an O(n) shuffle. The code in the question "works" by basically giving a random (hopefully unique!) number to each element, then ordering the elements according to that number.

I prefer Durstenfeld's variant of the Fisher-Yates shuffle which swaps elements.

Implementing a simple Shuffle extension method would basically consist of calling ToList or ToArray on the input then using an existing implementation of Fisher-Yates. (Pass in the Random as a parameter to make life generally nicer.) There are plenty of implementations around... I've probably got one in an answer somewhere.

The nice thing about such an extension method is that it would then be very clear to the reader what you're actually trying to do.

EDIT: Here's a simple implementation (no error checking!):

public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
    T[] elements = source.ToArray();
    // Note i > 0 to avoid final pointless iteration
    for (int i = elements.Length-1; i > 0; i--)
    {
        // Swap element "i" with a random earlier element it (or itself)
        int swapIndex = rng.Next(i + 1);
        T tmp = elements[i];
        elements[i] = elements[swapIndex];
        elements[swapIndex] = tmp;
    }
    // Lazily yield (avoiding aliasing issues etc)
    foreach (T element in elements)
    {
        yield return element;
    }
}

EDIT: Comments on performance below reminded me that we can actually return the elements as we shuffle them:

public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
    T[] elements = source.ToArray();
    for (int i = elements.Length - 1; i >= 0; i--)
    {
        // Swap element "i" with a random earlier element it (or itself)
        // ... except we don't really need to swap it fully, as we can
        // return it immediately, and afterwards it's irrelevant.
        int swapIndex = rng.Next(i + 1);
        yield return elements[swapIndex];
        elements[swapIndex] = elements[i];
    }
}

This will now only do as much work as it needs to.

Note that in both cases, you need to be careful about the instance of Random you use as:

  • Creating two instances of Random at roughly the same time will yield the same sequence of random numbers (when used in the same way)
  • Random isn't thread-safe.

I have an article on Random which goes into more detail on these issues and provides solutions.

NoonKnight
  • 153
  • 5
  • 8
Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • 5
    Well, implementations for small, but important, things like this I would say is always nice to find here on StackOverflow. So yes please, if you want to =) – Svish Aug 17 '09 at 12:18
  • 9
    Jon -- your explanation of Fisher-Yates is equivalent to the implementation given in the question (the naive version). Durstenfeld/Knuth achieve O(n) not by assignment, but by selection from a decreasing set and swapping. This way the random number selected may repeat and the algorithm only takes O(n). – tvanfosson Aug 17 '09 at 12:18
  • Actually it's better than the naive version, but would still be O(n log n). – tvanfosson Aug 17 '09 at 12:20
  • @tvanfosson: I've always assumed the "swapping" version, which is O(n). – Jon Skeet Aug 17 '09 at 12:35
  • (I hadn't even thought of not doing the swapping version in fact! Will edit accordingly.) – Jon Skeet Aug 17 '09 at 12:39
  • @Svish: If you return an array (as `IEnumerable` then store it in two places, then one place could cast back to an array and change what the other place "sees". – Jon Skeet Aug 17 '09 at 13:06
  • @Jon -- I think a better way to describe it is that it randomly assigns each element to a position in a new array rather than assigning a random number to each element and ordering by that number. Nice implementation though -- I actually have a use coming up for it -- part of a data anonymization effort. – tvanfosson Aug 17 '09 at 14:14
  • @tvanfosson: I'm describing what the code in the question *does*. The overall effect is to shuffle, but *how it works* is to assign a random number to each element and then sort. On the other hand, maybe I've confused things with the order of the paragraphs... I suspect so! – Jon Skeet Aug 17 '09 at 14:40
  • @Jon -- thanks for the clarification. I was, in fact, assuming you were describing your algorithm. – tvanfosson Aug 17 '09 at 14:47
  • BTW - you can get by with a little less arithmetic if you decrement from the end to the beginning. – tvanfosson Aug 17 '09 at 15:09
  • 8
    You're probably getting sick of hearing from me on this, but I ran into a slight problem in my unit tests that you might want to be aware of. There's a quirk with ElementAt that makes it invoke the extension each time, giving unreliable results. In my tests I'm materializing the result before checking to avoid this. – tvanfosson Aug 17 '09 at 16:44
  • 3
    @tvanfosson: Not sick at all :) But yes, callers should be aware that it's lazily evaluated. – Jon Skeet Aug 17 '09 at 17:04
  • 5
    A bit late, but please note `source.ToArray();` requires you to have `using System.Linq;` in the same file. If you don't, you get this error: `'System.Collections.Generic.IEnumerable' does not contain a definition for 'ToArray' and no extension method 'ToArray' accepting a first argument of type 'System.Collections.Generic.IEnumerable' could be found (are you missing a using directive or an assembly reference?)` – Powerlord Oct 22 '09 at 20:41
  • I know this is old... Having the yield return is x2 slower than just returning the shuffled array, I don't get the alias issue, is there a reference I can read up on the reason to use it? Thanks, I hope this isn't a bother. – Chuck Savage Jan 20 '12 at 02:32
  • @ChuckSavage: For one thing, it wouldn't use deferred execution any more, so it would go against everything else in LINQ. For another, it would allow the caller to cast it back to the array type and mess things up. In general, LINQ operators hide any underlying mutable data sources. However, I've now included another implementation which is lazier - now if you only take (say) the first 3 elements of a shuffled million-element array, it will only do the work of shuffling 3 elements. – Jon Skeet Jan 20 '12 at 07:53
  • @JonSkeet - I think you need `if(elements.Length > 0) yield return elements[0];` after the for loop there; otherwise there's always one too few in the output list (or none at all if `input.Count() == 1`) (far be it from me to contradict any of the Jon Skeet Facts! http://meta.stackexchange.com/questions/9134/jon-skeet-facts) – Andras Zoltan Feb 03 '12 at 08:27
  • @AndrasZoltan: Thanks - I've just changed the iteration count instead, so it keeps going to i == 0. It'll call `Random.Next(1)` pointlessly at that point, but the code is cleaner :) – Jon Skeet Feb 03 '12 at 08:29
  • using Random over multiple shuffles ensure that shuffles are going to be different, but I wonder how can one ensure robustness of a single shuffle in that solution. – Khaled.K Mar 15 '15 at 09:26
  • @Khaled: No, using random over multiple shuffles doesn't guarantee uniqueness at all. There's always a chance of a shuffle doing nothing - there has to be. – Jon Skeet Mar 15 '15 at 09:27
  • My client's robo-scanner doesn't like Random, but it was easy enough to swap out an `RNGCryptoServiceProvider` (although slower, solves the thread-safety issue and shuts-up any robot that reports any instance of Random as a security flaw). I'll post an answer. – frattaro Jul 05 '16 at 17:36
  • @JonSkeet Can you explain what the issue is with including the random generator in the method itself. Why do we need it as a param? eg would this work: `public static IEnumerable Shuffle(this IEnumerable source) { var rng = new Random(); T[] elements = source.ToArray(); for (int i = elements.Length - 1; i >= 0; i--) { int swapIndex = rng.Next(i + 1); yield return elements[swapIndex]; elements[swapIndex] = elements[i]; }` – userSteve Sep 18 '22 at 09:03
  • @userSteve: If you call that twice in quick succession with the same input, you may very well get the same results, because the default seed is time-based. (And fundamentally there are also times when you *want* to provide a consistent input for repeatability.) The aspect of "provide a random number generator" is quite distinct from "use a random number generator to return a shuffled sequence of elements". – Jon Skeet Sep 18 '22 at 14:56
73

This is based on Jon Skeet's answer.

In that answer, the array is shuffled, then returned using yield. The net result is that the array is kept in memory for the duration of foreach, as well as objects necessary for iteration, and yet the cost is all at the beginning - the yield is basically an empty loop.

This algorithm is used a lot in games, where the first three items are picked, and the others will only be needed later if at all. My suggestion is to yield the numbers as soon as they are swapped. This will reduce the start-up cost, while keeping the iteration cost at O(1) (basically 5 operations per iteration). The total cost would remain the same, but the shuffling itself would be quicker. In cases where this is called as collection.Shuffle().ToArray() it will theoretically make no difference, but in the aforementioned use cases it will speed start-up. Also, this would make the algorithm useful for cases where you only need a few unique items. For example, if you need to pull out three cards from a deck of 52, you can call deck.Shuffle().Take(3) and only three swaps will take place (although the entire array would have to be copied first).

public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
    T[] elements = source.ToArray();
    // Note i > 0 to avoid final pointless iteration
    for (int i = elements.Length - 1; i > 0; i--)
    {
        // Swap element "i" with a random earlier element it (or itself)
        int swapIndex = rng.Next(i + 1);
        yield return elements[swapIndex];
        elements[swapIndex] = elements[i];
        // we don't actually perform the swap, we can forget about the
        // swapped element because we already returned it.
    }

    // there is one item remaining that was not returned - we return it now
    yield return elements[0]; 
}
Community
  • 1
  • 1
configurator
  • 40,828
  • 14
  • 81
  • 115
  • Ouch! This will likely not return all the items in the source. You can't rely on a random number being unique for N iterations. – P Daddy Nov 03 '09 at 13:52
  • @P Daddy: Huh? Care to elaborate? – Svish Nov 03 '09 at 13:56
  • @Svish: An extreme example: `rng.Next(i + 1)` *could* return zero every time, just like a flipped quarter could come up heads 15 times in a row. Although it won't likely actually come up zero N times in a row, *some* number of repeats is very likely, so the chances of complete coverage are rather low. – P Daddy Nov 03 '09 at 14:01
  • Wait a minute. Never mind. I wasn't paying enough attention. The half-swap takes care of repeated random numbers. If it came up zero every time it would just effectively reverse all but the first element. – P Daddy Nov 03 '09 at 14:10
  • There is still a minor problem, though. This returns only N-1 elements. If you have 10 elements in the source, this returns only 9 of them. Add `yield return elements[0];` after the `for` loop to correct it. – P Daddy Nov 03 '09 at 14:13
  • 1
    Or you could replace the > 0 with >= 0 and not have to (although, an extra RNG hit plus a redundant assignment) – FryGuy Nov 29 '09 at 20:40
  • 4
    The startup cost is O(N) as the cost of source.ToArray(); – Dave Hillier Nov 08 '11 at 10:50
11

Starting from this quote of Skeet:

It's not a way of shuffling that I like, mostly on the grounds that it's O(n log n) for no good reason when it's easy to implement an O(n) shuffle. The code in the question "works" by basically giving a random (hopefully unique!) number to each element, then ordering the elements according to that number.

I'll go on a little explaining the reason for the hopefully unique!

Now, from the Enumerable.OrderBy:

This method performs a stable sort; that is, if the keys of two elements are equal, the order of the elements is preserved

This is very important! What happens if two elements "receive" the same random number? It happens that they remain in the same order they are in the array. Now, what is the possibility for this to happen? It is difficult to calculate exactly, but there is the Birthday Problem that is exactly this problem.

Now, is it real? Is it true?

As always, when in doubt, write some lines of program: http://pastebin.com/5CDnUxPG

This little block of code shuffles an array of 3 elements a certain number of times using the Fisher-Yates algorithm done backward, the Fisher-Yates algorithm done forward (in the wiki page there are two pseudo-code algorithms... They produce equivalent results, but one is done from first to last element, while the other is done from last to first element), the naive wrong algorithm of http://blog.codinghorror.com/the-danger-of-naivete/ and using the .OrderBy(x => r.Next()) and the .OrderBy(x => r.Next(someValue)).

Now, Random.Next is

A 32-bit signed integer that is greater than or equal to 0 and less than MaxValue.

so it's equivalent to

OrderBy(x => r.Next(int.MaxValue))

To test if this problem exists, we could enlarge the array (something very slow) or simply reduce the maximum value of the random number generator (int.MaxValue isn't a "special" number... It is simply a very big number). In the end, if the algorithm isn't biased by the stableness of the OrderBy, then any range of values should give the same result.

The program then tests some values, in the range 1...4096. Looking at the result, it's quite clear that for low values (< 128), the algorithm is very biased (4-8%). With 3 values you need at least r.Next(1024). If you make the array bigger (4 or 5), then even r.Next(1024) isn't enough. I'm not an expert in shuffling and in math, but I think that for each extra bit of length of the array, you need 2 extra bits of maximum value (because the birthday paradox is connected to the sqrt(numvalues)), so that if the maximum value is 2^31, I'll say that you should be able to sort arrays up to 2^12/2^13 bits (4096-8192 elements)

xanatos
  • 109,618
  • 12
  • 197
  • 280
6

It's probablly ok for most purposes, and almost always it generates a truly random distribution (except when Random.Next() produces two identical random integers).

It works by assigning each element of the series a random integer, then ordering the sequence by these integers.

It's totally acceptable for 99.9% of the applications (unless you absolutely need to handle the edge case above). Also, skeet's objection to its runtime is valid, so if you're shuffling a long list you might not want to use it.

ripper234
  • 222,824
  • 274
  • 634
  • 905
5

This has come up many times before. Search for Fisher-Yates on StackOverflow.

Here is a C# code sample I wrote for this algorithm. You can parameterize it on some other type, if you prefer.

static public class FisherYates
{
        //      Based on Java code from wikipedia:
        //      http://en.wikipedia.org/wiki/Fisher-Yates_shuffle
        static public void Shuffle(int[] deck)
        {
                Random r = new Random();
                for (int n = deck.Length - 1; n > 0; --n)
                {
                        int k = r.Next(n+1);
                        int temp = deck[n];
                        deck[n] = deck[k];
                        deck[k] = temp;
                }
        }
}
Community
  • 1
  • 1
hughdbrown
  • 47,733
  • 20
  • 85
  • 108
  • 2
    You shouldn't use `Random` as a static variable like this - `Random` isn't thread-safe. See http://csharpindepth.com/Articles/Chapter12/Random.aspx – Jon Skeet Dec 30 '12 at 22:11
  • @Jon Skeet: sure, that's a legitimate argument. OTOH, the OP was asking about an algorithm that was flat-out wrong whereas this is correct (other than the multithreaded card-shuffling use-case). – hughdbrown Jan 02 '13 at 17:41
  • 1
    That just means this is "less wrong" than the OP's approach. It doesn't mean it's code which should be used without understanding that it can't be used safely in a multi-threaded context... which is something you didn't mention. There's a reasonable *expectation* that static members can be used safely from multiple threads. – Jon Skeet Jan 02 '13 at 17:46
  • @Jon Skeet: Sure, I can change it. Done. I tend to think that coming back to a question answered three and a half years ago and saying, "It's incorrect because it doesn't handle the multithreaded use-case" when the OP never asked about anything more than the algorithm is excessive. Review my answers over the years. Often I've given OPs replies that went beyond the stated requirements. I've been criticized for that. I wouldn't expect OPs to get answers that fit all possible uses, though. – hughdbrown Jan 02 '13 at 19:43
  • I only visited this answer at all because someone else pointed me at it on chat. While the OP didn't *specifically* mention threading, I think it's definitely worth mentioning when a static method *isn't* thread-safe, as it's unusual and makes the code unsuitable for many situations without modification. Your new code is thread-safe - but still not ideal as if you call it from multiple threads at "roughly" the same time to shuffle two collections of the same size, the shuffles will be equivalent. Basically, `Random` is a pain to use, as noted in my article. – Jon Skeet Jan 02 '13 at 19:48
  • @Jon Skeet: perhaps you should warn people not to pass in a static instance of Random to your function: `public static IEnumerable Shuffle(this IEnumerable source, Random rng)`. It could cause problems. – hughdbrown Jan 02 '13 at 19:52
  • There's no such thing as a "static instance" - but I'll add a link to the article, which is rather more useful IMO. – Jon Skeet Jan 02 '13 at 19:53
3

Seems like a good shuffling algorithm, if you're not too worried on the performance. The only problem I'd point out is that its behavior is not controllable, so you may have a hard time testing it.

One possible option is having a seed to be passed as a parameter to the random number generator (or the random generator as a parameter), so you can have more control and test it more easily.

Samuel Carrijo
  • 17,449
  • 12
  • 49
  • 59
3

Looking for an algorithm? You can use my ShuffleList class:

class ShuffleList<T> : List<T>
{
    public void Shuffle()
    {
        Random random = new Random();
        for (int count = Count; count > 0; count--)
        {
            int i = random.Next(count);
            Add(this[i]);
            RemoveAt(i);
        }
    }
}

Then, use it like this:

ShuffleList<int> list = new ShuffleList<int>();
// Add elements to your list.
list.Shuffle();

How does it work?

Let's take an initial sorted list of the 5 first integers: { 0, 1, 2, 3, 4 }.

The method starts by counting the nubmer of elements and calls it count. Then, with count decreasing on each step, it takes a random number between 0 and count and moves it to the end of the list.

In the following step-by-step example, the items that could be moved are italic, the selected item is bold:

0 1 2 3 4
0 1 2 3 4
0 1 2 4 3
0 1 2 4 3
1 2 4 3 0
1 2 4 3 0
1 2 3 0 4
1 2 3 0 4
2 3 0 4 1
2 3 0 4 1
3 0 4 1 2

SteeveDroz
  • 6,006
  • 6
  • 33
  • 65
3

I found Jon Skeet's answer to be entirely satisfactory, but my client's robo-scanner will report any instance of Random as a security flaw. So I swapped it out for System.Security.Cryptography.RNGCryptoServiceProvider. As a bonus, it fixes that thread-safety issue that was mentioned. On the other hand, RNGCryptoServiceProvider has been measured as 300x slower than using Random.

Usage:

using (var rng = new RNGCryptoServiceProvider())
{
    var data = new byte[4];
    yourCollection = yourCollection.Shuffle(rng, data);
}

Method:

/// <summary>
/// Shuffles the elements of a sequence randomly.
/// </summary>
/// <param name="source">A sequence of values to shuffle.</param>
/// <param name="rng">An instance of a random number generator.</param>
/// <param name="data">A placeholder to generate random bytes into.</param>
/// <returns>A sequence whose elements are shuffled randomly.</returns>
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, RNGCryptoServiceProvider rng, byte[] data)
{
    var elements = source.ToArray();
    for (int i = elements.Length - 1; i >= 0; i--)
    {
        rng.GetBytes(data);
        var swapIndex = BitConverter.ToUInt32(data, 0) % (i + 1);
        yield return elements[swapIndex];
        elements[swapIndex] = elements[i];
    }
}
frattaro
  • 3,241
  • 1
  • 16
  • 10
1

I would say that many answers here like "This algorithm shuffles by generating a new random value for each value in a list, then ordering the list by those random values" might be very wrong!

I'd think that this DOES NOT assign a random value to each element of the source collection. Instead there might be a sort algorithm running like Quicksort which would call a compare-function approximately n log n times. Some sort algortihm really expect this compare-function to be stable and always return the same result!

Couldn't it be that the IEnumerableSorter calls a compare function for each algorithm step of e.g. quicksort and each time calls the function x => r.Next() for both parameters without caching these!

In that case you might really mess up the sort algorithm and make it much worse than the expectations the algorithm is build up on. Of course, it eventually will become stable and return something.

I might check it later by putting debugging output inside a new "Next" function so see what happens. In Reflector I could not immediately find out how it works.

Christian
  • 2,903
  • 4
  • 31
  • 34
  • 1
    It's not the case: internal override void ComputeKeys(TElement[] elements, int count); Declaring Type: System.Linq.EnumerableSorter Assembly: System.Core, Version=3.5.0.0 This function creates an array first with all the keys which consumes memory, before quicksort sorts them – Christian Apr 26 '12 at 17:46
  • That's good to know - still just an implementation detail though, which could conceivably change in future versions! – Blorgbeard Sep 10 '12 at 22:37
1

This algorithm shuffles by generating a new random value for each value in a list, then ordering the list by those random values. Think of it as adding a new column to an in-memory table, then filling it with GUIDs, then sorting by that column. Looks like an efficient way to me (especially with the lambda sugar!)

Dave Swersky
  • 34,502
  • 9
  • 78
  • 118
1

Slightly unrelated, but here is an interesting method (that even though it is really excessibe, has REALLY been implemented) for truly random generation of dice rolls!

Dice-O-Matic

The reason I'm posting this here, is that he makes some interesting points about how his users reacted to the idea of using algorithms to shuffle, over actual dice. Of course, in the real world, such a solution is only for the really extreme ends of the spectrum where randomness has such an big impact and perhaps the impact affects money ;).

Community
  • 1
  • 1
Irfy
  • 2,469
  • 4
  • 25
  • 31
0

It is worth noting that due to the deferred execution of LINQ, using a random number generator instance with OrderBy() can result in a possibly unexpected behavior: The sorting does not happen until the collection is read. This means each time you read or enumerate the collection, the order changes. One would possibly expect the elements to be shuffled once and then to retain the order each time it is accessed thereafter.


Random random = new();
var shuffled = ordered.OrderBy(x => random.Next())

The code above passes a lambda function x => random.Next() as a parameter to OrderBy(). This will capture the instance referred to by random and save it with the lambda by so that it can call Next() on this instance to perform the ordering later which happens right before it is enumerated(when the first element is requested from the collection). The problem here, is since this execution is saved for later, the ordering happens each time just before the collection is enumerated using new numbers obtained by calling Next() on the same random instance.


Example

To demonstrate this behavior, I have used Visual Studio's C# Interactive Shell:

> List<int> list = new() { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
> Random random = new();
> var shuffled = list.OrderBy(element => random.Next());
> shuffled.ToList()
List<int>(10) { 5, 9, 10, 4, 6, 2, 8, 3, 1, 7 }   
> shuffled.ToList()
List<int>(10) { 8, 2, 9, 1, 3, 6, 5, 10, 4, 7 } // Different order
> shuffled.ElementAt(0) 
9                                              // First element is 9
> shuffled.ElementAt(0)
7                                              // First element is now 7
> 

This behavior can even be seen in action by placing a breakpoint just after where the IOrderedEnumerable is created when using Visual Studio's debugger: each time you hover on the variable, the elements show up in a different order.


This, of course, does not apply if you immediately enumerate the elements by calling ToList() or an equivalent. However, this behavior can lead to bugs in many cases, one of them being when the shuffled collection is expected to contain a unique element at each index.

Amal K
  • 4,359
  • 2
  • 22
  • 44
-5

Startup time to run on code with clear all threads and cache every new test,

First unsuccessful code. It runs on LINQPad. If you follow to test this code.

Stopwatch st = new Stopwatch();
st.Start();
var r = new Random();
List<string[]> list = new List<string[]>();
list.Add(new String[] {"1","X"});
list.Add(new String[] {"2","A"});
list.Add(new String[] {"3","B"});
list.Add(new String[] {"4","C"});
list.Add(new String[] {"5","D"});
list.Add(new String[] {"6","E"});

//list.OrderBy (l => r.Next()).Dump();
list.OrderBy (l => Guid.NewGuid()).Dump();
st.Stop();
Console.WriteLine(st.Elapsed.TotalMilliseconds);

list.OrderBy(x => r.Next()) uses 38.6528 ms

list.OrderBy(x => Guid.NewGuid()) uses 36.7634 ms (It's recommended from MSDN.)

the after second time both of them use in the same time.

EDIT: TEST CODE on Intel Core i7 4@2.1GHz, Ram 8 GB DDR3 @1600, HDD SATA 5200 rpm with [Data: www.dropbox.com/s/pbtmh5s9lw285kp/data]

using System;
using System.Runtime;
using System.Diagnostics;
using System.IO;
using System.Collections.Generic;
using System.Collections;
using System.Linq;
using System.Threading;

namespace Algorithm
{
    class Program
    {
        public static void Main(string[] args)
        {
            try {
                int i = 0;
                int limit = 10;
                var result = GetTestRandomSort(limit);
                foreach (var element in result) {
                    Console.WriteLine();
                    Console.WriteLine("time {0}: {1} ms", ++i, element);
                }
            } catch (Exception e) {
                Console.WriteLine(e.Message);
            } finally {
                Console.Write("Press any key to continue . . . ");
                Console.ReadKey(true);
            }
        }

        public static IEnumerable<double> GetTestRandomSort(int limit)
        {
            for (int i = 0; i < 5; i++) {
                string path = null, temp = null;
                Stopwatch st = null;
                StreamReader sr = null;
                int? count = null;
                List<string> list = null;
                Random r = null;

                GC.Collect();
                GC.WaitForPendingFinalizers();
                Thread.Sleep(5000);

                st = Stopwatch.StartNew();
                #region Import Input Data
                path = Environment.CurrentDirectory + "\\data";
                list = new List<string>();
                sr = new StreamReader(path);
                count = 0;
                while (count < limit && (temp = sr.ReadLine()) != null) {
//                  Console.WriteLine(temp);
                    list.Add(temp);
                    count++;
                }
                sr.Close();
                #endregion

//              Console.WriteLine("--------------Random--------------");
//              #region Sort by Random with OrderBy(random.Next())
//              r = new Random();
//              list = list.OrderBy(l => r.Next()).ToList();
//              #endregion

//              #region Sort by Random with OrderBy(Guid)
//              list = list.OrderBy(l => Guid.NewGuid()).ToList();
//              #endregion

//              #region Sort by Random with Parallel and OrderBy(random.Next())
//              r = new Random();
//              list = list.AsParallel().OrderBy(l => r.Next()).ToList();
//              #endregion

//              #region Sort by Random with Parallel OrderBy(Guid)
//              list = list.AsParallel().OrderBy(l => Guid.NewGuid()).ToList();
//              #endregion

//              #region Sort by Random with User-Defined Shuffle Method
//              r = new Random();
//              list = list.Shuffle(r).ToList();
//              #endregion

//              #region Sort by Random with Parallel User-Defined Shuffle Method
//              r = new Random();
//              list = list.AsParallel().Shuffle(r).ToList();
//              #endregion

                // Result
//              
                st.Stop();
                yield return st.Elapsed.TotalMilliseconds;
                foreach (var element in list) {
                Console.WriteLine(element);
            }
            }

        }
    }
}

Result Description: https://www.dropbox.com/s/9dw9wl259dfs04g/ResultDescription.PNG
Result Stat: https://www.dropbox.com/s/ewq5ybtsvesme4d/ResultStat.PNG

Conclusion:
Assume: LINQ OrderBy(r.Next()) and OrderBy(Guid.NewGuid()) are not worse than User-Defined Shuffle Method in First Solution.

Answer: They are contradiction.

GMzo
  • 89
  • 1
  • 4
  • 1
    The second option isn't *correct*, and therefore it's performance is *irrelevant*. This also still doesn't answer the question of whether ordering by a random number is acceptable, efficient, or how it works. The first solution also has correctness problems, but they're not *as* big of a deal. – Servy Feb 26 '14 at 17:48
  • Sorry, I'd like to know what is better kind of parameter of Quicksort of Linq OrderBy? I need to test performance. However, I think int type just have speed better than string of Guid but it is not. I understood why MSDN recommended. The first solution edited performance is as same as OrderBy with Random instance. – GMzo Feb 26 '14 at 18:16
  • What's the point of measuring the performance of code that doesn't solve the problem? Performance is only a consideration to make between two solutions *that both work*. When you have working solutions, *then* you can *start* to compare them. – Servy Feb 26 '14 at 18:40
  • I must have a time to test on more data then if it's finished, I promise to post again. Assume: I think Linq OrderBy is not worse than first solution. Opinion: It's easy to use and understand. – GMzo Feb 27 '14 at 17:41
  • It is noticeably less efficient than very simple straightforward shuffling algorithms, but once again, the performance is **irrelevant**. They are not reliably shuffling the data, in addition to being less performant. – Servy Feb 27 '14 at 18:09
  • I'd like to know how to measure performance for Big-O notation if we do not use time and big data about 100,000 records up. I don't mean my post is correct now. I'm testing on big data 200,000 records up for pre-processing data. This is my reason to find the best simple way, not complexity, to do random-sort data. – GMzo Feb 28 '14 at 05:29
  • Question: Is this a good shuffle algorithm? How does it work exactly? **Is it an acceptable way of doing this?** @Servy, Can you tell me what is measurement about **good** algorithm? Big-O notation is always mathematically consistent. In the real-world, it maybe the wrong model for me. – GMzo Feb 28 '14 at 05:48
  • Now, I'm proofing that is reasonable to use in the real-world? For question: Is it an acceptable way of doing this? Maybe it is yes or no. – GMzo Feb 28 '14 at 06:07
  • I'm not saying that you're incorrectly measuring your performance. I'm saying *the underlying code doesn't actually, successfully, and accurately, shuffle the data*. It doesn't matter if it's 10x faster than some other algorithm. If it doesn't actually *work* then it's speed is irrelevant. Apparently this is just too complex for you to understand. I probably shouldn't even be posting this; apparently you just aren't capable of understanding such a simple concept. – Servy Feb 28 '14 at 14:27
  • @Servy, please come back to test my code for checking actually and accurately shuffle data. Note: My data is generated by IBM Quest Data Generator. You can check more. The next step I will check on 1,000,000 records. – GMzo Feb 28 '14 at 18:51
  • It doesn't matter how many records you call it on. It doesn't matter how fast your code runs. All of the solutions in your answer *fail to reliably shuffle the data*. Your edit has added quite a few more methods that are broken in even worse ways than your other methods. – Servy Feb 28 '14 at 18:56
  • Your proposed solutions are still incorrect, yes. You testing them on a larger data set or making them even more wrong by improperly parallelizing them has not in any way fixed them. – Servy Feb 28 '14 at 19:08
  • I have to check that Microsoft said PLINQ is better than regular LINQ. – GMzo Feb 28 '14 at 19:41
  • First off, no. It's not just "better". It may or may not be preferable depending on the situation. In this case you're executing code that cannot be safely parallelized because there is unsynchronized shared state between the operations performed on each thread, thus resulting in undefined output. – Servy Feb 28 '14 at 19:43
  • I think speed in millisecond doesn't have any people to feel it slowlier than other shuffle method. If he chooses to use LINQ OrderBy(). OK. I see your suggestion. I will improve it when I finish my homework. – GMzo Feb 28 '14 at 19:47
  • Jon's answer is entirely correct. And for like the fifth time, I'm not saying that your code is slower. I'm saying that it doesn't produce valid output, and thus its speed is irrelevant. – Servy Feb 28 '14 at 19:48
  • I test more time to see in output. It's valid because the data may have item-id is unique. – GMzo Feb 28 '14 at 19:53