These two questions give similar algorithms for shuffling an IEnumerable:

Here are the two methods side-by-side:

public static IEnumerable<T> Shuffle1<T> (this IEnumerable<T> source)
{
    Random random = new Random ();
    T [] copy = source.ToArray ();

    // Lazy Fisher-Yates: yield a random remaining element, then move
    // the last unconsumed element into the slot it vacated.
    for (int i = copy.Length - 1; i >= 0; i--) {
        int index = random.Next (i + 1);
        yield return copy [index];
        copy [index] = copy [i];
    }
}


public static IEnumerable<T> Shuffle2<T> (this IEnumerable<T> source)
{
    Random random = new Random ();
    List<T> copy = source.ToList ();

    // Yield a random remaining element, then remove it from the list.
    while (copy.Count > 0) {
        int index = random.Next (copy.Count);
        yield return copy [index];
        copy.RemoveAt (index);
    }
}

They are basically identical, except that one uses a List and one uses an array. Conceptually, the second one seems clearer to me. But is there a substantial performance benefit to be gained from using an array? Even if the Big-O time is the same, being several times faster could make a noticeable difference.

Matthew
  • You have written the code both ways. If you want to know which one is faster *run the code* and then you'll know. It is a mystery to me why so many people post "which one of these is faster?" questions. Surely it has got to be both faster and more accurate to simply run the code both ways while sitting at your desk than to ask the internet and wait for someone else to either run it for you, or make a guess. – Eric Lippert Dec 10 '10 at 20:37
  • @Matthew It's better to post your own benchmark and then ask about the reason for its results. – nawfal May 30 '13 at 13:46

3 Answers


The second version will probably be a bit slower because of RemoveAt. Lists are really arrays that grow as you add elements to them, so insertion and removal in the middle are slow (in fact, MSDN states that RemoveAt has O(n) complexity).

Anyway, the best would be to simply use a profiler to compare both methods.
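
For example, a rough Stopwatch-based comparison along those lines (a sketch, not a real profiler run; it assumes the two extension methods from the question are in scope, and the element count is arbitrary):

using System;
using System.Diagnostics;
using System.Linq;

static class ShuffleTiming
{
    static void Main ()
    {
        int [] data = Enumerable.Range (0, 300000).ToArray ();

        // Count () forces full enumeration, since both shuffles are lazy.
        Stopwatch watch = Stopwatch.StartNew ();
        data.Shuffle1 ().Count ();
        Console.WriteLine ("Shuffle1: {0} ms", watch.ElapsedMilliseconds);

        watch.Restart ();
        data.Shuffle2 ().Count ();
        Console.WriteLine ("Shuffle2: {0} ms", watch.ElapsedMilliseconds);
    }
}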

Etienne de Martel
  • +1. I would add that `List.RemoveAt` is O(n) where n is Count - index, meaning it takes more time to remove from the beginning than from the end: it has to move every value after index. There's no need to profile this; the second one can only be slower. The first is a Fisher-Yates shuffle and is highly recognizable to anyone with *Data Structures 101* knowledge. – Tergiver Dec 10 '10 at 19:31
  • Running both in LinqPad, which gives you a timer count, with a list of 300k integers, the first took on average 0.1 sec and the second 22.5 sec (I am running Windows XP on a 3 GHz Pentium D with 2 GB RAM). I should add that the timing included initializing my list, and I used a foreach loop that took each element from the shuffle and placed it in a HashSet. – pstrjds Dec 10 '10 at 19:43

The first algorithm is O(n), as it has a loop which performs an O(1) swap on each iteration. The second algorithm is O(n^2), as it performs an O(n) RemoveAt operation on each iteration. In addition, indexers on lists are slower than indexing into arrays, because the former is a method call whereas the latter is an IL instruction.

So of the two, the first one is likely to be faster. That said, if you're after performance, why bother yielding the results? It's already converting to an array, so just shuffle that in place and return the array directly (or wrapped in a ReadOnlyCollection<T> if you're worried about people changing it), which is probably faster still.
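
A sketch of that suggestion (my rendering of the answer's idea; the method name ShuffleToArray is made up): the eager version is just the classic in-place Fisher-Yates.

using System;
using System.Collections.Generic;
using System.Linq;

public static class ShuffleExtensions
{
    public static T [] ShuffleToArray<T> (this IEnumerable<T> source)
    {
        Random random = new Random ();
        T [] copy = source.ToArray ();

        // In-place Fisher-Yates: working from the end down, swap each
        // position with a randomly chosen position at or below it.
        for (int i = copy.Length - 1; i > 0; i--) {
            int index = random.Next (i + 1);
            T temp = copy [i];
            copy [i] = copy [index];
            copy [index] = temp;
        }
        return copy;
    }
}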

On a side note, both methods have a bug: the behaviour of Random when used from multiple threads is undefined, so they should probably use a thread-safe random number generator.
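
One common way to do that is a per-thread Random. A minimal sketch, assuming a ThreadLocal<T>-based wrapper (available from .NET 4) is acceptable; the class name and seeding strategy are my own:

using System;
using System.Threading;

public static class ThreadSafeRandom
{
    // A single global Random used only to seed the per-thread instances;
    // access is locked because Random itself is not thread-safe.
    static readonly Random seeds = new Random ();

    static readonly ThreadLocal<Random> local = new ThreadLocal<Random> (() => {
        lock (seeds) {
            return new Random (seeds.Next ());
        }
    });

    // Each thread gets its own Random, so concurrent callers never
    // touch the same instance.
    public static int Next (int maxValue)
    {
        return local.Value.Next (maxValue);
    }
}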

Greg Beech
  • Yielding is good if you might not want all the elements. Say you want the first 3 random elements from a collection of 1000. This will be much faster than shuffling all 1000 elements, then picking the first 3 (see the sketch after this thread). – Matthew Dec 10 '10 at 19:33
  • @Matthew - true, but if performance is a concern then I would make two methods; one which is fast at a complete shuffle (Shuffle) and one optimised for taking a random sample (Sample or TakeRandom). I also think that's probably a bit more self-documenting as people won't necessarily expect a shuffle method to be lazy. – Greg Beech Dec 10 '10 at 19:41
  • FYI, it is most certainly O(N^2). As the list shrinks, `RemoveAt` will have to operate on an average of N/2 items per iteration, so the runtime of `RemoveAt` is still directly proportional to N. – mqp Dec 10 '10 at 19:47
  • @mquander - Yes, you're right. It's been a long day. Corrected my answer. – Greg Beech Dec 10 '10 at 20:13
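
To make that partial-consumption point concrete, a small usage sketch (assuming Shuffle1 from the question is in scope):

using System;
using System.Linq;

class SampleDemo
{
    static void Main ()
    {
        var items = Enumerable.Range (0, 1000).ToList ();

        // Shuffle1 is lazy, so Take (3) runs only three iterations of
        // the shuffle loop (the initial ToArray copy is still O(n)).
        var sample = items.Shuffle1 ().Take (3).ToList ();
        Console.WriteLine (string.Join (", ", sample));
    }
}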

The first doesn't compile, although it's apparent that you're trying to reify the enumerable and then implement Fisher-Yates; that's probably the correct approach, and it shouldn't be unclear to anyone who has ever shuffled an array before. The second, using RemoveAt, is bad for the reasons stated by the other commenters.

EDIT: Your top implementation looks like it's correct now, and it's a good way to do it.

mqp
  • Oops, fixed the compilation error. I changed the variable names from the two questions that I linked to so that they would match each other as closely as possible, but I missed one. – Matthew Dec 10 '10 at 19:37