29

I need an extension method which will shuffle an IEnumerable<T>. It can also take an int to specify the size of the returned IEnumerable. Better keeping Immutability of the IEnumerable. My current solution for IList-

public static IList<T> Shuffle<T>(this IList<T> list, int size)
{
    Random rnd = new Random();
    var res = new T[size];

    res[0] = list[0];
    for (int i = 1; i < size; i++)
    {
        int j = rnd.Next(i);
        res[i] = res[j];
        res[j] = list[i];
    }
    return res;
}

public static IList<T> Shuffle<T>(this IList<T> list)
{ return list.Shuffle(list.Count); }
AakashM
  • 62,551
  • 17
  • 151
  • 186
Gulshan
  • 3,611
  • 5
  • 35
  • 46
  • 1
    note that in order for `< >` to appear they generally *must* be formatted as code, either inline with backquotes (as I hvae added) or (as you have done) with four-space indentation – AakashM Apr 27 '11 at 16:12

3 Answers3

59

You can use a Fisher-Yates-Durstenfeld shuffle. There's no need to explicitly pass a size argument to the method itself, you can simply tack on a call to Take if you don't need the entire sequence:

var shuffled = originalSequence.Shuffle().Take(5);

// ...

public static class EnumerableExtensions
{
    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
    {
        return source.Shuffle(new Random());
    }

    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (rng == null) throw new ArgumentNullException(nameof(rng));

        return source.ShuffleIterator(rng);
    }

    private static IEnumerable<T> ShuffleIterator<T>(
        this IEnumerable<T> source, Random rng)
    {
        var buffer = source.ToList();
        for (int i = 0; i < buffer.Count; i++)
        {
            int j = rng.Next(i, buffer.Count);
            yield return buffer[j];

            buffer[j] = buffer[i];
        }
    }
}
Kevin R.
  • 3,571
  • 1
  • 19
  • 29
LukeH
  • 263,068
  • 57
  • 365
  • 409
  • 2
    Is the purpose of 2nd method to just throw exception? – Gulshan Apr 28 '11 at 05:08
  • 9
    Yes, it's so that the argument checking is done eagerly rather than being deferred. If the second and third methods were rolled together then any argument checking wouldn't take place until you started iterating the sequence. – LukeH Apr 28 '11 at 09:04
  • Luke, when you are calling source.ToList() in your main method, doesn't that imply the IEnumerable is executed (probably if its a Linq enumerable, then you are breaking their deferred execution? Better to ask for an IList! – nawfal Nov 25 '12 at 15:13
  • @nawfal: Yes, the method has to buffer the contents of the source `IEnumerable<>` so that it can perform the shuffle. It then yields its *output* lazily. I'm not sure what you mean about asking for an `IList`, or how that would help. – LukeH Nov 26 '12 at 00:37
  • @LukeH Let's assume `source` in your case is `var d = p.Select(...).OrderBy(...)`, when you do a ToList from inside the method, its executing the above query. If you are taking an `IList` instead as the parameter, the execution (by doing a `ToList` or `ToArray` for `d`) will have to be done from the caller side, so that gives no confusion about side effects of Shuffle method. I hope its clear – nawfal Nov 26 '12 at 00:43
  • @nawfal: I see. Doing that would go against the LINQ philosophy of no side-effects. Taking an `IEnumerable<>` sequence and returning a new, shuffled sequence is preferable, imo, to taking an `IList<>` and mutating it. – LukeH Nov 26 '12 at 01:07
  • @nawfal: It's easy enough to write a non-LINQy method to do that: something like `public static void ShuffleInPlace(this IList source)` – LukeH Nov 26 '12 at 01:09
  • @LukeH I'm not saying shuffle should take place on the same enumerable. In fact it should return a new shuffled sequence. but since the shuffle method executes the possible query that is an `IEnumerable` `source`, its against the `Linq`'s deferred execution. So you may not get the best of Linq's lazy evaluation. I'm just saying if ever this Shuffle method is used with Linq, it behaves unlike all typical Linq methods. If you declare it IList, you are giving the caller the know-how of if the expression will be executed or not. – nawfal Nov 26 '12 at 06:56
  • I'm not saying the above method is bad, but just that it may not be that good to be used along with Linq. Just thinking. I made it a [separate question here](http://stackoverflow.com/questions/13556837/is-it-safe-to-accept-ienumerablet-as-argument-after-the-introduction-of-linq) only to get down votes :) – nawfal Nov 26 '12 at 06:57
  • 7
    @nawfal: Many built-in LINQ methods buffer-up the entire sequence internally and then yield results lazily: for example, `GroupBy`, `OrderBy`, `OrderByDescending`, `ThenBy`, `ThenByDescending`, `Reverse` etc all need to buffer their source sequence; `Except`, `GroupJoin`, `Intersect`, `Join` etc all buffer their "secondary" input sequence. It's not a problem, imo, although it is a good idea to clearly document whether or not a method needs to buffer the entire sequence internally. – LukeH Nov 26 '12 at 10:09
  • No need for the second method to exist – Dan Hunex Jun 05 '18 at 19:20
1

With some LINQ love:

public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> list, int size)
{
    var r = new Random();
    var shuffledList = 
        list.
            Select(x => new { Number = r.Next(), Item = x }).
            OrderBy(x => x.Number).
            Select(x => x.Item).
            Take(size); // Assume first @size items is fine

    return shuffledList.ToList();
}
Anton Gogolev
  • 113,561
  • 39
  • 200
  • 288
  • 1
    Isn't it same as `OrderBy(x => r.Next())`? – Gulshan Apr 27 '11 at 16:12
  • 1
    @Gulshan: Yes it is. Sorting by a random number is not considered to be a very good shuffling algorithm. It works, but it's not as random as it could be. – Gabe Apr 27 '11 at 16:41
  • No, it's not the same. `OrderBy(x => r.Next())` can potentially put the sort into an infinite loop. This can't. – Jim Mischel Apr 27 '11 at 23:48
  • 3
    @Jim: The `OrderBy` method actually does something like this internally anyway -- generates a key for each item *once* using the projection, stores it and then uses that stored key to sort -- so there's no danger of an infinite loop with the current MS implementation. (Although there's no guarantee that the implementation will be the same on different platforms/versions.) – LukeH Apr 28 '11 at 00:07
  • @LukeH: Can you give me a pointer to more info on that? What you said makes no sense to me, especially in the context of a comparison function (which is what `r.Next` is being used as here). What am I missing? – Jim Mischel Apr 28 '11 at 00:13
  • @Jim: Since it's just an implementation detail there's no official documentation, but there's a good discussion of it on Jon Skeet's blog: https://msmvps.com/blogs/jon_skeet/archive/2011/02/23/reimplementing-linq-to-objects-part-45-conclusion-and-list-of-posts.aspx (see the four articles that comprise the "Ordering" section, and their comments). – LukeH Apr 28 '11 at 00:46
  • @LukeH: Thanks. I realized on my way home that I was thinking that the parameter was a comparison delegate, not a selection delegate. It's been a bad week. – Jim Mischel Apr 28 '11 at 01:04
0

Anton's got the idea, but you could make it a two-liner:

public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> enumerable)
{
    var r = new Random();
    return enumerable.OrderBy(x=>r.Next()).ToList();
}

Unfortunately, it can't be lazily evaluated because r will be out of scope when it does execute. You can create an IEnumerable implementation that encapsulates this code and return that, but that gets more complex.

KeithS
  • 70,210
  • 21
  • 112
  • 164
  • Heard this is kind of biased. – Gulshan Apr 27 '11 at 16:19
  • @Gulshan: It shouldn't be *too* bad, although the fact that `OrderBy` is a stable sort could cause problems (and it's possible that `Random` itself might have some inherent bias). The main potential drawback with using `OrderBy` is that it has O(n lg n) time complexity; it's possible to perform a shuffle in O(n), and with no bias. – LukeH Apr 27 '11 at 16:28
  • This is a really, *really* bad idea. It's possible for something like this to cause an infinite loop in the sorting method. – Jim Mischel Apr 27 '11 at 23:36
  • 2
    @Jim: The current CLR implementation of `OrderBy` only executes the projection once for each element, stores the key and then uses that stored key for ordering, so there's currently no danger of an infinite loop. (Of course, this is relying on an implementation detail that could, theoretically, change.) – LukeH Apr 28 '11 at 00:09
  • @LukeH: Right. This can't cause an infinite loop. I was thinking of a comparison delegate rather than a selector. – Jim Mischel Apr 28 '11 at 01:05
  • Why are you `ToList()` -ing when you are returning just IEnumerable? That can create side effects! – nawfal Nov 25 '12 at 15:15
  • Because if you return just the OrderBy iterator, r will go out of scope and be destroyed before it's ever used. The primary side effect is additional memory; this does not affect the initial enumerable. – KeithS Nov 26 '12 at 16:09
  • 3
    What's up with this "r will go out of scope"? What about variable capturing? It should not be a problem omitting the `.ToList()` in this code snippet... – Dennis Apr 10 '14 at 06:14
  • 4
    -1 This is a managed language, `r` is allocated on the heap and will never "go out of scope", the GC will ensure `r` does not get garbage collected until it is no longer referenced. – Lukazoid Oct 30 '14 at 11:28