47

Please suggest an easiest way to get a random shuffled collection of count 'n' from a collection having 'N' items. where n <= N

Jobi Joy
  • 49,102
  • 20
  • 108
  • 119

8 Answers8

109

Further to mquander's answer and Dan Blanchard's comment, here's a LINQ-friendly extension method that performs a Fisher-Yates-Durstenfeld shuffle:

// take n random items from yourCollection
var randomItems = yourCollection.Shuffle().Take(n);

// ...

public static class EnumerableExtensions
{
    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source)
    {
        return source.Shuffle(new Random());
    }

    public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (rng == null) throw new ArgumentNullException(nameof(rng));

        return source.ShuffleIterator(rng);
    }

    private static IEnumerable<T> ShuffleIterator<T>(
        this IEnumerable<T> source, Random rng)
    {
        var buffer = source.ToList();
        for (int i = 0; i < buffer.Count; i++)
        {
            int j = rng.Next(i, buffer.Count);
            yield return buffer[j];

            buffer[j] = buffer[i];
        }
    }
}
billw
  • 120
  • 1
  • 9
LukeH
  • 263,068
  • 57
  • 365
  • 409
  • Is it me or does this method screw up the list index? Whenever is use elementAt() after shuffling the result I get is totally unexpected... – Htbaa Jan 03 '12 at 11:45
  • 3
    @Htbaa: The method returns a lazily evaluated sequence. If you do `seq.Shuffle().ElementAt(n)` multiple times then you're re-shuffling each time so it's likely that you'll get a different item at position `n`. If you want to shuffle *once* then you need to store the sequence in a concrete collection of some kind: for example, `var list = seq.Shuffle().ToList()`. Then you can do `list.ElementAt(n)` as often as you like -- or just `list[n]` -- and you'll always get the same item back. – LukeH Jan 03 '12 at 11:52
  • Ah thanks! Will use that. Am still quite new to C# and LINQ, so I was a bit baffled by the result I was getting :-). – Htbaa Jan 03 '12 at 19:13
  • What is the benefit of having `.ShuffleIterator()` instead of putting that code in `.Shuffle()`? And since you call `.ToList()`, won't you have to iterate the entire collection rather than just the amount requested by `.Take(n)`? I have a hunch that the first question may answer my second question. – gilly3 Jul 24 '12 at 20:52
  • This provides unreliable results. I get a different element each time I call `myList.ElementAt(0)`. I get a different string each time I call `string.Join(",", myList)`. – gilly3 Jul 26 '12 at 19:09
  • 1
    @gilly3: That's probably because these `Shuffle` methods return a lazy `IEnumerable` which will be re-evaluated each time you use it in a `string.Join` or `ElementAt` call. That's just the nature of `IEnumerable`; if you want the behaviour of a concrete collection then you should actually create a concrete collection by calling `ToArray` or `ToList` on the result of the `Shuffle` call. – LukeH Jul 29 '12 at 23:04
  • Is using `.Shuffle()` thread safe? I don't see why it shouldn't be but I am getting strange results... multiple threads seem to pull the same shuffle results. – jocull Nov 27 '12 at 03:27
  • 1
    @gilly3 Sorry to bring this up a year later, but the point of ShuffleIterator is to let Shuffle perform argument validation up-front, rather than deferring it until MoveNext is first called on the enumerator (which may be never, or in code that is not expecting it). See http://msmvps.com/blogs/jon_skeet/archive/2008/03/02/c-4-idea-iterator-blocks-and-parameter-checking.aspx – Weeble Mar 27 '13 at 13:45
  • 2
    @jocull If you use the version that doesn't take an instance of Random, it will create one for you. Instances of Random created close in time have a risk of using the same seed, and thus giving the same sequence of random numbers. If you pass the same instance of Random to it on different threads, you will get unpredictable results because instances of Random are not thread-safe. You should create your Random instances with different seeds, and not share them between threads. – Weeble Mar 27 '13 at 13:52
41

Another option is to use OrderBy and to sort on a GUID value, which you can do so using:

var result = sequence.OrderBy(elem => Guid.NewGuid());

I did some empirical tests to convince myself that the above actually generates a random distribution (which it appears to do). You can see my results at Techniques for Randomly Reordering an Array.

Scott Mitchell
  • 8,659
  • 3
  • 55
  • 71
  • 19
    This solution violates the contract of orderby, specifically that a given object has a consistent key throughout the sort. If it does work, it does so through chance alone, and could break in future versions of the framework. For more details, see http://blogs.msdn.com/b/ericlippert/archive/2011/01/31/spot-the-defect-bad-comparisons-part-four.aspx – John Melville Apr 27 '11 at 23:22
  • Good catch. However, if this were committed to an actual collection -- such as adding .ToList() or .ToArray() to the end --, then there's no chance of the collection getting processed/iterating Linq code more than one time to perform the sort. I imagine that would survive future upgrades as well since it would act as a predictable snapshot. – MutantNinjaCodeMonkey Jan 09 '13 at 20:52
  • 5
    I just stumbled on this page and think it's brilliant, but I don't understand the comment. what could possibly go wrong in using this method to shuffle a collection (I'm not asking that sarcastically, I'm genuinely curious about the possibilities)? – SelAromDotNet Aug 08 '13 at 23:35
  • If you look at it from the point of view of those crazy functional programmers, this use of OrderBy doesn't violate the contract -- it's just that the consistent unique key is a function of all the seeds used to generate the GUID, so it's very hard to spot its consistency. For a certain machine, point in time, etc, you will always get back the same GUID. I know it stretches the idea of the contract a bit, but theoretically it's sound, I think. Maybe some FP people can comment further. – Paul d'Aoust Feb 18 '14 at 18:10
  • @Josh, there's no practical problems; it's just that, theoretically, it could be considered an abuse of OrderBy because you can't get repeatable results (well, not without a heck of a lot of work). The thing is, you don't want repeatable results in this case. – Paul d'Aoust Feb 18 '14 at 18:11
  • 18
    The problem here is not that the key is not consistent. John Melville has misread my article; that article notes that making a *comparison* -- that is, this item is bigger, smaller or equal to another -- that is inconsistent violates the contract of the Sort method. This answer is wrong for a completely different reason: **Guids are only guaranteed to be unique; their randomness is an implementation detail you should not rely upon**. – Eric Lippert Jun 19 '15 at 16:50
  • 12
    In particular, it is legal for guids to be generated *sequentially* from an initial random element; this still makes good guarantees of uniqueness. Just because the guid generator *today* does not actually generate sequential guids ever is an implementation detail subject to change, and if it does change, then suddenly your "shuffle" is "shuffling" things into sorted order every time. Use guids to generate *uniqueness*, never *randomness*. Use a class designed to generate randomness for randomness. – Eric Lippert Jun 19 '15 at 16:52
18

This has some issues with "random bias" and I am sure it's not optimal, this is another possibility:

var r = new Random();
l.OrderBy(x => r.NextDouble()).Take(n);
  • This doesn't have any random bias other than any insignificant bias from `Random` itself. It's also very efficient. – Enigmativity Nov 08 '17 at 22:44
6

Shuffle the collection into a random order and take the first n items from the result.

mqp
  • 70,359
  • 14
  • 95
  • 123
  • 6
    Note that you don't have to completely shuffle the collection when n < N. You simply have to iterate through Durstenfeld's algorithm n times, and take the last n items of the (partially) shuffled result. – Dan Blanchard Oct 30 '09 at 20:38
0

A bit less random, but efficient:

var rnd = new Random();
var toSkip = list.Count()-n;

if (toSkip > 0)
    toSkip = rnd.Next(toSkip);
else
    toSkip=0;

var randomlySelectedSequence = list.Skip(toSkip).Take(n);
Majid
  • 13,853
  • 15
  • 77
  • 113
0

Just needed to do that today. Here's my stab at it:

public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> enumerable)
    => enumerable.Shuffle(new Random());
    
private static IEnumerable<T> Shuffle<T>(this IEnumerable<T> enumerable, Random random)
    => enumerable
       .Select(t => (t, r: random.Next())) // zip with random value
       .OrderBy(tr => tr.r)
       .Select(tr => tr.t);

The main idea here is the "zip" but without using Zip since we only want to iterate the enumerable once. Inside the sort, each element of the original enumerable has the same "value".

Daren Thomas
  • 67,947
  • 40
  • 154
  • 200
-1

I write this overrides method:

public static IEnumerable<T> Randomize<T>(this IEnumerable<T> items) where T : class
{
     int max = items.Count();
     var secuencia = Enumerable.Range(1, max).OrderBy(n => n * n * (new Random()).Next());

     return ListOrder<T>(items, secuencia.ToArray());
}

private static IEnumerable<T> ListOrder<T>(IEnumerable<T> items, int[] secuencia) where T : class
        {
            List<T> newList = new List<T>();
            int count = 0;
            foreach (var seed in count > 0 ? secuencia.Skip(1) : secuencia.Skip(0))
            {
                newList.Add(items.ElementAt(seed - 1));
                count++;
            }
            return newList.AsEnumerable<T>();
        }

Then, I have my source list (all items)

var listSource = p.Session.QueryOver<Listado>(() => pl)
                        .Where(...);

Finally, I call "Randomize" and I get a random sub-collection of items, in my case, 5 items:

var SubCollection = Randomize(listSource.List()).Take(5).ToList();
Hernaldo Gonzalez
  • 1,977
  • 1
  • 21
  • 32
  • You are wasting memory by creating `newList`. Maybe consider using `yield return` inside the `foreach` statement. – bradlis7 Jun 16 '15 at 21:41
  • 1
    The line `Enumerable.Range(1, max).OrderBy(n => n * n * (new Random()).Next())` is awful - it's not random at all. calling `new Random()).Next()` like that will just return the same number in most cases and the `n * n` makes the number biased - and possibly could cause overflow exceptions. – Enigmativity Nov 08 '17 at 23:01
-3

Sorry for ugly code :-), but


var result =yourCollection.OrderBy(p => (p.GetHashCode().ToString() + Guid.NewGuid().ToString()).GetHashCode()).Take(n);

sh1ng
  • 2,808
  • 4
  • 24
  • 38
  • `p.GetHashCode().ToString() + Guid.NewGuid().ToString()).GetHashCode()` isn't random. It might look random, but it's not. You should use a RNG for randomness - data scientists have designed them specifically to be as random as possible. – Enigmativity Nov 08 '17 at 23:03