2

I am looking for an implementation of lazy shuffling in c#.

I only care about the time it takes to process the first couple of elements. I do not care whether or not the original list gets modified (i.e. removing elements would be fine). I do not care if the processing time gets longer as the iterator reaches the end of the list (as long as it stays within reasonable bounds of course).

Context: I have a large list, that I want to get a relatively small number of random samples from. In most cases I only need the very first random element, but in same rare cases I need all elements from the list.

If possible I would like to implement this as an extension method, like this (but answers without extension methods are fine too):

public static class Program
{
    public static IEnumerable<T> lazy_shuffle<T>(this IEnumerable<T> input, Random r)
    {
        //do the magic
        return input;
    }
    static void Main(string[] args)
    {
        var start = DateTime.Now;
        var shuffled = Enumerable.Range(0, 1000000).lazy_shuffle(new Random(123));
        var enumerate = shuffled.GetEnumerator();
        foreach (var i in Enumerable.Range(0, 5))
        {
            enumerate.MoveNext();
            Console.WriteLine(enumerate.Current);
        }
        Console.WriteLine($"time for shuffling 1000000 elements was {(DateTime.Now - start).TotalMilliseconds}ms");
    }
}

Note:

  • input.OrderBy(i => r.Next()) would not be good enough, as it needs to iterate over the entire list once the generate one random number for each element of the list.
  • this is not a duplicate of Lazy Shuffle Algorithms because my question has less tight bounds for the algorithms but instead requires an implementation in c#
  • this is not a duplicate of Randomize a List<T> because that question is about regular shuffling and not lazy shuffling.

update:

  • A Count exists. Random Access to elements exists. It is not strictly an ienumerable, and instead just a big List or Array. I have update the question to say "list" instead of "ienumerable". Only the output of the lazy-shuffler needs to be enumerable, the source can be an actual list.
  • The selection should be fair, i.e. each element needs to have the same chance to be picked first.
  • mutation/modification of the source-list is fine
  • In the end I only need to take N random elements from the list, but I do not know the N beforehand
wotanii
  • 2,470
  • 20
  • 38
  • 2
    Do you have a collection where you can obtain `Count`? Or do you *just* have `IEnumerable`? I ask because I'm not sure you can do it with just IEnumerable. – Lasse V. Karlsen Jul 14 '21 at 09:02
  • 1
    Does it actually need to be shuffled, or do you just want to take N random values from the collection? – Matthew Watson Jul 14 '21 at 09:04
  • 4
    How *fair* does the shuffling need to be? If every element must have an equal chance of being selected first it's fairly trivial to see that the entire input enumeration must be consumed. – Damien_The_Unbeliever Jul 14 '21 at 09:09
  • 1
    How would you know how to pick first shuffled item, if you don't know which is the last unshuffled item, or even how many items there are? – Dialecticus Jul 14 '21 at 09:16
  • I do not think there is any fair way to do this faster than `O(n)` where n is the total number of items. At least not unless you have a list or array you are allowed to mutate. – JonasH Jul 14 '21 at 09:26
  • thanks for the feedback. I have added an "update" to the question to incorporate your feedback. – wotanii Jul 14 '21 at 09:45
  • 1
    You have some weird code in there. Subtracting `DateTime` to measure the duration of time is not at all accurate. Using an Enumerator in a `for` loop like that is odd. – Enigmativity Jul 14 '21 at 10:46
  • hehe, yes. I guess it shows that I spent a little too much time writing python lol – wotanii Jul 14 '21 at 13:14

1 Answers1

2

Since the original list can be modified, here is a very simple and efficient solution, based on this answer:

public static IEnumerable<T> Shuffle<T>(this IList<T> list, Random rng)
{
    for(int i = list.Count - 1; i >= 0; i--)
    {
        int swapIndex = rng.Next(i + 1);
        yield return list[swapIndex];
        list[swapIndex] = list[i];
    }
}
Olivier
  • 13,283
  • 1
  • 8
  • 24