I am looking for an implementation of lazy shuffling in c#.
I only care about the time it takes to process the first couple of elements. I do not care whether or not the original list gets modified (i.e. removing elements would be fine). I do not care if the processing time gets longer as the iterator reaches the end of the list (as long as it stays within reasonable bounds of course).
Context: I have a large list, that I want to get a relatively small number of random samples from. In most cases I only need the very first random element, but in same rare cases I need all elements from the list.
If possible I would like to implement this as an extension method, like this (but answers without extension methods are fine too):
public static class Program
{
public static IEnumerable<T> lazy_shuffle<T>(this IEnumerable<T> input, Random r)
{
//do the magic
return input;
}
static void Main(string[] args)
{
var start = DateTime.Now;
var shuffled = Enumerable.Range(0, 1000000).lazy_shuffle(new Random(123));
var enumerate = shuffled.GetEnumerator();
foreach (var i in Enumerable.Range(0, 5))
{
enumerate.MoveNext();
Console.WriteLine(enumerate.Current);
}
Console.WriteLine($"time for shuffling 1000000 elements was {(DateTime.Now - start).TotalMilliseconds}ms");
}
}
Note:
input.OrderBy(i => r.Next())
would not be good enough, as it needs to iterate over the entire list once the generate one random number for each element of the list.- this is not a duplicate of Lazy Shuffle Algorithms because my question has less tight bounds for the algorithms but instead requires an implementation in c#
- this is not a duplicate of Randomize a List<T> because that question is about regular shuffling and not lazy shuffling.
update:
- A
Count
exists. Random Access to elements exists. It is not strictly an ienumerable, and instead just a bigList
orArray
. I have update the question to say "list" instead of "ienumerable". Only the output of the lazy-shuffler needs to be enumerable, the source can be an actual list. - The selection should be fair, i.e. each element needs to have the same chance to be picked first.
- mutation/modification of the source-list is fine
- In the end I only need to take N random elements from the list, but I do not know the N beforehand