2

This is my code for order the first N element randomly in a List :

int upper = 1;
if (myList.Count > 1)
{
    Random r = new Random();
    upper = Math.Min(maxNumberOfPackets, myList.Count);
    for (int i = 0; i < upper; i++)
    {
        int randInd = r.Next(i, myList.Count);
        var temp = myList[i];
        myList[i] = myList[randInd];
        myList[randInd] = temp;
    }
}

well, now I have the "necessity" to use only Enumerable (for some reason; so I don't want to convert it in a Enumerable).

For your knowledge, can I do the same with Enumerable? I think the pain is access to element at the X position...

I'm so curious...

markzzz
  • 47,390
  • 120
  • 299
  • 507

6 Answers6

6

With IEnumerable[<T>] sources, you should usually only assume they can be read once, and of course you can't reliably know the length until you have consumed it. Randomising it is going to typically require buffering. You might as well just do:

var list = source.ToList();

and just use your existing list-randomising code. You can of course then return that list as IEnumerable[<T>] again if you want to keep the input and output similar.

If you really wanted to, you could scramble just the start:

static IEnumerable<T> ScrambleStart(this IEnumerable<T> source,
        int numberToScramble)
{
  using(var iter = source.GetEnumerator()) {
    List<T> list = new List<T>();
    while(iter.MoveNext()) {
        list.Add(iter.Current);
        if(list.Count == numberToScramble) break;
    }
    ScrambleList(list); // your existing code
    foreach(var item in list) yield return item;
    while(iter.MoveNext()) {
        yield return iter.Current;
    }
  }
}
Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • Can you explain to me what this code do? It's the same of .ToList() :O – markzzz Jun 13 '12 at 15:00
  • @markzzz no, it is very different to ToList()... It buffers only as much data as you want to randomise - then after that, it is a direct non-buffered deferred iterator. Meaning: a sequence of 2M items - you ask for 5 randomised, it only buffers 5. The rest is entirely non-buffered via the magic of iterator blocks. – Marc Gravell Jun 13 '12 at 15:22
  • But if I buffer 5 elements, I can randomise only on these 5 elements. I need to choose 5 elements from the whole collection :O – markzzz Jun 13 '12 at 15:43
  • @markzzz your question states "the first N element randomly". If you causally intended "pick N elements, at random, from the list: then the "how" would (IMO) depend on "how big is the N". For example, you could pick the *indices* at the start, then walk through once to get the values, then scramble the items. – Marc Gravell Jun 13 '12 at 15:52
  • Yeah, maybe I've express bad myself with my english! I mean, if I have 200 elements, take 5 randomly! So, order randomly only the first 5 (taken them from the whole list) and take them :) This means that I don't need to randomly order the whole list :) – markzzz Jun 13 '12 at 15:59
3

IEnumerable represents a 'forward only cursor', which may return data that is streamed (from a database for instance). So to be able to order an enumerable (no matter whether it is random or not), does mean you need to cache it, simply because you need to have all values to be able to determine the ordering. That said, you can use the Enumerable.OrderBy operator for this, but again, it will do caching for you under the covers.

var r = new Random();

IEnumerable sorted = myList.OrderBy(i => r.Next());
Steven
  • 166,672
  • 24
  • 332
  • 435
2

You should use Reservoir sampling.

Vlad
  • 35,022
  • 6
  • 77
  • 199
1

You always have to consume your IEnumerable to randomize it in some way, and thus you can just call .ToList() on it, and use your sorting method.

Consider the fact that an IEnumerable can have an infinite number of items.

sloth
  • 99,095
  • 21
  • 171
  • 219
1

The below implementation will yield a (pseudo) random sequence from the original input. (aside from Random being pseudo random) the pseudo-ness is based on guessing the length of the sequence, which of course in most cases will be wrong. It will be random for sequences with a length less than the maxJump length. If it turns out there's half as many elements or more than the value of maxJump it will be increased to allow for are higher degree of random-ness.

public IEnumerable<T> Randomize(this IEnumerable<T> source,int maxJump = 1000){
    var cache = new List<T>();
    var r = new Random();
    var enumerator = source.GetEnumerator();
    var totalCount = 1;
    while(enumerator.MoveNext()){
       var next = r.Next(0,maxJump);
       if(next < cache.Count){
          //the element we are searching for is in the cache
          yield return cache[next];
          cache.RemoveAt(next);
       } else {
         next = next - cache.Count;
         do{
          totalCount++;
          if(next == 0){
             //we've found the next element so yield
             yield return enumerator.Current;
          } else {
             //not the element we are looking for so cache
             cache.Insert(r.Next(0,cache.Count),enumerator.Current);
          }
          --next;
        }while(next > 0 && enumerator.MoveNext());
        //if we've reached half of the maxJump length
        //increase the max to allow for higher randomness
        if("*totalCount == maxJump){
            maxJump *= 2;
        }
      }
    }
    //Yield the elements in the cache
    //they are already randomized
    foreach(var elem in cache){
          yield return elem;
    }
}
Rune FS
  • 21,497
  • 7
  • 62
  • 96
0

This works, but it is extremely inefficient for large sequences from which you want to randomly select a relatively small number of elements (since it iterates through every single element of the source sequence).

/// <summary>Randomly selects items from a sequence.</summary>
/// <typeparam name="T">The type of the items in the sequence.</typeparam>
/// <param name="sequence">The sequence from which to randomly select items.</param>
/// <param name="count">The number of items to randomly select from the sequence.</param>
/// <param name="sequenceLength">The number of items in the sequence among which to randomly select.</param>
/// <param name="rng">The random number generator to use.</param>
/// <returns>A sequence of randomly selected items.</returns>
/// <remarks>This is an O(N) algorithm (N is the sequence length).</remarks>

public static IEnumerable<T> RandomlySelectedItems<T>(IEnumerable<T> sequence, int count, int sequenceLength, System.Random rng)
{
    if (sequence == null)
    {
        throw new ArgumentNullException("sequence");
    }

    if (count < 0 || count > sequenceLength)
    {
        throw new ArgumentOutOfRangeException("count", count, "count must be between 0 and sequenceLength");
    }

    if (rng == null)
    {
        throw new ArgumentNullException("rng");
    }

    int available = sequenceLength;
    int remaining = count;
    var iterator  = sequence.GetEnumerator();

    for (int current = 0; current < sequenceLength; ++current)
    {
        iterator.MoveNext();

        if (rng.NextDouble() < remaining/(double)available)
        {
            yield return iterator.Current;
            --remaining;
        }

        --available;
    }
}
Matthew Watson
  • 104,400
  • 10
  • 158
  • 276