6

I have an IQueryable containing more than 300 objects:

public class Detail
{
    public int Id { get; set; }
    public int CityId { get; set; }
    public bool Chosen { get; set; }
}

IQueryable<Detail> details = ...

How can I go against this an at random pick out 50 objects? I assume that I would need to convert this with .ToList() but I am not sure how I could pick out random elements.

  • 3
    possible duplicate of [Access random item in list](http://stackoverflow.com/questions/2019417/access-random-item-in-list) – User 12345678 Nov 07 '13 at 12:51
  • 1
    That question does not address picking 50 items. – H H Nov 07 '13 at 12:54
  • 1
    Then it's a possible duplicate of http://stackoverflow.com/questions/19398339/get-x-random-elements-from-table-in-database-using-linq-or-lamda-in-c-sharp/19398458#19398458 – Matten Nov 07 '13 at 12:57
  • But I am not sure how to pick the random items which is why I posed the question. –  Nov 07 '13 at 12:59

6 Answers6

11

300 is not very much, so Yes, make it a List:

IQueryable<Detail> details = ...
IList<Detail> detailList = details.ToList();

And now you can pick a random item :

var randomItem = detailList[rand.Next(detailList.Count)];

and you could repeat that 50 times. That would however lead to duplicates, and the process to eliminate them would become messy.

So use a standard shuffle algorithm and then pick the first 50 :

Shuffle(detailList);
var selection = detailList.Take(50);
Community
  • 1
  • 1
H H
  • 263,252
  • 30
  • 330
  • 514
  • What an interesting approach with the Shuffle algorithm. I've always continued to iterate until I got 50 unique items, but that's clearly a much worse algorithm. Thanks a lot Henk! – Mike Perrenoud Nov 07 '13 at 12:58
4

If you know in advance the total number of items from which to randomly pick, you can do it without converting to a list first.

The following method will do it for you:

/// <summary>Randomly selects items from a sequence.</summary>
/// <typeparam name="T">The type of the items in the sequence.</typeparam>
/// <param name="sequence">The sequence from which to randomly select items.</param>
/// <param name="count">The number of items to randomly select from the sequence.</param>
/// <param name="sequenceLength">The number of items in the sequence among which to randomly select.</param>
/// <param name="rng">The random number generator to use.</param>
/// <returns>A sequence of randomly selected items.</returns>
/// <remarks>This is an O(N) algorithm (N is the sequence length).</remarks>

public static IEnumerable<T> RandomlySelectedItems<T>(IEnumerable<T> sequence, int count, int sequenceLength, System.Random rng)
{
    if (sequence == null)
    {
        throw new ArgumentNullException("sequence");
    }

    if (count < 0 || count > sequenceLength)
    {
        throw new ArgumentOutOfRangeException("count", count, "count must be between 0 and sequenceLength");
    }

    if (rng == null)
    {
        throw new ArgumentNullException("rng");
    }

    int available = sequenceLength;
    int remaining = count;
    var iterator  = sequence.GetEnumerator();

    for (int current = 0; current < sequenceLength; ++current)
    {
        iterator.MoveNext();

        if (rng.NextDouble() < remaining/(double)available)
        {
            yield return iterator.Current;
            --remaining;
        }

        --available;
    }
}

(The key thing here is needing to know at the start the number of items to choose from; this does reduce the utility somewhat. But if getting the count is quick and buffering all the items would take too much memory, this is a useful solution.)


Here's another approach which uses Reservoir sampling

This approach DOES NOT need to know the total number of items to choose from, but it does need to buffer the output. Of course, it also needs to enumerate the entire input collection.

Therefore this is really only of use when you don't know in advance the number of items to choose from (or the number of items to choose from is very large).

I would recommend just shuffling a list as per Henk's answer rather than doing it this way, but I'm including it here for the sake of interest:

// n is the number of items to randomly choose.

public static List<T> RandomlyChooseItems<T>(IEnumerable<T> items, int n, Random rng)
{
    var result = new List<T>(n);
    int index = 0;

    foreach (var item in items)
    {
        if (index < n)
        {
            result.Add(item);
        }
        else
        {
            int r = rng.Next(0, index + 1);

            if (r < n)
                result[r] = item;
        }

        ++index;
    }

    return result;
}

As an addendum to Henk's answer, here's a canonical implementation of the Shuffle algorithm he mentions. In this, _rng is an instance of Random:

/// <summary>Shuffles the specified array.</summary>
/// <typeparam name="T">The type of the array elements.</typeparam>
/// <param name="array">The array to shuffle.</param>

public void Shuffle<T>(IList<T> array)
{
    for (int n = array.Count; n > 1;)
    {
        int k = _rng.Next(n);
        --n;
        T temp = array[n];
        array[n] = array[k];
        array[k] = temp;
    }
}
Matthew Watson
  • 104,400
  • 10
  • 158
  • 276
  • I don't think that will give an even spread of random items (which is not explicitly stated but can probably be assumed). To see this consider the first item which can only be picked as the first of the random items and if it isn't it is never again a candidate. – Chris Nov 07 '13 at 12:59
  • No, it really will give an even spread. This is a well-understood algorithm. – Matthew Watson Nov 07 '13 at 13:00
  • Ah yes. I was thinking that the random number it was comparing to was calculated once (so if it was 0.5 it would skip the first half of the list) but in fact it generates a new one each time. Yup. I see how that works now. – Chris Nov 07 '13 at 13:03
3
Random rnd = new Random();
IQueryable<Detail> details = myList.OrderBy(x => rnd.Next()).Take(50);
Rik
  • 28,507
  • 14
  • 48
  • 67
1
IQueryable<Detail> details = myList.OrderBy(x => Guid.NewGuid()).ToList();

After this just walk trough it linearly:

var item1 = details[0];

This will avoid duplicates.

Jeroen Vannevel
  • 43,651
  • 22
  • 107
  • 170
  • 1
    Yeah, the `Collections.Shuffle`, you'll want to link to that algorithm or place yours inline. – Mike Perrenoud Nov 07 '13 at 12:59
  • @neoistheone: heh, confused with Java libraries. Henk already got the algorithms linked, that'll suffice. The `Guid` approach is definitely the most simple. – Jeroen Vannevel Nov 07 '13 at 13:00
  • You know, I wondered about that because when I Google'd it I turned up a Java library! – Mike Perrenoud Nov 07 '13 at 13:01
  • It's a shame this hasn't been built into the C# libraries yet. You'd think it's a fairly basic collection utility – Jeroen Vannevel Nov 07 '13 at 13:02
  • I have to agree with you on this one now that I've seen the shuffle algorithm because it makes this an O(n) operation with that. I suppose technically you'll iterate an additional 50 times, but that wouldn't change the notation if I remember correctly. – Mike Perrenoud Nov 07 '13 at 13:04
1
var l = new List<string>();
l.Add("A");
l.Add("B");
l.Add("C");
l.Add("D");
l.Add("E");
l.Add("F");
l.Add("G");
l.Add("H");
l.Add("I");

var random = new Random();
var nl = l.Select(i=> new {Value=i,Index = random.Next()});

var finalList = nl.OrderBy(i=>i.Index).Take(3);
foreach(var i in finalList)
{
    Console.WriteLine(i.Value);
}
Michael Mairegger
  • 6,833
  • 28
  • 41
0

This is what ended up working for me, it ensures no duplicates are returned:

public List<T> GetRandomItems(List<T> items, int count = 3)
{
    var length = items.Count();
    var list = new List<T>();
    var rnd = new Random();
    var seed = 0;

    while (list.Count() < count)
    {
        seed = rnd.Next(0, length);
        if(!list.Contains(items[seed]))
            list.Add(items[seed]);
    }

    return list;
}
Canica
  • 2,650
  • 3
  • 18
  • 34