3

I've a method which returns a generic list collection(List) from the database. This collection has got order details i.e., Order Id, order name, product details etc.

Also, method the method returns a collection having only the top 5 orders sorted by order date descending.

My requirement is that each time the client calls this method, I need to return collection which has got 5 random orders.

How do I achieve this using C#?

Ruben Bartelink
  • 59,778
  • 26
  • 187
  • 249
Ed.
  • 1,654
  • 7
  • 20
  • 33

4 Answers4

13

I wrote a TakeRandom extension method a while back which does this using a Fisher-Yates shuffle. It's pretty efficient as it only bothers to randomise the number of items that you actually want to return, and is guaranteed to be unbiased.

public static IEnumerable<T> TakeRandom<T>(this IEnumerable<T> source, int count)
{
    var array = source.ToArray();
    return ShuffleInternal(array, Math.Min(count, array.Length)).Take(count);
}

private static IEnumerable<T> ShuffleInternal<T>(T[] array, int count)
{
    for (var n = 0; n < count; n++)
    {
        var k = ThreadSafeRandom.Next(n, array.Length);
        var temp = array[n];
        array[n] = array[k];
        array[k] = temp;
    }

    return array;
}

An implementation of ThreadSafeRandom can be found at the PFX team blog.

Greg Beech
  • 133,383
  • 43
  • 204
  • 250
  • 2
    Although it does seem to be copying the list (efficiency?) or modifying the existing one (not readily obvious from the signature). – Vilx- Dec 08 '09 at 12:20
  • @Vilk: as my crap answer proves, it's not easy to avoid duplicates efficiently so I suspect it's hard to beat it for efficiency – Ruben Bartelink Dec 08 '09 at 12:38
  • @Vilk - Yes, you're right, it did modify the original list. I've changed it so it doesn't now. It does unfortunately have to copy the collection to an array but that's pretty difficult to avoid (though I'm sure there is a way...) – Greg Beech Dec 08 '09 at 12:58
  • @Ryan: What difference would that make? It's returning random items, so whether the array is traversed forwards or backwards makes no difference. – LukeH Dec 08 '09 at 13:05
  • @Ryan - The Durstenfeld implementation can be written to reverse the list in either direction. While the reverse implementation is more commonly documented, the forward implementation is better here as it means it only has to shuffle the first n elements rather than all of them, as it doesn't need to shuffle elements that won't be returned. – Greg Beech Dec 08 '09 at 13:07
  • PS: I did check that the algorithm was unbiased... see: http://gregbeech.com/blog/determining-the-bias-of-a-shuffle-algorithm – Greg Beech Dec 08 '09 at 13:10
  • @Luke, these simple algorithms are easy to get wrong, and simple mistakes, such as running through the list forward, can introduce bias. But as Greg notes, he verified his solution. I have never seen this written this way, sorry for "_jump_ing to conclusions" – Ryan Emerle Dec 08 '09 at 13:17
  • @Ryan: Stepping forwards won't affect bias so long as the algorithm is implemented correctly, and it's possible to badly implement the backward-stepping algorithm too. Whether or not the results are biased is determined by the *correctness* of the implementation, *not* by the direction of traversal. – LukeH Dec 08 '09 at 13:18
  • I had wrongly commented that the list should be traversed in reverse. I deleted that comment. – Ryan Emerle Dec 08 '09 at 13:19
  • @Luke, right and I naively assumed that the implementation was incorrect. I have retracted the comment and feel sufficiently sheepish :) – Ryan Emerle Dec 08 '09 at 13:20
4

You really should do this in the database - no point in returning a big stack of stuff only to drop all but five on the floor. You should amend your question to explain what typew of data access stack is involved so people can give better answers. For instance, you could do an ORDER BY RAND():

SELECT TOP 5 ... FROM orders
ORDER BY RAND()

But that visits every row, which you don't want. If you're using SQL Server [and would like to be tied to it :P], you could use TABLESAMPLE.

If you're using LINQ to SQL, go here

EDIT: Just pretend the rest of this isnt here - it's not efficient and hence Greg's answer is far more desirable if you do want to sort client-side.

But, for completeness, paste the following into LINQPad:

var orders = new[] { "a", "b", "c", "d", "e", "f" };
var random = new Random();
var result = Enumerable.Range(1,5).Select(i=>orders[random.Next(5)])
result.Dump();

EDIT: Brute force answer to Greg's point (Yes, not efficient or pretty)

var orders = new[] { "a", "b", "c", "d", "e", "f" };

var random = new Random();

int countToTake = 5;

var taken = new List<int>( countToTake);

var result = Enumerable.Range(1,countToTake)
    .Select(i=>{
        int itemToTake; 
        do { 
            itemToTake = random.Next(orders.Length); 
        } while (taken.Contains(itemToTake)); 
        taken.Add(itemToTake); 
        return orders[itemToTake];
    });

result.Dump();
Community
  • 1
  • 1
Ruben Bartelink
  • 59,778
  • 26
  • 187
  • 249
2
return myList.OfType<Order>().OrderBy(o => Guid.NewGuid()).Take(5);
David Hedlund
  • 128,221
  • 31
  • 203
  • 222
  • This 'visits' each item of the array (yes, I know you knew that) – Ruben Bartelink Dec 08 '09 at 12:09
  • yes. i don't get where the *generics* requirement comes into play in the original question. if we just had a list of orders, we could drop the `OfType` and the above query would work just as well for a list as for a linq to sql table. If it was a linq to sql table, the `OrderBy` clause would actually resolve to `order by newid()` randomization at the database level, which is totally desirable (as you've pointed out) – David Hedlund Dec 08 '09 at 12:23
  • @David: I think the asker means List hence OfType is irrelevant. Problem is NewGuid is less efficient than Random() (and remember, *every Guid is sacred, every Guid is great*. Hadnt realised the default L2S [and presumably EF and LLBLGP etc] default OrderBy translation - which makes this worty a +1. (You should have said that in the post, shouldnt you :D) – Ruben Bartelink Dec 08 '09 at 12:40
  • well i guess =) i got confused by the getting-five-orders-out-of-a-generic-list part. a blatant waste of guid's, tho, i'd hate to contribute to making them less unique – David Hedlund Dec 08 '09 at 13:08
1
return collection.Where(()=>Random.Next(100) > (5 / collection.Count * 100)));
DontVoteMeDown
  • 21,122
  • 10
  • 69
  • 105
RA.
  • 1,405
  • 1
  • 11
  • 18