I need a quick algorithm to select 5 random elements from a generic list. For example, I'd like to get 5 random elements from a `List<string>`.

-
By Random, do you mean Inclusive or Exclusive? IOW, can the same element be picked more than once? (truly random) Or once an element is picked, should it be no longer pickable from the available pool? – Pretzel Mar 18 '10 at 20:47
-
Very similar: [Pick N items at random from sequence of unknown length](https://stackoverflow.com/q/9690009), [Algorithm to select a single, random combination of values?](https://stackoverflow.com/questions/2394246) – user202729 May 29 '18 at 15:09
-
??? you just shuffle and take the first N .. why is there so much discussion here? – Fattie Dec 31 '20 at 16:42
-
@Fattie This is for cases where shuffling is extremely inefficient (e.g., the list is huge) or you're not permitted to modify the order of the original list. – uckelman Jan 25 '21 at 14:08
-
@uckelman the question says nothing at all about that. regarding the most absolutely efficient solution to this problem for profoundly large sets (and note that it's utterly inconceivable you'd use anything like "List<T>" in such cases) it depends on the size domain. do note that the ticked answer is hopelessly wrong. – Fattie Jan 25 '21 at 14:47
-
The accepted answer is not hopelessly wrong. It's not even wrong. See here: https://stackoverflow.com/questions/35065764/select-n-records-at-random-from-a-set-of-n Use case considerations aren't irrelevant simply because they're left unmentioned. – uckelman Jan 26 '21 at 12:57
-
@Fattie Maybe give an argument that the accepted answer is wrong, rather than claiming so without one? – uckelman Jan 26 '21 at 13:06
-
The best method is likely to be reservoir sampling, btw: https://en.wikipedia.org/wiki/Reservoir_sampling – uckelman Jan 26 '21 at 13:17
-
hi @uckelman, cheers, there is already vast discussion pointing out the obvious problems; reservoir sampling is only useful in (as I stated) certain domains (actually, fully outlined in the 2nd current sentence of the wiki article). The question asked *is specifically about* a `List<T>` specifically in `C#` and the user specifically wants a quick and simple solution. (obviously the answer is sort and take five. it would be *staggeringly bad* engineering if you did anything other than that in domains up to say, oh, 10,000 items. note that *of course* you can make up ... – Fattie Jan 26 '21 at 14:45
-
... insanely obscure situations where you *wouldn't* do that and that's fine. that would be and is the subject of many algorithm questions say on software engineering. when one provides the correct answer here (all two words of the correct answer), sure, you may mention in a note that in incredibly obscure situations you wouldn't do that. {obviously, any working programmer would know that if the List is relatively huge, you'd just use the indeterminate picking algorithm, and you might give two lines of code to explain that, but *again*, *sure* you can THEN construct situations where you – Fattie Jan 26 '21 at 14:48
-
... are using hadoop and gpus or something and then in *that* domain you would have to analyze which, as you say, reservoir sampling approach (of the many, and the ongoing research in that) is best.)) To make the situation more blunt, looking at the ticked "answer". Say this was an actual project, like a team on a game at Nintendo or such. There are "40" as in the answer (rofl) tanks on the field and 5 have to be randomly picked. One of the programmers starts writing that solution - they'd just be fired out of hand! Geesh. *inappropriate* engineering is *incredibly bad* engineering – Fattie Jan 26 '21 at 14:53
-
@Fattie The vast discussion pointing out "obvious" problems _is_ the problem, frankly. – uckelman Jan 26 '21 at 23:42
-
@Fattie Also, if you think reservoir sampling is "useful only in certain domains", I suggest reading _past_ the second sentence of the Wikipedia article. The algorithm given under the heading "An optimal algorithm" is short, simple, and generally applicable. – uckelman Jan 26 '21 at 23:53
-
("domains" here is a fancy way to say "how many items". the approach mentioned is totally irrelevant on less than, say, a few hundred items. if you're not familiar with resevoir sampling and haven't used it before, the first sentence of the article clearly outlines what it relates to: "a population of ***unknown size*** n in a single pass over the items. The size of the population n ***is not known*** to the algorithm and is typically [larger than RAM sizes]" it literally has no connection to what is under discussion here.) – Fattie Jan 27 '21 at 14:47
34 Answers
-
+1 But if two elements get the same number from rnd.Next() or similar, then the first will be selected and the second will possibly not (if no more elements are needed). It is properly random enough depending on usage, though. – Lasse Espeholt Jul 20 '10 at 09:37
-
I think the OrderBy is O(n log(n)), so I would choose this solution if code simplicity is the main concern (i.e. with small lists). – Guido Jun 08 '11 at 18:44
-
But doesn't this enumerate and sort the whole list? Unless, by "quick", OP meant "easy", not "performant"... – drzaus Jun 21 '13 at 20:28
-
This will only work if OrderBy() only calls the key selector once for each element. If it calls it whenever it wants to perform a comparison between two elements, then it will get a different value back each time, which will screw up the sort. The [documentation](https://msdn.microsoft.com/en-us/library/vstudio/bb534966%28v=vs.100%29.aspx) doesn't say which it does. – Oliver Bock Mar 05 '15 at 04:38
-
My profiling and logging of LINQ shows that it evaluates the OrderBy expression only once per element. If it did it any other way, costly OrderBy expressions would crush the performance of sorting. While the contract does not promise it, they would be foolish to change it. – Paul Chernoch Jun 18 '15 at 18:01
-
Watch out if `YourList` has lots of items but you only want to select a few. In this case it is not an efficient way of doing it. – Callum Watkins Jul 21 '17 at 15:41
Iterate through and for each element make the probability of selection = (number needed)/(number left)
So if you had 40 items, the first would have a 5/40 chance of being selected. If it is, the next has a 4/39 chance, otherwise it has a 5/39 chance. By the time you get to the end you will have your 5 items, and often you'll have all of them before that.
This technique is called selection sampling, a special case of Reservoir Sampling. It's similar in performance to shuffling the input, but of course allows the sample to be generated without modifying the original data.
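As a rough illustration of this selection-sampling loop in C# (the method name and parameters are illustrative, not part of the original answer):
public static List<T> TakeRandomSample<T>(IList<T> source, int count, Random rng)
{
    // Selection sampling: keep each element with probability (still needed) / (still left).
    var result = new List<T>(count);
    int needed = count;
    int left = source.Count;
    foreach (var item in source)
    {
        if (rng.Next(left) < needed)   // probability needed/left
        {
            result.Add(item);
            if (--needed == 0) break;  // we already have all the items we need
        }
        left--;
    }
    return result;
}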

-
Could you give an example implementation of this? This sounds nice, but how to do it isn't straightforward to me. – NotAPro May 18 '22 at 14:22
public static List<T> GetRandomElements<T>(this IEnumerable<T> list, int elementsCount)
{
    return list.OrderBy(arg => Guid.NewGuid()).Take(elementsCount).ToList();
}
-
-
Using `Guid.NewGuid()` doesn't ensure randomness, just uniqueness. – Enigmativity Sep 17 '22 at 02:07
This is actually a harder problem than it sounds like, mainly because many mathematically-correct solutions will fail to actually allow you to hit all the possibilities (more on this below).
First, here are some easy-to-implement solutions, which are correct if you have a truly random number generator:
(0) Kyle's answer, which is O(n).
(1) Generate a list of n pairs [(0, rand), (1, rand), (2, rand), ...], sort them by the second coordinate, and use the first k (for you, k=5) indices to get your random subset. I think this is easy to implement, although it is O(n log n) time.
(2) Init an empty list s = [] that will grow to be the indices of k random elements. Choose a number r in {0, 1, 2, ..., n-1} at random, r = rand % n, and add this to s. Next take r = rand % (n-1) and stick in s; add to r the # elements less than it in s to avoid collisions. Next take r = rand % (n-2), and do the same thing, etc. until you have k distinct elements in s. This has worst-case running time O(k^2). So for k << n, this can be faster. If you keep s sorted and track which contiguous intervals it has, you can implement it in O(k log k), but it's more work.
@Kyle - you're right, on second thought I agree with your answer. I hastily read it at first, and mistakenly thought you were indicating to sequentially choose each element with fixed probability k/n, which would have been wrong - but your adaptive approach appears correct to me. Sorry about that.
Ok, and now for the kicker: asymptotically (for fixed k, n growing), there are n^k/k! choices of k element subset out of n elements [this is an approximation of (n choose k)]. If n is large, and k is not very small, then these numbers are huge. The best cycle length you can hope for in any standard 32 bit random number generator is 2^32 = 256^4. So if we have a list of 1000 elements, and we want to choose 5 at random, there's no way a standard random number generator will hit all the possibilities. However, as long as you're ok with a choice that works fine for smaller sets, and always "looks" random, then these algorithms should be ok.
Addendum: After writing this, I realized that it's tricky to implement idea (2) correctly, so I wanted to clarify this answer. To get O(k log k) time, you need an array-like structure that supports O(log m) searches and inserts - a balanced binary tree can do this. Using such a structure to build up an array called s, here is some pseudopython:
# Returns a container s with k distinct random numbers from {0, 1, ..., n-1}
def ChooseRandomSubset(n, k):
    s = SortedContainer()   # pseudocode: any structure with O(log m) search and ordered insert
    for i in range(k):
        r = UniformRandom(0, n - i)   # may be 0, must be < n-i
        q = s.FirstIndexSuchThat(s[q] - q > r)   # this is the search
        s.InsertInOrder(r + q if q is not None else r + len(s))   # inserts right before q
    return s
I suggest running through a few sample cases to see how this efficiently implements the above English explanation.
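(A worked example, added for clarity: with n = 10 and s = [2, 5] already chosen, a draw of r = 4 finds no index q with s[q] - q > 4, so the value inserted is r + len(s) = 6, which is exactly the 4th (zero-based) number not yet in s; a draw of r = 1 finds q = 0 because s[0] - 0 = 2 > 1, so the value inserted is r + q = 1.)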

-
for (1) you can shuffle a list faster than you can sort it; for (2) you will be biasing your distribution by using % – jk. Jan 27 '10 at 13:39
-
Given the objection you raised about the cycle length of a rng, is there any way we _can_ construct an algorithm that will choose all sets with equal probability? – Jonah Jul 11 '13 at 17:19
-
For (1), to improve the O(n log(n)) you could use selection sort to find the k smallest elements. That will run in O(n*k). – Jesus is Lord May 26 '17 at 15:15
-
@Jonah: I think so. Let's assume we can combine multiple independent random number generators to create a larger one (https://crypto.stackexchange.com/a/27431). Then you just need a large enough range to deal with the size of list in question. – Jesus is Lord May 26 '17 at 15:21
I think the selected answer is correct and pretty sweet. I implemented it differently though, as I also wanted the result in random order.
static IEnumerable<SomeType> PickSomeInRandomOrder<SomeType>(
    IEnumerable<SomeType> someTypes,
    int maxCount)
{
    Random random = new Random(DateTime.Now.Millisecond);
    Dictionary<double, SomeType> randomSortTable = new Dictionary<double, SomeType>();
    foreach (SomeType someType in someTypes)
        randomSortTable[random.NextDouble()] = someType;
    return randomSortTable.OrderBy(KVP => KVP.Key).Take(maxCount).Select(KVP => KVP.Value);
}

-
-
Do you have any reason not to use new Random(), which is based on Environment.TickCount, vs. DateTime.Now.Millisecond? – Lasse Espeholt Jul 20 '10 at 09:28
-
-
An improvement of the randomSortTable: randomSortTable = someTypes.ToDictionary(x => random.NextDouble(), y => y); Saves the foreach loop. – Keltex Aug 12 '10 at 17:01
-
OK, a year late but... Doesn't this pan out to @ersin's rather shorter answer, and won't it fail if you get a repeated random number (where Ersin's will have a bias towards the first item of a repeated pair)? – Andiih Sep 08 '11 at 09:53
-
Good point about repeated result from random.NextDouble() potentially dropping results. – Frank Schwieterman Sep 08 '11 at 17:26
-
The concern I'd have with Ersin's approach is that depending on how the sort is implemented, theoretically the sort might never finish. This is because each member's sort position changes every time it's evaluated. – Frank Schwieterman Sep 08 '11 at 17:27
-
`Random random = new Random(DateTime.Now.Millisecond);` on each call is *definitely* wrong. Creating a new instance of `Random` each time reduces the actual randomness. Use a `static readonly` instance of it, preferably constructed with the default constructor. – jpmc26 Sep 25 '18 at 22:23
-
@Andiih Okay, a short 8+ years since your year late... but who's Ersin? ;^D Looks like someone changed their handle in the interim. (Unrelated: [That is very interesting, but not cheap](https://www.caterhamcars.com/us/models/the-iconic-range).) – ruffin Jul 24 '20 at 16:00
-
The key is not unique, so if there is a duplicate number from Random, you will lose 1 item. – Alex Kvitchastyi Dec 16 '20 at 11:47
I just ran into this problem, and some more google searching brought me to the problem of randomly shuffling a list: http://en.wikipedia.org/wiki/Fisher-Yates_shuffle
To completely randomly shuffle your list (in place) you do this:
To shuffle an array a of n elements (indices 0..n-1):
  for i from n − 1 downto 1 do
    j ← random integer with 0 ≤ j ≤ i
    exchange a[j] and a[i]
If you only need the first 5 elements, then instead of running i all the way down from n-1 to 1, you only need to run it down to n-5.
Let's say you need k items.
This becomes:
for (i = n − 1; i >= n-k; i--)
{
    j = random integer with 0 ≤ j ≤ i
    exchange a[j] and a[i]
}
Each item that is selected is swapped toward the end of the array, so the k elements selected are the last k elements of the array.
This takes time O(k), where k is the number of randomly selected elements you need.
Further, if you don't want to modify your initial list, you can write down all your swaps in a temporary list, reverse that list, and apply them again, thus performing the inverse set of swaps and returning you your initial list without changing the O(k) running time.
Finally, for the real stickler, if (n == k), you should stop at 1, not n-k, as the randomly chosen integer will always be 0.
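A rough C# rendering of this partial shuffle, assuming it is acceptable to modify the input list (the method name and signature are illustrative, not from the original answer):
public static List<T> TakeLastKRandom<T>(IList<T> a, int k, Random rng)
{
    // Partial Fisher-Yates: after the loop, the last k slots hold a uniform random sample.
    int n = a.Count;
    for (int i = n - 1; i >= n - k && i > 0; i--)
    {
        int j = rng.Next(i + 1);        // random integer with 0 <= j <= i
        (a[j], a[i]) = (a[i], a[j]);    // exchange a[j] and a[i]
    }
    var result = new List<T>(k);
    for (int i = n - k; i < n; i++)
        result.Add(a[i]);               // the selected k elements sit at the end of the list
    return result;
}
The `i > 0` guard covers the n == k case mentioned above, where the loop should stop at 1.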

12 years on and this question is still active. I didn't find an implementation of Kyle's solution I liked, so here it is:
public IEnumerable<T> TakeRandom<T>(IEnumerable<T> collection, int take)
{
    var random = new Random();
    var available = collection.Count();
    var needed = take;
    foreach (var item in collection)
    {
        if (random.Next(available) < needed)
        {
            needed--;
            yield return item;
            if (needed == 0)
            {
                break;
            }
        }
        available--;
    }
}

-
-
This is a terrible solution if collection is "hot". There is no guarantee that an enumerable always produces the same values each time. – Enigmativity Sep 17 '22 at 02:09
-
@Enigmativity you're right! Do you think IList would be any better? – DontPanic345 Sep 20 '22 at 09:22
You can use this, but the ordering will happen on the client side:
.AsEnumerable().OrderBy(n => Guid.NewGuid()).Take(5);

-
Agreed. It might not be the best performing or the most random, but for the vast majority of people this will be good enough. – Richiban Feb 06 '14 at 17:45
-
Downvoted because [Guids are guaranteed to be unique, not random](https://stackoverflow.com/questions/38087244/is-this-guid-random-or-guessable/38087517#38087517). – Theodor Zoulias Jun 19 '20 at 11:59
From Dragons in the Algorithm, an interpretation in C#:
int k = 10; // items to select
var items = new List<int>(new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 });
var selected = new List<int>();
double needed = k;
double available = items.Count;
var rand = new Random();
while (selected.Count < k) {
    if (rand.NextDouble() < needed / available) {
        selected.Add(items[(int)available - 1]);
        needed--;
    }
    available--;
}
This algorithm will select unique indices of the items list.
-
-
This implementation is broken because using `var` results in `needed` and `available` both being integers, which makes `needed/available` always 0. – Niko Sep 12 '14 at 08:30
Was thinking about the comment by @JohnShedletsky on the accepted answer regarding (paraphrase):
you should be able to do this in O(subset.Length), rather than O(originalList.Length)
Basically, you should be able to generate `subset.Length` random indices and then pluck them from the original list.
The Method
public static class EnumerableExtensions {
    public static Random randomizer = new Random(); // you'd ideally be able to replace this with whatever makes you comfortable

    public static IEnumerable<T> GetRandom<T>(this IEnumerable<T> list, int numItems) {
        return (list as T[] ?? list.ToArray()).GetRandom(numItems);

        // because ReSharper whined about duplicate enumeration...
        /*
        items.Add(list.ElementAt(randomizer.Next(list.Count()))) ) numItems--;
        */
    }

    // just because the parentheses were getting confusing
    public static IEnumerable<T> GetRandom<T>(this T[] list, int numItems) {
        var items = new HashSet<T>(); // don't want to add the same item twice; otherwise use a list
        while (numItems > 0)
            // if we successfully added it, move on
            if (items.Add(list[randomizer.Next(list.Length)])) numItems--;
        return items;
    }

    // and because it's really fun; note -- you may get repetition
    public static IEnumerable<T> PluckRandomly<T>(this IEnumerable<T> list) {
        while (true)
            yield return list.ElementAt(randomizer.Next(list.Count()));
    }
}
If you wanted to be even more efficient, you would probably use a `HashSet` of the indices, not the actual list elements (in case you've got complex types or expensive comparisons).
The Unit Test
And to make sure we don't have any collisions, etc.
[TestClass]
public class RandomizingTests : UnitTestBase {
[TestMethod]
public void GetRandomFromList() {
this.testGetRandomFromList((list, num) => list.GetRandom(num));
}
[TestMethod]
public void PluckRandomly() {
this.testGetRandomFromList((list, num) => list.PluckRandomly().Take(num), requireDistinct:false);
}
private void testGetRandomFromList(Func<IEnumerable<int>, int, IEnumerable<int>> methodToGetRandomItems, int numToTake = 10, int repetitions = 100000, bool requireDistinct = true) {
var items = Enumerable.Range(0, 100);
IEnumerable<int> randomItems = null;
while( repetitions-- > 0 ) {
randomItems = methodToGetRandomItems(items, numToTake);
Assert.AreEqual(numToTake, randomItems.Count(),
"Did not get expected number of items {0}; failed at {1} repetition--", numToTake, repetitions);
if(requireDistinct) Assert.AreEqual(numToTake, randomItems.Distinct().Count(),
"Collisions (non-unique values) found, failed at {0} repetition--", repetitions);
Assert.IsTrue(randomItems.All(o => items.Contains(o)),
"Some unknown values found; failed at {0} repetition--", repetitions);
}
}
}
-
Nice idea, with problems. (1) If your larger list is huge (read from a database, for example) then you realize the whole list, which may exceed memory. (2) If K is close to N, then you will thrash a lot searching for an unclaimed index in your loop, causing the code to require an unpredictable amount of time. These problems are solvable. – Paul Chernoch Jun 18 '15 at 18:14
-
My solution for the problem of thrashing is this: if K < N/2, do it your way. If K >= N/2, choose the indices which should NOT be kept, instead of the ones that should be kept. There is still some thrashing, but much less. – Paul Chernoch Jun 18 '15 at 18:17
-
Also noticed that this alters the order of the items being enumerated, which may be acceptable in some situations, but not in others. – Paul Chernoch Jun 18 '15 at 19:28
-
On average, for K = N/2 (the worst case for Paul's suggested improvement), the (thrashing improved) algorithm appears to take ~0.693*N iterations. Now do a speed comparison. Is this better than the accepted answer? For which sample sizes? – mbomb007 Dec 21 '17 at 20:18
I combined several of the above answers to create a Lazily-evaluated extension method. My testing showed that Kyle's approach (Order(N)) is many times slower than drzaus' use of a set to propose the random indices to choose (Order(K)). The former performs many more calls to the random number generator, plus iterates more times over the items.
The goals of my implementation were:
1) Do not realize the full list if given an IEnumerable that is not an IList. If I am given a sequence of a zillion items, I do not want to run out of memory. Use Kyle's approach for an on-line solution.
2) If I can tell that it is an IList, use drzaus' approach, with a twist. If K is more than half of N, I risk thrashing as I choose many random indices again and again and have to skip them. Thus I compose a list of the indices to NOT keep.
3) I guarantee that the items will be returned in the same order that they were encountered. Kyle's algorithm required no alteration. drzaus' algorithm required that I not emit items in the order that the random indices are chosen. I gather all the indices into a SortedSet, then emit items in sorted index order.
4) If K is large compared to N and I invert the sense of the set, then I enumerate all items and test if the index is not in the set. This means that I lose the Order(K) run time, but since K is close to N in these cases, I do not lose much.
Here is the code:
/// <summary>
/// Takes k elements from the next n elements at random, preserving their order.
///
/// If there are fewer than n elements in items, this may return fewer than k elements.
/// </summary>
/// <typeparam name="TElem">Type of element in the items collection.</typeparam>
/// <param name="items">Items to be randomly selected.</param>
/// <param name="k">Number of items to pick.</param>
/// <param name="n">Total number of items to choose from.
/// If the items collection contains more than this number, the extra members will be skipped.
/// If the items collection contains fewer than this number, it is possible that fewer than k items will be returned.</param>
/// <returns>Enumerable over the retained items.
///
/// See http://stackoverflow.com/questions/48087/select-a-random-n-elements-from-listt-in-c-sharp for the commentary.
/// </returns>
public static IEnumerable<TElem> TakeRandom<TElem>(this IEnumerable<TElem> items, int k, int n)
{
var r = new FastRandom();
var itemsList = items as IList<TElem>;
if (k >= n || (itemsList != null && k >= itemsList.Count))
foreach (var item in items) yield return item;
else
{
// If we have a list, we can infer more information and choose a better algorithm.
// When using an IList, this is about 7 times faster (on one benchmark)!
if (itemsList != null && k < n/2)
{
// Since we have a List, we can use an algorithm suitable for Lists.
// If there are fewer than n elements, reduce n.
n = Math.Min(n, itemsList.Count);
// This algorithm picks K index-values randomly and directly chooses those items to be selected.
// If k is more than half of n, then we will spend a fair amount of time thrashing, picking
// indices that we have already picked and having to try again.
var invertSet = k >= n/2;
var positions = invertSet ? (ISet<int>) new HashSet<int>() : (ISet<int>) new SortedSet<int>();
var numbersNeeded = invertSet ? n - k : k;
while (numbersNeeded > 0)
if (positions.Add(r.Next(0, n))) numbersNeeded--;
if (invertSet)
{
// positions contains all the indices of elements to Skip.
for (var itemIndex = 0; itemIndex < n; itemIndex++)
{
if (!positions.Contains(itemIndex))
yield return itemsList[itemIndex];
}
}
else
{
// positions contains all the indices of elements to Take.
foreach (var itemIndex in positions)
yield return itemsList[itemIndex];
}
}
else
{
// Since we do not have a list, we will use an online algorithm.
// This permits us to skip the rest as soon as we have enough items.
var found = 0;
var scanned = 0;
foreach (var item in items)
{
var rand = r.Next(0,n-scanned);
if (rand < k - found)
{
yield return item;
found++;
}
scanned++;
if (found >= k || scanned >= n)
break;
}
}
}
}
I use a specialized random number generator, but you can just use C#'s Random if you want. (FastRandom was written by Colin Green and is part of SharpNEAT. It has a period of 2^128-1 which is better than many RNGs.)
Here are the unit tests:
[TestClass]
public class TakeRandomTests
{
/// <summary>
/// Ensure that when randomly choosing items from an array, all items are chosen with roughly equal probability.
/// </summary>
[TestMethod]
public void TakeRandom_Array_Uniformity()
{
const int numTrials = 2000000;
const int expectedCount = numTrials/20;
var timesChosen = new int[100];
var century = new int[100];
for (var i = 0; i < century.Length; i++)
century[i] = i;
for (var trial = 0; trial < numTrials; trial++)
{
foreach (var i in century.TakeRandom(5, 100))
timesChosen[i]++;
}
var avg = timesChosen.Average();
var max = timesChosen.Max();
var min = timesChosen.Min();
var allowedDifference = expectedCount/100;
AssertBetween(avg, expectedCount - 2, expectedCount + 2, "Average");
//AssertBetween(min, expectedCount - allowedDifference, expectedCount, "Min");
//AssertBetween(max, expectedCount, expectedCount + allowedDifference, "Max");
var countInRange = timesChosen.Count(i => i >= expectedCount - allowedDifference && i <= expectedCount + allowedDifference);
Assert.IsTrue(countInRange >= 90, String.Format("Not enough were in range: {0}", countInRange));
}
/// <summary>
/// Ensure that when randomly choosing items from an IEnumerable that is not an IList,
/// all items are chosen with roughly equal probability.
/// </summary>
[TestMethod]
public void TakeRandom_IEnumerable_Uniformity()
{
const int numTrials = 2000000;
const int expectedCount = numTrials / 20;
var timesChosen = new int[100];
for (var trial = 0; trial < numTrials; trial++)
{
foreach (var i in Range(0,100).TakeRandom(5, 100))
timesChosen[i]++;
}
var avg = timesChosen.Average();
var max = timesChosen.Max();
var min = timesChosen.Min();
var allowedDifference = expectedCount / 100;
var countInRange =
timesChosen.Count(i => i >= expectedCount - allowedDifference && i <= expectedCount + allowedDifference);
Assert.IsTrue(countInRange >= 90, String.Format("Not enough were in range: {0}", countInRange));
}
private IEnumerable<int> Range(int low, int count)
{
for (var i = low; i < low + count; i++)
yield return i;
}
private static void AssertBetween(int x, int low, int high, String message)
{
Assert.IsTrue(x > low, String.Format("Value {0} is less than lower limit of {1}. {2}", x, low, message));
Assert.IsTrue(x < high, String.Format("Value {0} is more than upper limit of {1}. {2}", x, high, message));
}
private static void AssertBetween(double x, double low, double high, String message)
{
Assert.IsTrue(x > low, String.Format("Value {0} is less than lower limit of {1}. {2}", x, low, message));
Assert.IsTrue(x < high, String.Format("Value {0} is more than upper limit of {1}. {2}", x, high, message));
}
}

-
Isn't there an error in the test? You have `if (itemsList != null && k < n/2)` which means inside the `if` `invertSet` is always `false` which means that logic is never used. – NetMage Aug 20 '19 at 19:25
Selecting N random items from a group shouldn't have anything to do with order! Randomness is about unpredictability and not about shuffling positions in a group. All the answers that deal with some kind of ordering are bound to be less efficient than the ones that do not. Since efficiency is the key here, I will post something that doesn't change the order of items too much.
1) If you need true random values, which means there is no restriction on which elements to choose from (i.e., a once-chosen item can be reselected):
public static List<T> GetTrueRandom<T>(this IList<T> source, int count,
    bool throwArgumentOutOfRangeException = true)
{
    if (throwArgumentOutOfRangeException && count > source.Count)
        throw new ArgumentOutOfRangeException();

    var randoms = new List<T>(count);
    randoms.AddRandomly(source, count);
    return randoms;
}
If you set the exception flag off, then you can choose random items any number of times.
If you have { 1, 2, 3, 4 }, then it can give { 1, 4, 4 }, { 1, 4, 3 } etc for 3 items or even { 1, 4, 3, 2, 4 } for 5 items!
This should be pretty fast, as it has nothing to check.
2) If you need individual members from the group with no repetition, then I would rely on a dictionary (as many have pointed out already).
public static List<T> GetDistinctRandom<T>(this IList<T> source, int count)
{
if (count > source.Count)
throw new ArgumentOutOfRangeException();
if (count == source.Count)
return new List<T>(source);
var sourceDict = source.ToIndexedDictionary();
if (count > source.Count / 2)
{
while (sourceDict.Count > count)
sourceDict.Remove(source.GetRandomIndex());
return sourceDict.Select(kvp => kvp.Value).ToList();
}
var randomDict = new Dictionary<int, T>(count);
while (randomDict.Count < count)
{
int key = source.GetRandomIndex();
if (!randomDict.ContainsKey(key))
randomDict.Add(key, sourceDict[key]);
}
return randomDict.Select(kvp => kvp.Value).ToList();
}
The code is a bit lengthier than other dictionary approaches here because I'm not only adding, but also removing from the list, so it's kinda two loops. You can see here that I have not reordered anything at all when `count` becomes equal to `source.Count`. That's because I believe randomness should be in the returned set as a whole. I mean if you want 5 random items from `1, 2, 3, 4, 5`, it shouldn't matter if it's `1, 3, 4, 2, 5` or `1, 2, 3, 4, 5`, but if you need 4 items from the same set, then it should unpredictably yield `1, 2, 3, 4`, `1, 3, 5, 2`, `2, 3, 5, 4`, etc. Secondly, when the count of random items to be returned is more than half of the original group, then it's easier to remove `source.Count - count` items from the group than to add `count` items. For performance reasons I have used `source` instead of `sourceDict` to get the random index in the remove method.
So if you have { 1, 2, 3, 4 }, this can end up in { 1, 2, 3 }, { 3, 4, 1 } etc for 3 items.
3) If you need truly distinct random values from your group by taking into account the duplicates in the original group, then you may use the same approach as above, but a `HashSet` will be lighter than a dictionary.
public static List<T> GetTrueDistinctRandom<T>(this IList<T> source, int count,
bool throwArgumentOutOfRangeException = true)
{
if (count > source.Count)
throw new ArgumentOutOfRangeException();
var set = new HashSet<T>(source);
if (throwArgumentOutOfRangeException && count > set.Count)
throw new ArgumentOutOfRangeException();
List<T> list = set.ToList();
if (count >= set.Count)
return list;
if (count > set.Count / 2)
{
while (set.Count > count)
set.Remove(list.GetRandom());
return set.ToList();
}
var randoms = new HashSet<T>();
randoms.AddRandomly(list, count);
return randoms.ToList();
}
The `randoms` variable is made a `HashSet` to avoid duplicates being added in the rarest of rare cases where `Random.Next` can yield the same value, especially when the input list is small.
So { 1, 2, 2, 4 } => 3 random items => { 1, 2, 4 } and never { 1, 2, 2}
{ 1, 2, 2, 4 } => 4 random items => exception!! or { 1, 2, 4 } depending on the flag set.
Some of the extension methods I have used:
static Random rnd = new Random();
public static int GetRandomIndex<T>(this ICollection<T> source)
{
return rnd.Next(source.Count);
}
public static T GetRandom<T>(this IList<T> source)
{
return source[source.GetRandomIndex()];
}
static void AddRandomly<T>(this ICollection<T> toCol, IList<T> fromList, int count)
{
while (toCol.Count < count)
toCol.Add(fromList.GetRandom());
}
public static Dictionary<int, T> ToIndexedDictionary<T>(this IEnumerable<T> lst)
{
return lst.ToIndexedDictionary(t => t);
}
public static Dictionary<int, T> ToIndexedDictionary<S, T>(this IEnumerable<S> lst,
Func<S, T> valueSelector)
{
int index = -1;
return lst.ToDictionary(t => ++index, valueSelector);
}
If it's all about performance, with tens of thousands of items in the list having to be iterated 10,000 times, then you may want a faster random class than `System.Random`, but I don't think that's a big deal considering the latter is most probably never a bottleneck; it's quite fast enough.
Edit: If you need to re-arrange the order of the returned items as well, then there's nothing that can beat dhakim's Fisher-Yates approach: short, sweet and simple.
Here is an implementation based on the Fisher–Yates shuffle whose algorithmic complexity is O(n), where n is the subset or sample size rather than the list size, as John Shedletsky pointed out.
public static IEnumerable<T> GetRandomSample<T>(this IList<T> list, int sampleSize)
{
    if (list == null) throw new ArgumentNullException("list");
    if (sampleSize > list.Count) throw new ArgumentException("sampleSize may not be greater than list count", "sampleSize");

    var indices = new Dictionary<int, int>();
    int index;
    var rnd = new Random();
    for (int i = 0; i < sampleSize; i++)
    {
        int j = rnd.Next(i, list.Count);
        if (!indices.TryGetValue(j, out index)) index = j;
        yield return list[index];
        if (!indices.TryGetValue(i, out index)) index = i;
        indices[j] = index;
    }
}

Extending from @ers's answer, if one is worried about possible different implementations of OrderBy, this should be safe:
// Instead of this
YourList.OrderBy(x => rnd.Next()).Take(5)
// Temporarily transform
YourList
.Select(v => new {v, i = rnd.Next()}) // Associate a random index to each entry
.OrderBy(x => x.i).Take(5) // Sort by (at this point fixed) random index
.Select(x => x.v); // Go back to enumerable of entry

The simple solution I use (probably not good for large lists): copy the list into a temporary list, then in a loop randomly select an item from the temp list, put it in the selected-items list, and remove it from the temp list (so it can't be reselected).
Example:
List<Object> temp = OriginalList.ToList();
List<Object> selectedItems = new List<Object>();
Random rnd = new Random();
Object o;
int i = 0;
while (i < NumberOfSelectedItems)
{
    o = temp[rnd.Next(temp.Count)];
    selectedItems.Add(o);
    temp.Remove(o);
    i++;
}

-
Removing from the middle of a list so often will be costly. You may consider using a linked list for an algorithm requiring so many removals. Or equivalently, replace the removed item with a null value, but then you will thrash a bit as you pick already removed items and have to pick again. – Paul Chernoch Jun 18 '15 at 18:19
This is the best I could come up with on a first cut:
public List<String> getRandomItemsFromList(int returnCount, List<String> list)
{
List<String> returnList = new List<String>();
Dictionary<int, int> randoms = new Dictionary<int, int>();
while (randoms.Count != returnCount)
{
//generate a new random index between zero and list count - 1
int randomInt = new Random().Next(list.Count);
// store this in dictionary to ensure uniqueness
try
{
randoms.Add(randomInt, randomInt);
}
catch (ArgumentException aex)
{
Console.Write(aex.Message);
} //we can assume this element exists in the dictonary already
//check for randoms length and then iterate through the original list
//adding items we select via random to the return list
if (randoms.Count == returnCount)
{
foreach (int key in randoms.Keys)
returnList.Add(list[randoms[key]]);
break; //break out of _while_ loop
}
}
return returnList;
}
Using a list of randoms within a range of 1 - total list count and then simply pulling those items in the list seemed to be the best way, but using the Dictionary to ensure uniqueness is something I'm still mulling over.
Also note I used a string list, replace as needed.

Based on Kyle's answer, here's my C# implementation.
/// <summary>
/// Picks random selection of available game ID's
/// </summary>
private static List<int> GetRandomGameIDs(int count)
{
var gameIDs = (int[])HttpContext.Current.Application["NonDeletedArcadeGameIDs"];
var totalGameIDs = gameIDs.Count();
if (count > totalGameIDs) count = totalGameIDs;
var rnd = new Random();
var leftToPick = count;
var itemsLeft = totalGameIDs;
var arrPickIndex = 0;
var returnIDs = new List<int>();
while (leftToPick > 0)
{
if (rnd.Next(0, itemsLeft) < leftToPick)
{
returnIDs .Add(gameIDs[arrPickIndex]);
leftToPick--;
}
arrPickIndex++;
itemsLeft--;
}
return returnIDs ;
}

This method may be equivalent to Kyle's.
Say your list is of size n and you want k elements.
Random rand = new Random();
for (int i = 0; k > 0; ++i)
{
    int r = rand.Next(0, n - i);
    if (r < k)
    {
        //include element i
        k--;
    }
}
Works like a charm :)
-Alex Gilbert

-
That looks equivalent to me. Compare to the similar http://stackoverflow.com/a/48141/2449863 – DCShannon Oct 16 '15 at 04:22
Here is a benchmark of three different methods:
- The implementation of the accepted answer from Kyle.
- An approach based on random index selection with HashSet duplication filtering, from drzaus.
- A more academic approach posted by Jesús López, called Fisher–Yates shuffle.
The testing will consist of benchmarking the performance with multiple different list sizes and selection sizes.
I also included a measurement of the standard deviation of these three methods, i.e. how well distributed the random selection appears to be.
In a nutshell, drzaus's simple solution seems to be the best overall of these three. The selected answer is great and elegant, but it's not that efficient, given that the time complexity is based on the full list size, not the selection size. Consequently, if you select a small number of items from a long list, it will take orders of magnitude more time. Of course it still performs better than the solutions based on complete reordering.
Curiously enough, this O(n) time complexity issue is true even if you only touch the list when you actually return an item, like I do in my implementation. The only thing I can think of is that Random.Next() is pretty slow, and that performance benefits if you generate only one random number for each selected item.
And, also interestingly, the StdDev of Kyle's solution was significantly higher comparatively. I have no clue why; maybe the fault is in my implementation.
Sorry for the long code and output that will commence now; but I hope it's somewhat illuminative. Also, if you spot any issues in the tests or implementations, let me know and I'll fix it.
static void Main()
{
BenchmarkRunner.Run<Benchmarks>();
new Benchmarks() { ListSize = 100, SelectionSize = 10 }
.BenchmarkStdDev();
}
[MemoryDiagnoser]
public class Benchmarks
{
[Params(50, 500, 5000)]
public int ListSize;
[Params(5, 10, 25, 50)]
public int SelectionSize;
private Random _rnd;
private List<int> _list;
private int[] _hits;
[GlobalSetup]
public void Setup()
{
_rnd = new Random(12345);
_list = Enumerable.Range(0, ListSize).ToList();
_hits = new int[ListSize];
}
[Benchmark]
public void Test_IterateSelect()
=> Random_IterateSelect(_list, SelectionSize).ToList();
[Benchmark]
public void Test_RandomIndices()
=> Random_RandomIdices(_list, SelectionSize).ToList();
[Benchmark]
public void Test_FisherYates()
=> Random_FisherYates(_list, SelectionSize).ToList();
public void BenchmarkStdDev()
{
RunOnce(Random_IterateSelect, "IterateSelect");
RunOnce(Random_RandomIdices, "RandomIndices");
RunOnce(Random_FisherYates, "FisherYates");
void RunOnce(Func<IEnumerable<int>, int, IEnumerable<int>> method, string methodName)
{
Setup();
for (int i = 0; i < 1000000; i++)
{
var selected = method(_list, SelectionSize).ToList();
Debug.Assert(selected.Count() == SelectionSize);
foreach (var item in selected) _hits[item]++;
}
var stdDev = GetStdDev(_hits);
Console.WriteLine($"StdDev of {methodName}: {stdDev :n} (% of average: {stdDev / (_hits.Average() / 100) :n})");
}
double GetStdDev(IEnumerable<int> hits)
{
var average = hits.Average();
return Math.Sqrt(hits.Average(v => Math.Pow(v - average, 2)));
}
}
public IEnumerable<T> Random_IterateSelect<T>(IEnumerable<T> collection, int needed)
{
var count = collection.Count();
for (int i = 0; i < count; i++)
{
if (_rnd.Next(count - i) < needed)
{
yield return collection.ElementAt(i);
if (--needed == 0)
yield break;
}
}
}
public IEnumerable<T> Random_RandomIdices<T>(IEnumerable<T> list, int needed)
{
var selectedItems = new HashSet<T>();
var count = list.Count();
while (needed > 0)
if (selectedItems.Add(list.ElementAt(_rnd.Next(count))))
needed--;
return selectedItems;
}
public IEnumerable<T> Random_FisherYates<T>(IEnumerable<T> list, int sampleSize)
{
var count = list.Count();
if (sampleSize > count) throw new ArgumentException("sampleSize may not be greater than list count", "sampleSize");
var indices = new Dictionary<int, int>(); int index;
for (int i = 0; i < sampleSize; i++)
{
int j = _rnd.Next(i, count);
if (!indices.TryGetValue(j, out index)) index = j;
yield return list.ElementAt(index);
if (!indices.TryGetValue(i, out index)) index = i;
indices[j] = index;
}
}
}
Output:
| Method | ListSize | Select | Mean | Error | StdDev | Gen 0 | Allocated |
|-------------- |--------- |------- |------------:|----------:|----------:|-------:|----------:|
| IterateSelect | 50 | 5 | 711.5 ns | 5.19 ns | 4.85 ns | 0.0305 | 144 B |
| RandomIndices | 50 | 5 | 341.1 ns | 4.48 ns | 4.19 ns | 0.0644 | 304 B |
| FisherYates | 50 | 5 | 573.5 ns | 6.12 ns | 5.72 ns | 0.0944 | 447 B |
| IterateSelect | 50 | 10 | 967.2 ns | 4.64 ns | 3.87 ns | 0.0458 | 220 B |
| RandomIndices | 50 | 10 | 709.9 ns | 11.27 ns | 9.99 ns | 0.1307 | 621 B |
| FisherYates | 50 | 10 | 1,204.4 ns | 10.63 ns | 9.94 ns | 0.1850 | 875 B |
| IterateSelect | 50 | 25 | 1,358.5 ns | 7.97 ns | 6.65 ns | 0.0763 | 361 B |
| RandomIndices | 50 | 25 | 1,958.1 ns | 15.69 ns | 13.91 ns | 0.2747 | 1298 B |
| FisherYates | 50 | 25 | 2,878.9 ns | 31.42 ns | 29.39 ns | 0.3471 | 1653 B |
| IterateSelect | 50 | 50 | 1,739.1 ns | 15.86 ns | 14.06 ns | 0.1316 | 629 B |
| RandomIndices | 50 | 50 | 8,906.1 ns | 88.92 ns | 74.25 ns | 0.5951 | 2848 B |
| FisherYates | 50 | 50 | 4,899.9 ns | 38.10 ns | 33.78 ns | 0.4349 | 2063 B |
| IterateSelect | 500 | 5 | 4,775.3 ns | 46.96 ns | 41.63 ns | 0.0305 | 144 B |
| RandomIndices | 500 | 5 | 327.8 ns | 2.82 ns | 2.50 ns | 0.0644 | 304 B |
| FisherYates | 500 | 5 | 558.5 ns | 7.95 ns | 7.44 ns | 0.0944 | 449 B |
| IterateSelect | 500 | 10 | 5,387.1 ns | 44.57 ns | 41.69 ns | 0.0458 | 220 B |
| RandomIndices | 500 | 10 | 648.0 ns | 9.12 ns | 8.54 ns | 0.1307 | 621 B |
| FisherYates | 500 | 10 | 1,154.6 ns | 13.66 ns | 12.78 ns | 0.1869 | 889 B |
| IterateSelect | 500 | 25 | 6,442.3 ns | 48.90 ns | 40.83 ns | 0.0763 | 361 B |
| RandomIndices | 500 | 25 | 1,569.6 ns | 15.79 ns | 14.77 ns | 0.2747 | 1298 B |
| FisherYates | 500 | 25 | 2,726.1 ns | 25.32 ns | 22.44 ns | 0.3777 | 1795 B |
| IterateSelect | 500 | 50 | 7,775.4 ns | 35.47 ns | 31.45 ns | 0.1221 | 629 B |
| RandomIndices | 500 | 50 | 2,976.9 ns | 27.11 ns | 24.03 ns | 0.6027 | 2848 B |
| FisherYates | 500 | 50 | 5,383.2 ns | 36.49 ns | 32.35 ns | 0.8163 | 3870 B |
| IterateSelect | 5000 | 5 | 45,208.6 ns | 459.92 ns | 430.21 ns | - | 144 B |
| RandomIndices | 5000 | 5 | 328.7 ns | 5.15 ns | 4.81 ns | 0.0644 | 304 B |
| FisherYates | 5000 | 5 | 556.1 ns | 10.75 ns | 10.05 ns | 0.0944 | 449 B |
| IterateSelect | 5000 | 10 | 49,253.9 ns | 420.26 ns | 393.11 ns | - | 220 B |
| RandomIndices | 5000 | 10 | 642.9 ns | 4.95 ns | 4.13 ns | 0.1307 | 621 B |
| FisherYates | 5000 | 10 | 1,141.9 ns | 12.81 ns | 11.98 ns | 0.1869 | 889 B |
| IterateSelect | 5000 | 25 | 54,044.4 ns | 208.92 ns | 174.46 ns | 0.0610 | 361 B |
| RandomIndices | 5000 | 25 | 1,480.5 ns | 11.56 ns | 10.81 ns | 0.2747 | 1298 B |
| FisherYates | 5000 | 25 | 2,713.9 ns | 27.31 ns | 24.21 ns | 0.3777 | 1795 B |
| IterateSelect | 5000 | 50 | 54,418.2 ns | 329.62 ns | 308.32 ns | 0.1221 | 629 B |
| RandomIndices | 5000 | 50 | 2,886.4 ns | 36.53 ns | 34.17 ns | 0.6027 | 2848 B |
| FisherYates | 5000 | 50 | 5,347.2 ns | 59.45 ns | 55.61 ns | 0.8163 | 3870 B |
StdDev of IterateSelect: 671.88 (% of average: 0.67)
StdDev of RandomIndices: 296.07 (% of average: 0.30)
StdDev of FisherYates: 280.47 (% of average: 0.28)
-
The benchmark suggests "Random_RandomIdices" to be the best compromise. However, its simple logic is inefficient when the selection size is close to the list size, with extended running times because of multiple retries to catch the last elements, as Paul also mentioned in 2015 and as the 50-out-of-50 benchmark confirms. Therefore, depending on the requirements, the best compromise of efficiency and simplicity is quite likely the FisherYates variant. – EricBDev Jul 16 '21 at 21:01
It is a lot harder than one would think. See the great Article "Shuffling" from Jeff.
I did write a very short article on that subject including C# code:
Return random subset of N elements of a given array

public static IEnumerable<T> GetRandom<T>(IList<T> list, int count, Random random)
{
// Probably you should throw exception if count > list.Count
count = Math.Min(list.Count, count);
var selectedIndices = new SortedSet<int>();
// Random upper bound (exclusive)
int randomMax = list.Count;
while (selectedIndices.Count < count)
{
int randomIndex = random.Next(0, randomMax);
// skip over already selected indices
foreach (var selectedIndex in selectedIndices)
if (selectedIndex <= randomIndex)
++randomIndex;
else
break;
yield return list[randomIndex];
selectedIndices.Add(randomIndex);
--randomMax;
}
}
Memory: ~count
Complexity: O(count²)

Goal: Select N items from the collection source without duplication. I created an extension for any generic collection. Here's how I did it:
public static class CollectionExtension
{
public static IList<TSource> RandomizeCollection<TSource>(this IList<TSource> source, int maxItems)
{
int randomCount = source.Count > maxItems ? maxItems : source.Count;
int?[] randomizedIndices = new int?[randomCount];
Random random = new Random();
for (int i = 0; i < randomizedIndices.Length; i++)
{
int randomResult = -1;
while (randomizedIndices.Contains((randomResult = random.Next(0, source.Count))))
{
//0 -> since all list starts from index 0; source.Count -> maximum number of items that can be randomize
//continue looping while the generated random number is already in the list of randomizedIndices
}
randomizedIndices[i] = randomResult;
}
IList<TSource> result = new List<TSource>();
foreach (int index in randomizedIndices)
result.Add(source.ElementAt(index));
return result;
}
}

Short and simple. Hope this helps someone!
if (list.Count > maxListCount)
{
    var rndList = new List<YourEntity>();
    var r = new Random();
    while (rndList.Count < maxListCount)
    {
        var addingElement = list[r.Next(list.Count)];

        // element uniqueness checking - uncomment the check that fits your case
        // (without one of them, the bare 'continue' would loop forever)
        if (rndList.Contains(addingElement))
        //if (rndList.Any(p => p.Id == addingElement.Id))
            continue;

        rndList.Add(addingElement);
    }
    return rndList;
}

public static IEnumerable<TItem> RandomSample<TItem>(this IReadOnlyList<TItem> items, int count)
{
if (count < 1 || count > items.Count)
{
throw new ArgumentOutOfRangeException(nameof(count));
}
List<int> indexes = Enumerable.Range(0, items.Count).ToList();
int yieldedCount = 0;
while (yieldedCount < count)
{
int i = RandomNumberGenerator.GetInt32(indexes.Count);
int randomIndex = indexes[i];
yield return items[randomIndex];
// indexes.RemoveAt(i); // Avoid removing items from the middle of the list
indexes[i] = indexes[indexes.Count - 1]; // Replace yielded index with the last one
indexes.RemoveAt(indexes.Count - 1);
yieldedCount++;
}
}

This isn't as elegant or efficient as the accepted solution, but it's quick to write up. First, permute the array randomly, then select the first K elements. In Python:
import numpy as np

N = 20
K = 5
idx = np.arange(N)
np.random.shuffle(idx)
print(idx[:K])

why not something like this:
Dim ar As New ArrayList
Dim numToGet As Integer = 5
'hard code just to test
ar.Add("12")
ar.Add("11")
ar.Add("10")
ar.Add("15")
ar.Add("16")
ar.Add("17")
Dim randomListOfProductIds As New ArrayList
Dim toAdd As String = ""
For i = 0 To numToGet - 1
toAdd = ar(CInt((ar.Count - 1) * Rnd()))
randomListOfProductIds.Add(toAdd)
'remove from id list
ar.Remove(toAdd)
Next
'sorry i'm lazy and have to write vb at work :( and didn't feel like converting to c#

I would use an extension method.
public static IEnumerable<T> TakeRandom<T>(this IEnumerable<T> elements, int countToTake)
{
    var random = new Random();
    var internalList = elements.ToList();
    var selected = new List<T>();
    for (var i = 0; i < countToTake; ++i)
    {
        var next = random.Next(0, internalList.Count - selected.Count);
        selected.Add(internalList[next]);
        internalList[next] = internalList[internalList.Count - selected.Count];
    }
    return selected;
}

Using LINQ with large lists (when costly to touch each element) AND if you can live with the possibility of duplicates:
new int[5].Select(o => (int)(rnd.NextDouble() * maxIndex)).Select(i => YourIEnum.ElementAt(i))
For my use I had a list of 100,000 elements, and because they were being pulled from a DB, I roughly halved (or better) the time compared to using rnd on the whole list.
Having a large list greatly reduces the odds of duplicates.

-
This solution may have repeated elements!! A random selection on the whole list would not. – AxelWass Dec 10 '16 at 01:38
-
Hmm. True. Where I use it, that does not matter though. Edited the answer to reflect that. – Wolf5 Dec 11 '16 at 13:35
Here's my approach (full text here http://krkadev.blogspot.com/2010/08/random-numbers-without-repetition.html ).
It should run in O(K) instead of O(N), where K is the number of wanted elements and N is the size of the list to choose from:
public <T> List<T> take(List<T> source, int k) {
int n = source.size();
if (k > n) {
throw new IllegalStateException(
"Can not take " + k +
" elements from a list with " + n +
" elements");
}
List<T> result = new ArrayList<T>(k);
Map<Integer,Integer> used = new HashMap<Integer,Integer>();
int metric = 0;
for (int i = 0; i < k; i++) {
int off = random.nextInt(n - i);
while (true) {
metric++;
Integer redirect = used.put(off, n - i - 1);
if (redirect == null) {
break;
}
off = redirect;
}
result.add(source.get(off));
}
assert metric <= 2*k;
return result;
}

I recently did this on my project using an idea similar to Tyler's point 1.
I was loading a bunch of questions and selecting five at random. Sorting was achieved using an IComparer.
All questions were loaded into a QuestionSorter list, which was then sorted using the List's Sort function, and the first k elements were selected.
private class QuestionSorter : IComparable<QuestionSorter>
{
public double SortingKey
{
get;
set;
}
public Question QuestionObject
{
get;
set;
}
public QuestionSorter(Question q)
{
this.SortingKey = RandomNumberGenerator.RandomDouble;
this.QuestionObject = q;
}
public int CompareTo(QuestionSorter other)
{
if (this.SortingKey < other.SortingKey)
{
return -1;
}
else if (this.SortingKey > other.SortingKey)
{
return 1;
}
else
{
return 0;
}
}
}
Usage:
List<QuestionSorter> unsortedQuestions = new List<QuestionSorter>();
// add the questions here
unsortedQuestions.Sort(unsortedQuestions as IComparer<QuestionSorter>);
// select the first k elements
I'd like to share my method. Reading other answers I was wondering if we really need to keep track of chosen items to uphold uniqueness of the results. Usually it slows down the algorithm because you need to repeat the draw if you happen to choose the same item again. So I came up with something different. If you don't care about modifying the input list you can shuffle the items in one go so that chosen items end up at the beginning of the list.
So in each iteration you choose an item and then you switch it to the front of the list. As a result you end up with random items at the start of the input list. The downside of this is that the input list order was modified, but you don't need to repeat the drawing, the results are unique. No need of any additional memory allocation etc. And it works really quick even for edge cases like selecting all items from the list at random.
Here is the code:
public IEnumerable<T> Random_Switch<T>(IList<T> list, int needed)
{
    for (int i = 0; i < needed; i++)
    {
        var index = _rnd.Next(i, list.Count);
        var item = list[index];
        list[index] = list[i];
        list[i] = item;
    }
    return list.Take(needed);
}
I also did some benchmarks benefiting from @Leaky's answer, and here are the results:
| Method | ListSize | SelectionSize | Mean | Error | StdDev | Median | Gen0 | Allocated |
|------------------- |--------- |-------------- |------------:|------------:|------------:|------------:|-------:|----------:|
| Test_IterateSelect | 50 | 5 | 662.2 ns | 13.19 ns | 27.54 ns | 660.9 ns | 0.0477 | 200 B |
| Test_RandomIndices | 50 | 5 | 256.6 ns | 5.12 ns | 12.86 ns | 254.0 ns | 0.0992 | 416 B |
| Test_FisherYates | 50 | 5 | 405.4 ns | 8.05 ns | 17.33 ns | 401.7 ns | 0.1407 | 590 B |
| Test_RandomSwitch | 50 | 5 | 152.8 ns | 2.91 ns | 4.87 ns | 153.4 ns | 0.0305 | 128 B |
| Test_IterateSelect | 50 | 10 | 853.8 ns | 17.07 ns | 29.44 ns | 853.9 ns | 0.0687 | 288 B |
| Test_RandomIndices | 50 | 10 | 530.8 ns | 10.63 ns | 28.93 ns | 523.7 ns | 0.1812 | 760 B |
| Test_FisherYates | 50 | 10 | 862.8 ns | 17.09 ns | 38.92 ns | 859.2 ns | 0.2527 | 1057 B |
| Test_RandomSwitch | 50 | 10 | 267.4 ns | 5.28 ns | 13.81 ns | 266.4 ns | 0.0343 | 144 B |
| Test_IterateSelect | 50 | 25 | 1,195.6 ns | 23.58 ns | 46.54 ns | 1,199.1 ns | 0.1049 | 440 B |
| Test_RandomIndices | 50 | 25 | 1,455.8 ns | 28.81 ns | 58.20 ns | 1,444.0 ns | 0.3510 | 1472 B |
| Test_FisherYates | 50 | 25 | 2,066.7 ns | 41.35 ns | 85.40 ns | 2,049.0 ns | 0.4463 | 1869 B |
| Test_RandomSwitch | 50 | 25 | 610.0 ns | 11.90 ns | 20.83 ns | 610.5 ns | 0.0496 | 208 B |
| Test_IterateSelect | 50 | 50 | 1,436.7 ns | 28.51 ns | 61.37 ns | 1,430.1 ns | 0.1717 | 720 B |
| Test_RandomIndices | 50 | 50 | 6,478.1 ns | 122.70 ns | 247.86 ns | 6,488.7 ns | 0.7248 | 3048 B |
| Test_FisherYates | 50 | 50 | 3,428.5 ns | 68.49 ns | 118.15 ns | 3,424.5 ns | 0.5455 | 2296 B |
| Test_RandomSwitch | 50 | 50 | 1,186.8 ns | 23.38 ns | 48.81 ns | 1,179.4 ns | 0.0725 | 304 B |
| Test_IterateSelect | 500 | 5 | 4,374.6 ns | 80.43 ns | 107.37 ns | 4,362.9 ns | 0.0458 | 200 B |
| Test_RandomIndices | 500 | 5 | 252.3 ns | 5.05 ns | 13.21 ns | 251.3 ns | 0.0992 | 416 B |
| Test_FisherYates | 500 | 5 | 398.0 ns | 7.97 ns | 18.48 ns | 399.3 ns | 0.1411 | 592 B |
| Test_RandomSwitch | 500 | 5 | 155.4 ns | 3.10 ns | 7.24 ns | 155.0 ns | 0.0305 | 128 B |
| Test_IterateSelect | 500 | 10 | 4,950.1 ns | 96.72 ns | 150.58 ns | 4,942.7 ns | 0.0687 | 288 B |
| Test_RandomIndices | 500 | 10 | 490.0 ns | 9.70 ns | 20.66 ns | 490.6 ns | 0.1812 | 760 B |
| Test_FisherYates | 500 | 10 | 805.2 ns | 15.70 ns | 20.96 ns | 808.2 ns | 0.2556 | 1072 B |
| Test_RandomSwitch | 500 | 10 | 254.1 ns | 5.09 ns | 13.31 ns | 253.6 ns | 0.0343 | 144 B |
| Test_IterateSelect | 500 | 25 | 5,785.1 ns | 115.19 ns | 201.74 ns | 5,800.2 ns | 0.0992 | 440 B |
| Test_RandomIndices | 500 | 25 | 1,123.6 ns | 22.31 ns | 53.03 ns | 1,119.6 ns | 0.3510 | 1472 B |
| Test_FisherYates | 500 | 25 | 1,959.1 ns | 38.82 ns | 91.51 ns | 1,971.1 ns | 0.4807 | 2016 B |
| Test_RandomSwitch | 500 | 25 | 601.1 ns | 11.83 ns | 23.63 ns | 599.8 ns | 0.0496 | 208 B |
| Test_IterateSelect | 500 | 50 | 6,570.5 ns | 127.03 ns | 190.13 ns | 6,599.8 ns | 0.1678 | 720 B |
| Test_RandomIndices | 500 | 50 | 2,199.6 ns | 43.23 ns | 73.41 ns | 2,198.6 ns | 0.7286 | 3048 B |
| Test_FisherYates | 500 | 50 | 3,830.0 ns | 76.33 ns | 159.33 ns | 3,809.9 ns | 0.9842 | 4128 B |
| Test_RandomSwitch | 500 | 50 | 1,150.7 ns | 22.60 ns | 34.52 ns | 1,156.7 ns | 0.0725 | 304 B |
| Test_IterateSelect | 5000 | 5 | 42,833.1 ns | 779.35 ns | 1,463.80 ns | 42,758.9 ns | - | 200 B |
| Test_RandomIndices | 5000 | 5 | 248.9 ns | 4.95 ns | 9.29 ns | 248.8 ns | 0.0992 | 416 B |
| Test_FisherYates | 5000 | 5 | 388.9 ns | 7.79 ns | 17.90 ns | 387.0 ns | 0.1411 | 592 B |
| Test_RandomSwitch | 5000 | 5 | 153.8 ns | 3.10 ns | 6.41 ns | 154.7 ns | 0.0305 | 128 B |
| Test_IterateSelect | 5000 | 10 | 46,814.2 ns | 914.35 ns | 1,311.33 ns | 46,822.7 ns | 0.0610 | 288 B |
| Test_RandomIndices | 5000 | 10 | 498.9 ns | 10.01 ns | 28.56 ns | 491.1 ns | 0.1812 | 760 B |
| Test_FisherYates | 5000 | 10 | 800.1 ns | 14.44 ns | 29.83 ns | 796.3 ns | 0.2556 | 1072 B |
| Test_RandomSwitch | 5000 | 10 | 271.6 ns | 5.45 ns | 15.63 ns | 269.2 ns | 0.0343 | 144 B |
| Test_IterateSelect | 5000 | 25 | 50,900.4 ns | 1,000.71 ns | 1,951.81 ns | 51,068.5 ns | 0.0610 | 440 B |
| Test_RandomIndices | 5000 | 25 | 1,112.7 ns | 20.06 ns | 30.63 ns | 1,114.6 ns | 0.3510 | 1472 B |
| Test_FisherYates | 5000 | 25 | 1,965.9 ns | 38.82 ns | 62.68 ns | 1,953.2 ns | 0.4807 | 2016 B |
| Test_RandomSwitch | 5000 | 25 | 610.7 ns | 12.23 ns | 20.76 ns | 613.6 ns | 0.0496 | 208 B |
| Test_IterateSelect | 5000 | 50 | 52,062.6 ns | 1,031.59 ns | 1,694.93 ns | 51,882.6 ns | 0.1221 | 720 B |
| Test_RandomIndices | 5000 | 50 | 2,203.7 ns | 43.90 ns | 87.67 ns | 2,197.9 ns | 0.7286 | 3048 B |
| Test_FisherYates | 5000 | 50 | 3,729.2 ns | 73.08 ns | 124.10 ns | 3,701.8 ns | 0.9842 | 4128 B |
| Test_RandomSwitch | 5000 | 50 | 1,185.1 ns | 23.29 ns | 39.54 ns | 1,186.5 ns | 0.0725 | 304 B |
Also I guess if you really need to keep the input list unmodified you could store the indices that were switched and revert the order before returning from the function, but that of course would cause additional allocations.
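A rough sketch of that variation (recording the swaps and undoing them afterward), reusing the `_rnd` field from the snippet above; this is my illustration, not part of the benchmarked code:
public IEnumerable<T> Random_Switch_Restore<T>(IList<T> list, int needed)
{
    var swaps = new List<(int, int)>(needed);
    for (int i = 0; i < needed; i++)
    {
        var index = _rnd.Next(i, list.Count);
        (list[i], list[index]) = (list[index], list[i]);
        swaps.Add((i, index));
    }
    var result = list.Take(needed).ToList();   // materialize before undoing the swaps
    for (int s = swaps.Count - 1; s >= 0; s--)
    {
        var (i, index) = swaps[s];
        (list[i], list[index]) = (list[index], list[i]);   // revert the swap
    }
    return result;
}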

When N is very large, the normal method that randomly shuffles the N numbers and selects, say, first k numbers, can be prohibitive because of space complexity. The following algorithm requires only O(k) for both time and space complexities.
http://arxiv.org/abs/1512.00501
import random

def random_selection_indices(num_samples, N):
    modified_entries = {}
    seq = []
    for n in xrange(num_samples):
        i = N - n - 1
        j = random.randrange(i)
        # swap a[j] and a[i]
        a_j = modified_entries[j] if j in modified_entries else j
        a_i = modified_entries[i] if i in modified_entries else i
        if a_i != j:
            modified_entries[j] = a_i
        elif j in modified_entries:  # no need to store the modified value if it is the same as index
            modified_entries.pop(j)
        if a_j != i:
            modified_entries[i] = a_j
        elif i in modified_entries:  # no need to store the modified value if it is the same as index
            modified_entries.pop(i)
        seq.append(a_j)
    return seq

public static IEnumerable<Element> GetRandomElements(this IList<Element> list, int n)
{
var count = list.Count();
if (count < n)
{
throw new Exception("n cannot be bigger than the list size.");
}
var indexes = new HashSet<int>();
while (indexes.Count < n)
{
indexes.Add(Random.Next(count));
}
return indexes.Select(x => list[x]);
}
I use this as reference : Performance of Arrays vs. Lists
The implementation is OK because the list is fast enough to get an element by index.
"Random" is defined outside of the method scope. The HashSet ensures the uniqueness of each index.
Limits: the algorithm works better if the list is big and n is small. Otherwise, due to collisions, the while loop can take a lot of time.
In this case using
public static IEnumerable<Element> GetRandomElements(this IList<Element> list, int n)
{
return list.OrderBy(x => Random.Next()).Take(n);
}
could be an available option

This will solve your issue
var entries = new List<T>();
var selectedItems = new List<T>();
for (var i = 0; i != 10; i++)
{
    var rdm = new Random().Next(entries.Count);
    while (selectedItems.Contains(entries[rdm]))
        rdm = new Random().Next(entries.Count);
    selectedItems.Add(entries[rdm]);
}

-
While this might answer the question, you should [edit] your answer to include an explanation of *how* this code block answers the question. This helps to provide context and makes your answer much more useful to future readers. – Hoppeduppeanut Jun 18 '20 at 01:56