23

Here's a very handy extension, which works for an array of anything:

public static T AnyOne<T>(this T[] ra) where T:class
{
    int k = ra.Length;
    int r = Random.Range(0,k);
    return ra[r];
}

Unfortunately it does not work for a List<> of anything. Here's the same extension that works for any List<>

public static T AnyOne<T>(this List<T> listy) where T:class
{
    int k = listy.Count;
    int r = Random.Range(0,k);
    return listy[r];
}

In fact, is there a way to generalise generics covering both arrays and List<>s in one go? Or is it know to be not possible?


Could the answer even (gasp) encompass Collections?


PS, I apologize for not explicitly mentioning this is in the Unity3D milieu. For example "Random.Range" is just a Unity call (which does the obvious), and "AnyOne" is a completely typical extension or call in game programming.

Obviously, the question of course applies in any c# milieu.

Fattie
  • 27,874
  • 70
  • 431
  • 719

6 Answers6

18

In fact the most appropriate common interface between T[] and List<T> for your case is IReadOnlyList<T>

public static T AnyOne<T>(this IReadOnlyList<T> list) where T:class
{
    int k = list.Count;
    int r = Random.Range(0,k);
    return list[r];
}

As mentioned in another answer, IList<T> also works, but the good practice requires you to request from the caller the minimum functionality needed by the method, which in this case is Count property and read only indexer.

IEnumerable<T> also works, but it allows the caller to pass a non collection iterator where Count and ElementAt extension methods could be highly inefficient - like Enumerable.Range(0, 1000000), database query etc.


2020, quick for Unity3D programmers: of course, nowadays modern versions of .Net are available in Unity!

enter image description here

Fattie
  • 27,874
  • 70
  • 431
  • 719
Ivan Stoev
  • 195,425
  • 15
  • 312
  • 343
  • Array is everything :) `IEnumerable`, `ICollection`, `IList`, `IEnumerable`, `ICollection`, `IList`, `IReadOnlyCollection`, `IReadOnlyList` – Ivan Stoev Nov 26 '15 at 19:30
  • @JoeBlow Agreed, this should be marked as the answer. – Richard Szalay Nov 26 '15 at 21:04
  • 1
    @JoeBlow Yeah... it's not just for Unity3D unfortunately.... it's because `IList` doesn't implement `IReadOnlyList`. Probably because you can have the case of a list that you can just write, not read. That's why the answer I added includes both cases. For custom lists you might run into the same trouble; there are more implementations of IList out there than IReadOnlyList unfortunately. – atlaste Jun 20 '16 at 09:36
  • @atlaste I think the subject here is not the concrete implementation, but the generic method general design. Forget about random, let say the method **requires** Count and indexer. `IEnumerable` does not have any of those, and `IList` has too much. When implementing a custom container class (and we do a lot at my work), it's much easier to implement `IReadOnlyList` than `IList`. The only drawback is that the interface has been introduced too late and after `IList`, so for backward compatibility reasons `IList` does not inherit from it as it would normally do. – Ivan Stoev Jun 20 '16 at 12:14
  • @IvanStoev Errm... the truth is that core to the design of Linq, the assumption is made that the enumerators are *used once or deterministic*. The hard reality is that the interface doesn't give you these guarantees. In this case, this means "bugs". In other words: `IEnumerable` is wrong if you *require* both. However, the question is if the implementor *even considered these implicit constraints in the implementation phase*. I don't think this is realistic; personally I think this is a serious design flaw in Linq and I've seen tons of real-life bugs happen because of this. – atlaste Jun 20 '16 at 13:08
  • @IvanStoev So, all things being equal, I believe that we can reasonably assume that `AnyOne` is a utility function, which means a proper design will boil down to 'reusability'. Considering that, `IEnumerable` sounds to me like a pretty reasonable design requirement for this scenario, even though we can agree on that it's broken on some point. If the implementation details then **requires** Count and an indexer, it merely implies that we have to enforce it. That might sound unexpected, but in fact there are plenty of Linq functions that do this already (OrderBy f.ex.) – atlaste Jun 20 '16 at 13:08
  • 2
    @JoeBlow If you look at the very bottom of the [IReadOnlyList Interface documentation](https://msdn.microsoft.com/en-us/library/hh192385(v=vs.110).aspx), you'll see something like **.NET Framework** *Available since 4.5* :( So in earlier versions you have to resort to `IList` (*Available since 2.0*) – Ivan Stoev Jun 23 '16 at 13:38
  • 1
    You are welcome, Joe. It's was pleasure to be involved in this interesting discussion. – Ivan Stoev Jun 24 '16 at 16:07
  • For anyone googling here, a decade ago this was not available in older .Net but is now universally available. In that historic era, Richard's answer (IList) worked fine! I did delete the many older comments dealing with whether or not it was available in eg. Unity3D. – Fattie Oct 22 '20 at 13:14
13

T[] and List<T> actually both implement IList<T>, which provides enumeration, a Count property and an indexer.

public static T AnyOne<T>(this IList<T> ra) 
{
    int k = ra.Count;
    int r = Random.Range(0,k);
    return ra[r];
}

Historical note: in past decades, this was the correct and only solution for Unity3D specifically, as in the Olden Days modern .Net was not available in Unity.

Fattie
  • 27,874
  • 70
  • 431
  • 719
Richard Szalay
  • 83,269
  • 19
  • 178
  • 237
  • 1
    Yeah, makes me wonder why that is. Arrays implementing `IList` violates the Liskov substitution principle. – Yannick Motton Nov 26 '15 at 19:21
  • True, though `T[]` returns true for `IsReadOnly`. The read/write interfaces should definitely have been separated, though, since I'm sure that property is rarely very rarely guarded against. – Richard Szalay Nov 26 '15 at 19:27
  • Yes and Add/Clear/Remove throws InvalidOperationException. Great API design there ;-) – Yannick Motton Nov 26 '15 at 19:29
  • What do you expert dudes think about using `IReadOnlyList` rather than `IList<>` ? It's all completely beyond me! :) – Fattie Nov 26 '15 at 19:30
  • 2
    @Joe IReadOnlyList is actually the better choice here, as it doesn't allow write operations (like Add), which the asker's code doesn't require. – Richard Szalay Nov 26 '15 at 19:32
  • Be lenient in what you accept of others, be specific in what you return. IE is the most lenient you can get: It takes any type of sequence as input. – Yannick Motton Nov 26 '15 at 19:32
  • `IList` and `IReadOnllyList` both inherit from `IEnumerable`. Using `IEnumerable` is the best option. – ataravati Nov 26 '15 at 19:32
  • @Richard thanks for that great explanation. Nevertheless ataravati's point seems compelling???? – Fattie Nov 26 '15 at 19:33
  • @ataravati not when you also need Count and an indexer. Otherwise you end up calling extension methods that assume functionality that may bot exist. Count on a purr enumerable stream will evaluate the entire stream to count the elements; GetElementAt will evaluate the stream again. – Richard Szalay Nov 26 '15 at 19:33
  • @RichardSzalay What are you talking about? LINQ introspects and optimizes based on data type, it will call Length in case of Array, Count in case of List, and only enumerate if needed. And really, what are we talking about here... If *performance* would be a factor here, the OP would just take the first element. – Yannick Motton Nov 26 '15 at 19:37
  • 1
    "not when you also need Count and an indexer" ah of course; you need to choose the abstraction level that has those concepts -- so perhaps IReadOnlyList is the best thinking? – Fattie Nov 26 '15 at 19:45
  • Hi @Yannick - it's of no consequence to the technological discussion at hand, but "AnyOne" surely means a "random" one, in the game programming milieu. (It's extremely common, universal, to have such a function.) Sorry if I didn't explicitly clarify this. {Note that for first one, last one, you're well-served with existing methods.} – Fattie Nov 26 '15 at 19:46
  • @Yannick its not about performance it's about intent, and the principal of least surprise. Assuming additional functionality on an interface when a more appropriate interface is available helps neither. – Richard Szalay Nov 26 '15 at 19:46
  • 1
    Pointless discussions aside, @RichardSzalay would you mind changing the indexer access to `ra[r]`. As it stands, if Random produces a `0`, this code will raise an `IndexOutOfRangeException`. – Yannick Motton Nov 26 '15 at 20:07
  • @YannickMotton Actually... there are some real problems with this implementation. While I do agree that an implementation based on `IEnumerable` is the better choice for reusability reasons, this is not the way I would have solved it. I've written an answer below for the details. – atlaste Jun 20 '16 at 07:20
  • 1
    Hey men I boldly decided to put the token bounty here. The reason is twofold, (1) R.S. first "blew the lid off" this question which was difficult and challenging for a long time, with `IList` . . . the (outstanding) refinement to `IReadOnlyList` only came along because of this answer. Also, (2) for the ten billion game programmers googling here (this is the single most popular Extension idea when writing components in c#) this is indeed "the answer", so that's a good thing. I thank EVERYONE. – Fattie Jun 24 '16 at 14:05
6

It's interesting how some people choose IEnumerable<T>, while some other people insist on IReadOnlyList<T>.

Now let's be honest. IEnumerable<T> is useful, very useful. In most cases you just want to put this method in some library, and throw your utility function to whatever you think is a collection, and be done with it. However, using IEnumerable<T> correctly is a bit tricky, as I'll point out here...

IEnumerable

Let's for a second assume that the OP is using Linq and wants to get a random element from a sequence. Basically he ends up with the code from @Yannick, that ends up in the library of utility helper functions:

public static T AnyOne<T>(this IEnumerable<T> source)
{
    int endExclusive = source.Count(); // #1
    int randomIndex = Random.Range(0, endExclusive); 
    return source.ElementAt(randomIndex); // #2
}

Now, what this basically does is 2 things:

  1. Count the number of elements in the source. If the source is a simple IEnumerable<T> this implies going through all the elements in the list, if it's f.ex. a List<T>, it will use the Count property.
  2. Reset the enumerable, go to element randomIndex, grab it and return it.

There are two things that can go wrong here. First of all, your IEnumerable might be a slow, sequential storage, and doing Count can ruin the performance of your application in an unexpected way. For example, streaming from a device might get you into trouble. That said, you could very well argue that's to be expected when that's inherent to the characteristic of the collection - and personally I'd say that argument will hold.

Secondly -and this is perhaps even more important- there's no guarantee that you enumerable will return the same sequence every iteration (and therefore there's also no guarantee that your code won't crash). For example, consider this innocent looking piece of code, that might be useful for testing purposes:

IEnumerable<int> GenerateRandomDataset()
{
    Random rnd = new Random();
    int count = rnd.Next(10, 100); // randomize number of elements
    for (int i=0; i<count; ++i)
    {
        yield return new rnd.Next(0, 1000000); // randomize result
    }
}

The first iteration (calling Count()), you might generate 99 results. You pick element 98. Next you call ElementAt, the second iteration generates 12 results and your application crashes. Not cool.

Fixing the IEnumerable implementation

As we've seen, the issue of the IEnumerable<T> implementation is that you have to go through the data 2 times. We can fix that by going through the data a single time.

The 'trick' here is actually pretty simple: if we have seen 1 element, we definitely want to consider returning that. All elements considered, there's a 50%/50% chance that this is the element we would have returned. If we see the third element, there's a 33%/33%/33% chance that we would have returned this. And so on.

Therefore, a better implementation might be this one:

public static T AnyOne<T>(this IEnumerable<T> source)
{
    Random rnd = new Random();
    double count = 1;
    T result = default(T);
    foreach (var element in source)
    {
        if (rnd.NextDouble() <= (1.0 / count)) 
        {
            result = element;
        }
        ++count;
    }
    return result;
}

On a side note: if we're using Linq, we would expect operations to use the IEnumerable<T> once (and only once!). Now you know why.

Making it work with lists and arrays

While this is a neat trick, our performance will now be slower if we work on a List<T>, which doesn't make any sense because we know there's a much better implementation available due the the property that indexing and Count are available to us.

What we're looking for is the common denominator for this better solution, that's used in as many collections as we can find. The thing we'll end up with is the IReadOnlyList<T> interface, that implements everything we need.

Because of the properties that we know to be true for IReadOnlyList<T>, we can now safely use Count and indexing, without running the risk of crashing the application.

However, while IReadOnlyList<T> seems appealing, IList<T> for some reason doesn't seem to implement it... which basically means that IReadOnlyList<T> is a bit of a gamble in practice. In that respect, I'm pretty sure there are a lot more IList<T> implementations out there than IReadOnlyList<T> implementations. It therefore seems best to simply support both interfaces.

This leads us to the solution here:

public static T AnyOne<T>(this IEnumerable<T> source)
{
    var rnd = new Random();
    var list = source as IReadOnlyList<T>;
    if (list != null)
    {
        int index = rnd.Next(0, list.Count);
        return list[index];
    }

    var list2 = source as IList<T>;
    if (list2 != null)
    {
        int index = rnd.Next(0, list2.Count);
        return list2[index];
    }
    else
    {
        double count = 1;
        T result = default(T);
        foreach (var element in source)
        {
            if (rnd.NextDouble() <= (1.0 / count))
            {
                result = element;
            }
            ++count;
        }
        return result;
    }
}

PS: For more complex scenario's, check out the Strategy Pattern.

Random

@Yannick Motton made the remark that you have to be careful with Random, because it won't be really random if you call methods like this a lot of times. Random is initialized with the RTC, so if you make a new instance a lot of times, it won't change the seed.

A simple way around this is as follows:

private static int seed = 12873; // some number or a timestamp.

// ...

// initialize random number generator:
Random rnd = new Random(Interlocked.Increment(ref seed));

This way, every time you call AnyOne, the random number generator will receive another seed and it will work even in tight loops.

To summarize:

So, to summarize it:

  • IEnumerable<T>'s should be iterated once, and only once. Doing otherwise might give the user unexpected results.
  • If you have access to better capabilities than simple enumeration, it's not necessary to go through all the elements. Best to grab the right result right away.
  • Consider what interfaces you're checking very carefully. While IReadOnlyList<T> is definitely the best candidate, it's not inherited from IList<T> which means it'll be less effective in practice.

The end result is something that Just Works.

atlaste
  • 30,418
  • 3
  • 57
  • 87
  • Oops, typo... I write these answers in notepad usually :-) It should be 'count'. – atlaste Jun 20 '16 at 11:59
  • hmm, very surprisingly I have never seen that "running chance" algorithm before - thanks! – Fattie Jun 20 '16 at 12:07
  • @JoeBlow It's called [Reservoir sampling](https://en.wikipedia.org/wiki/Reservoir_sampling) :) – Ivan Stoev Jun 21 '16 at 18:26
  • good one Ivan thx. (That's an exceptionally poor wiki page btw!) – Fattie Jun 21 '16 at 18:40
  • 2
    @IvanStoev LOL :) To be honest I find it silly that all these trivial algorithms are named, I usually just make them up as I go... – atlaste Jun 21 '16 at 19:11
  • {Just FTR I'm not really sure you'd actually call the 1/2, 1/3... 1/n algorithm here "resevoir sampling". Anyway.} – Fattie Jun 23 '16 at 13:08
  • I feel this answer brilliantly explains *how to do it with IEnumerable* if indeed one *wants to 'cover' all of IEnumerable*. Brilliant. – Fattie Jun 23 '16 at 13:08
  • @atlaste One thing to consider is that newing up `Random` inside of the call will likely produce the same result when called in a tight loop, since the default constructor for Random takes the current machine tick as seed, same tick, same random sequence. Might be a good idea to use `StaticRandom`: see http://www.yoda.arachsys.com/csharp/miscutil/usage/staticrandom.html. – Yannick Motton Jan 04 '17 at 14:32
  • @YannickMotton It's not intended to be production-grade code. F.ex. I'd also expect overloads that allow a seed to be passed to the method. By the ways, I wouldn't use the `StaticRandom` implementation you linked there: it's not necessary to use a full lock. You can simply use a static counter that you seed with a timestamp, `Interlocked.Increment(ref counter)` and pass that counter to the c'tor of Random as seed. – atlaste Jan 09 '17 at 16:13
  • @atlaste My point was newing up an instance of `Random` in a utility method that is likely to be called often isn't a good idea in terms of randomness. I do agree that a fully locked Random instance isn't required, as long as the instance is not reused over multiple threads. However, the way you presented it, it seems to me you would run this from a factory method for `Random` instances, which would make this have roughly the same performance as a fully locked Random implementation. At least with the `StaticRandom` implementation you could make it `ThreadStatic` and avoid locking all together. – Yannick Motton Jan 10 '17 at 13:18
  • @YannickMotton I made an update based on your suggestions; I know you're right, I just felt it was a detail. After all, we kind-a expect the same from `Random`, so why would this behave differently? Anyways, this is more or less how I would solve it. Iirc it's faster than `ThreadStatic`, last time I benchmarked that, it wasn't that great... not that it matters, this isn't about performance. – atlaste Jan 10 '17 at 18:23
3

T[] and List<T> both share the same interface: IEnumerable<T>.

IEnumerable<T> however, does not have a Length or Count member, but there is an extension method Count(). Also there is no indexer on sequences, so you must use the ElementAt(int) extension method.

Something along the lines of:

public static T AnyOne<T>(this IEnumerable<T> source)
{
    int endExclusive = source.Count();
    int randomIndex = Random.Range(0, endExclusive); 
    return source.ElementAt(randomIndex);
}
Yannick Motton
  • 34,761
  • 4
  • 39
  • 55
  • Upvote for good explanation, and for not using meaningless variable names like the OP. – ataravati Nov 26 '15 at 19:24
  • Hi ataravati, it always make me sad to see someone like yourself who is completely illiterate :) "ra" obviously means "array", it's maybe the most famous joke or funny name in programming. – Fattie Nov 26 '15 at 19:28
  • @JoeBlow, I know. "k" also means Count, and "r" means Random. – ataravati Nov 26 '15 at 19:29
  • Yan, you seemed to blow the lid off the heart of the question with `IEnumerable`, but are you now part of the consensus that feels `'IReadOnlyList` is the best approach to use in actual code in the real world? Sorry it's hard for me to follow the subtleties here. – Fattie Nov 26 '15 at 19:55
2

The answer is to use the original code!

This ought to be the only question on StackOverflow where the question itself illustrates significantly better code than any of the provided answers. All suggested answers encourage the use of an interface, which will imply a significant performance hit. Do not use those solutions in production code!

Given that the question is tagged unity3d it is apparent that he code will be part of a game. In a game, the last thing you want is intermittent stuttering due to garbage collection. Typically, in Unity, you want enumerators to be extremely performant. Which brings me to the answer itself:

Do NOT use interfaces for enumeration

Unless you really have to. The List<T> and T[] types have highly optimized value-typed enumerators. Once you cast your type to an interface, you will revert to the non-optimized reference-typed version. Every call to the non-optimized version of GetEnumerator() will produce garbage, adding up to the stuttering that will later take place (trust me) when the garbage collector collects those allocated objects.

  • Optimized version of List<T>.GetEnumerator() here.
  • Non-optimized version of IEnumerable<T>.GetEnumerator() here.

For details, see my other answer.

l33t
  • 18,692
  • 16
  • 103
  • 180
  • *"The answer is to use the original code*" - The "answer" to which question? Apparently not the OP, whish is for generalization of that code for container types other than array. Second, the generalized answer uses 2 simple virtual properties - `Count` and `this` (indexer), hence won't "imply a significant performance hit". Finally, there is no enumeration involved at all in this question, so the "answer itself", while correct in general, is answering a question not asked here. – Ivan Stoev Oct 21 '20 at 17:43
  • @IvanStoev , while your points are important in terms of "this QA", this basic information: ***Once you cast your type to an interface, you will revert to the non-optimized reference-typed version*** I find to be extremely valuable. – Fattie Oct 22 '20 at 13:10
0

You could change your definition a bit:

public static T AnyOne<T>(this IEnumerable<T> ra) 
{
    if(ra==null)
        throw new ArgumentNullException("ra");

    int k = ra.Count();
    int r = Random.Range(0,k);
    return ra.ElementAt(r-1);
}

Now you define an extension method for all the types that implement the IEnumerable<T> interface.

Jerry
  • 4,507
  • 9
  • 50
  • 79
Christos
  • 53,228
  • 8
  • 76
  • 108