0

I randomly sorted an IEnumerable. I keep printing out the same element, and getting a different result.

string[] collection = {"Zero", "One", "Two", "Three", "Four"};
var random = new Random();
var enumerableCollection = collection.OrderBy(e => random.NextDouble());

Console.WriteLine(enumerableCollection.ElementAt(0));
Console.WriteLine(enumerableCollection.ElementAt(0));
Console.WriteLine(enumerableCollection.ElementAt(0));
Console.WriteLine(enumerableCollection.ElementAt(0));
Console.WriteLine(enumerableCollection.ElementAt(0));

Every write gives a different random element. Why is the order not preserved?

See it on .NET Fiddle

mjwills
  • 23,389
  • 6
  • 40
  • 63
Evorlor
  • 7,263
  • 17
  • 70
  • 141
  • 2
    Why would you expect it to return the same value every time? – mjwills Nov 17 '18 at 07:02
  • 2
    Possible duplicate of [Is there a way to Memorize or Materialize an IEnumerable?](https://stackoverflow.com/questions/4621561/is-there-a-way-to-memorize-or-materialize-an-ienumerable) – mjwills Nov 17 '18 at 07:03
  • @mjwills For the same reason I would expect `collection[0]` to return the same value every time. – Evorlor Nov 17 '18 at 07:04
  • 8
    The key thing to understand is that LINQ queries do **nothing** until they are materialised (e.g. `ToList`, `ToArray`, `foreach`, `ElementAt`). You aren't executing the query once and then getting the first element. You are materialising it **five separate times**. So each time, it orders it differently - like you told it to. – mjwills Nov 17 '18 at 07:06
  • 4
    @Evorlor Read the [remarks section](https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.select?view=netframework-4.7.2): "This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach". Reusing that `IEnumerable` (e.g. by materialising it using `.ElementAt(0)`) will re-execute it. – ProgrammingLlama Nov 17 '18 at 07:07
  • 6
    As a learning exercise, change `random.NextDouble()` to `{ return random.NextDouble(); }` and put it on its own line of code. Put a breakpoint on it. Run the code. Watch how many times it gets hit. – mjwills Nov 17 '18 at 07:10
  • 1
    @Evorlor https://dotnetvibes.com/2016/01/04/deferred-execution-vs-immediate-execution-in-linq/ may be worth a read. – mjwills Nov 17 '18 at 07:18
  • 1
    Google "linq deferred execution" for the best hits. – Hans Passant Nov 17 '18 at 07:23

4 Answers4

6

Linq defers execution until it is absolutely necessary. You can think of enumerableCollection as the definition of how you want the enumeration to work, rather than the result of a single enumeration.

So each time you enumerate it (which you're doing when you call ElementAt) it will re-enumerate your original collection, and since you've chosen to order randomly, the answer is different each time.

You can make it do what you were expecting by adding .ToList() on the end:

var enumerableCollection = collection.OrderBy(e => random.NextDouble()).ToList();

This performs an enumeration, and stores the result in a List. By using this, the original list won't be re-enumerated each time you enumerate the enumerableCollection.

Richardissimo
  • 5,596
  • 2
  • 18
  • 36
4

The simple answer is that you are specifically forcing a non-deterministic situation. The why is due to how LINQ functions.

At its core LINQ is designed around the concept of building objects that represent operations on a set of data - object, records, etc. - and then executing those operations at enumeration time. When you call OrderBy the return is an object that will process the origin to perform the ordering, not a new collection that is ordered. The order is Only when you actually enumerate the data - by calling ElementAt in this case.

Each time you enumerate the IEnumerable<string> object it runs the sequence of operations you specified, creating a random ordering of the elements. Each ElementAt(0) call will therefore return the first element of a new random sequence.

Essentially what you appear to be misunderstanding is this: an IEnumerable<T> is not a collection.

You can however convert the IEnumerable<string> to a collection using ToArray() or ToList(). These will take the output of the operation sequence specified by the IEnumerable<> and create a new collection - string[] or List<string> respectively.

For instance:

string[] collection = {"Zero", "One", "Two", "Three", "Four"};
var random = new Random();
var enumerableCollection = collection.OrderBy(e => random.NextDouble()).ToArray();

Console.WriteLine(enumerableCollection.ElementAt(0));
Console.WriteLine(enumerableCollection.ElementAt(0));
Console.WriteLine(enumerableCollection.ElementAt(0));
Console.WriteLine(enumerableCollection.ElementAt(0));
Console.WriteLine(enumerableCollection.ElementAt(0));

Now you'll get a random item from the original array, but it will be the same item multiple times. Instead of randomizing each time it's just reading the first item in a new collection whose order is randomized.

Corey
  • 15,524
  • 2
  • 35
  • 68
2

Many of the previous responses are technically correct, but I think it is useful to look directly at the implementation for Enumerable.

We can see that the the ElementAt calls GetEnumerator, which in turn yields on an iteration of the the current grouping.

If you aren't familiar with yeild see yield.

For the given example, the current grouping is

e => random.NextDouble()

Which means that we are endlessly iteratorating over collection (look at the inifinite loop of the yield) and returning the resolution of the grouping for each element which executes against random.NextDouble().

Therefore, the non-deterministic nature here is due to the non-deterministic nature of the random grouping. This is expected behavior for the LINQ terminal statements (.List(), toArray() ect.) but can be confusing if you attempt to utilize them beforehand.

Alchemy
  • 445
  • 2
  • 3
  • 1
    This is correct but only half explained. Please break it down a bit more for the OP. Explain what enumerableCollection is referencing, break down the Enumerable interaface and explain the difference between Enumerable and Enumerator and why they are stored data but functions and types for iterating data. – Michael Puckett II Nov 17 '18 at 13:03
1

I am glad the question is here. While the concept of deferred execution is well known, people still make the mistake and enumerate IEnumerable multiple times, even Eric Lippert did not avoid that in https://ericlippert.com/2014/10/20/producing-combinations-part-three/ . Doing so may and eventually will break your code. If you need to enumerate IEnumerable multiple times, materialize it (for example with ToList).

Edit

with all respect to authority of Eric Lippert, let me show how bad multiple enumeration can be. Here is the correct result, based on list:

Zero,One,Two
Zero,One,Three
Zero,One,Four
Zero,Two,Three
Zero,Two,Four
Zero,Three,Four
One,Two,Three
One,Two,Four
One,Three,Four
Two,Three,Four

and here is what happens when we process sequence with unstable order:

Three,One,Four
Four,Two,One
One,Zero,Three
One,Zero,Four
Three,Two,Zero
One,Four,Zero
Two,Four,One
Two,Three,Four
Two,Three,Four
Four,Three,One

and to the points

  1. one of the enumerations was to count the sequence; a sequence that does not have a stable count when enumerated multiple times is a sequence that you should not attempt to make combinations out of!
  2. counting does not necessarily enumerate the sequence.
  3. in that series I strongly encourage the use of immutable data structures which are always safe and performant when enumerated multiple times.
  1. I agree that trying to make combinations of sequence with unstable count is questionable. But it is not the case here. The sequence has given count and guaranteed items, they are just randomly ordered. Imagine it can be result from some fancy HashSet that can rearrange its internal structure between enumerations for optimization reasons.
  2. That is true. That possibility is there. But is that really an argument?
  3. If that is a contract, ok. You expect sequence that you can enumerate as many times as you wish and always get the same result. Stating that explicitely would avoid confusion, not every sequence has this property.
Antonín Lejsek
  • 6,003
  • 2
  • 16
  • 18
  • Though you are correct that it is in general safer, more performant and more elegant to enumerate a sequence once, and you are correct that in my sample code I enumerated a sequence of items potentially twice, there are some mitigating factors. First, one of the enumerations was to count the sequence; a sequence that does not have a stable count when enumerated multiple times is a sequence that you should not attempt to make combinations out of! Second, counting does not necessarily enumerate the sequence. – Eric Lippert Nov 19 '18 at 21:38
  • And third, in that series I strongly encourage the use of immutable data structures which *are* always safe and performant when enumerated multiple times. – Eric Lippert Nov 19 '18 at 21:39