3

Given the following code:

var strings = Enumerable.Range(0, 100).Select(i => i.ToString());
int outValue = 0;
var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
                            .Select(s => outValue);
outValue = 3;
//enumerating over someEnumerable here shows ints from 0 to 99

I am able to see a "snapshot" of the out parameter for each iteration. Why does this work correctly instead of me seeing 100 3's (deferred execution) or 100 99's (access to modified closure)?

moarboilerplate
  • 1,633
  • 9
  • 23
  • Because the body of the where clause is being executed lazily on each iteration and updating the value of `outValue`. If you eagerly evaluated the sequence you would see `outValue` equal to the last element. – Lee Oct 30 '14 at 20:14
  • `Select` is enumerating the result of the `Where`, and yielding each result one-by-one. – Blorgbeard Oct 30 '14 at 20:15
  • I prefer an extension method like [this](http://stackoverflow.com/a/16613455/284240) instead of relying on an undocumented behaviour. Btw, an alternative might be `Where(s=>s.All(Char.IsDigit)).Select(int.Parse)`. – Tim Schmelter Oct 30 '14 at 20:16
  • @TimSchmelter It's not undocumented, it's just fragile. Everything here is documented, it's just that a few seemingly valid refactors would break the code. – Servy Oct 30 '14 at 20:18
  • @TimSchmelter `Enumerable.All(predicate)` returns true for a zero-length collection. Can't assume you'll be able to `int.Parse()` that. – Federico Berasategui Oct 30 '14 at 20:19
  • @HighCore: ..and throws an exception if the string is `null`. You can first use `String.IsNullOrEmpty` – Tim Schmelter Oct 30 '14 at 20:22
  • @Servy: where is it documented? Even if i'll work in C#10 you can't be sure that it works for any LINQ provider. Sometimes there are [breaking changes](http://blogs.msdn.com/b/ericlippert/archive/2009/11/12/closing-over-the-loop-variable-considered-harmful.aspx). – Tim Schmelter Oct 30 '14 at 20:25
  • @TimSchmelter It's documented right in the documentation of `Where`. `This method is implemented by using deferred execution. The immediate return value is an object that stores all the information that is required to perform the action. The query represented by this method is not executed until the object is enumerated either by calling its GetEnumerator method directly or by using foreach in Visual C# or For Each in Visual Basic.` (`Select` says the same thing too.) The fact that execution is deferred is in fact documented. What other behavior are you asserted *isn't* documented? – Servy Oct 30 '14 at 20:33
  • 2
    @TimSchmelter If you want to not do something because C# might make a breaking change of documented behavior then you'd never be able to do anything ever, because a breaking change might be made to break it. – Servy Oct 30 '14 at 20:37

3 Answers3

5

First you define a query, strings that knows how to generate a sequence of strings, when queried. Each time a value is asked for it will generate a new number and convert it to a string.

Then you declare a variable, outValue, and assign 0 to it.

Then you define a new query, someEnumerable, that knows how to, when asked for a value, get the next value from the query strings, try to parse the value and, if the value can be parsed, yields the value of outValue. Once again, we have defined a query that can do this, we have not actually done any of this.

You then set outValue to 3.

Then you ask someEnumerable for it's first value, you are asking the implementation of Select for its value. To compute that value it will ask the Where for its first value. The Where will ask strings. (We'll skip a few steps now.) The Where will get a 0. It will call the predicate on 0, specifically calling int.TryParse. A side effect of this is that outValue will be set to 0. TryParse returns true, so the item is yielded. Select then maps that value (the string 0) into a new value using its selector. The selector ignores the value and yields the value of outValue at that point in time, which is 0. Our foreach loop now does whatever with 0.

Now we ask someEnumerable for its second value, on the next iteration of the loop. It asks Select for a value, Select asks Where,Where asks strings, strings yields "1", Where calls the predicate, setting outValue to 1 as a side effect, Select yields the current value of outValue, which is 1. The foreach loop now does whatever with 1.

So the key point here is that due to the way in which Where and Select defer execution, performing their work only immediately when the values are needed, the side effect of the Where predicate ends up being called immediately before each projection in the Select. If you didn't defer execution, and instead performed all of the TryParse calls before any of the projections in Select, then you would see 100 for each value. We can actually simulate this easily enough. We can materialize the results of the Where into a collection, and then see the results of the Select be 100 repeated over and over:

var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
    .ToList()//eagerly evaluate the query up to this point
    .Select(s => outValue);

Having said all of that, the query that you have is not particularly good design. Whenever possible you should avoid queries that have side effects (such as your Where). The fact that the query both causes side effects, and observes the side effects that it creates, makes following all of this rather hard. The preferable design would be to rely on purely functional methods that aren't causing side effects. In this context the simplest way to do that is to create a method that tries to parse a string and returns an int?:

public static int? TryParse(string rawValue)
{
    int output;
    if (int.TryParse(rawValue, out output))
        return output;
    else
        return null;
}

This allows us to write:

var someEnumerable = from s in strings
    let n = TryParse(s)
    where n != null
    select n.Value;

Here there are no observable side effects in the query, nor is the query observing any external side effects. It makes the whole query far easier to reason about.

Servy
  • 202,030
  • 26
  • 332
  • 449
  • Thanks for showing how to break it--helps to emphasize that the query is brittle. By the way, is there a pure LINQ way to both filter and project results in the same expression/clause instead of returning a value that the second where clause has to know in order to filter by? For example if I do a where first using the TryParse predicate I have to reevaluate the TryParse in a call to Select() next. If I use let in query syntax, then we reintroduce the closure problem. – moarboilerplate Oct 30 '14 at 20:35
  • @moarboilerplate Sure, I showed an example of how to do this in my post. – Servy Oct 30 '14 at 20:36
  • I should clarify--the select in the example returns a value that the next clause has to know about so you project all elements 1:1 first and then the where clause kicks in and filtering happens. I was wondering if there is a linq operator/keyword that will only yield a projection if a predicate evaluates to true. – moarboilerplate Oct 30 '14 at 20:42
  • 2
    @moarboilerplate Is it *possible*, yes. Would it be good design, no. If you want to both project and filter then you should use a filtering operator and a projection operator, rather than trying to do both at once. Doing so would be moving back into the realm of code that works but is poor design. – Servy Oct 30 '14 at 20:44
  • Digging up an old thread, isn't the problem with this that the order of the functions is wrong? I feel like I could probably have solved my issue if I made a method I passed the collection into, and then inside it had a foreach where it goes through and calls TryParse on each item and yield return/yield break as needed. That way the selection and filtering is done in one and the output parameter doesn't exist outside of the closure. – moarboilerplate Jan 09 '15 at 19:30
1

Because when you enumerate the value changes one at a time and changes the value of the variable on the fly. Due to the nature of LINQ the select for the first iteration is executed before the where for the second iteration. Basically this variable turns into a foreach loop variable of a kind.

This is what deferred execution buys us. Previous methods do not have to execute fully before the next method in the chain starts. One value moves through all the methods before the second goes in. This is very useful with methods like First or Take which stop the iteration early. Exceptions to the rule are methods that need to aggregate or sort like OrderBy (they need to look at all elements before finding out which is first). If you add an OrderBy before the Select the behavior will probably break.

Of course I wouldn't depend on this behavior in production code.

Stilgar
  • 22,354
  • 14
  • 64
  • 101
0

I don't understand what is odd for you?

If you write a loop on this enumerable like this

foreach (var i in someEnumerable)
{
     Console.WriteLine(outValue);
}

Because LINQ enumerates each where and select lazyly and yield each value, if you add ToArray

var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
                        .Select(s => outValue).ToArray();

Than in the loop you will see 99 s

Edit

The below code will print 99 s

var strings = Enumerable.Range(0, 100).Select(i => i.ToString());
int outValue = 0;
var someEnumerable = strings.Where(s => int.TryParse(s, out outValue))
                                    .Select(s => outValue).ToArray();
//outValue = 3;


foreach (var i in someEnumerable)
{
    Console.WriteLine(outValue);
}
Arsen Mkrtchyan
  • 49,896
  • 32
  • 148
  • 184
  • Adding the `ToArray` there wouldn't change the results. – Servy Oct 30 '14 at 20:17
  • Not sure if you understand me correctly, but the above two codes are different Servy... I don't mean ToArray will change the workflow, but the result will be different since sequence will be enumerated at once – Arsen Mkrtchyan Oct 30 '14 at 20:23
  • Your code only prints out `99` because your print line is printing out `outValue` and not `i`. Of course `outValue` is going to be `99` after you've executed the entire query, but the actual values in the list go from 0 to 99. What you've done is effectively written:`strings.Where(s => int.TryParse(s, out outValue)).Select(s => outValue).ToArray().Select(s => outValue);` which obviously prints out `99`. – Servy Oct 30 '14 at 20:39
  • agree, no one argue that actual values go from 0 to 99... well, the question is a bit messy, may be we understood it differently – Arsen Mkrtchyan Oct 30 '14 at 20:40
  • You do right here in your answer, `Than in the loop you will see 99 s`. You specifically stated that adding `ToArray` to the end will change the query so that it will yield `99` repeatedly. It won't. You then changed the code in your second snippet to something entirely different in order to replicate the results you predicted. – Servy Oct 30 '14 at 20:41
  • well, the question is a bit messy, may be we understood it differently... I thought that the question is about outValue – Arsen Mkrtchyan Oct 30 '14 at 20:42
  • Of course none of this is even bringing up the fact that your answer doesn't actually *explain* anything. You show how to break the code, but you don't explain why the OP's code functions the way it does. – Servy Oct 30 '14 at 20:42