
This code

using System;
using System.Collections.Generic;
using System.Linq;

namespace ConsoleApplication
{
    internal class Program
    {
        public static void Main()
        {
            var values = new[] {1, 2, 3, 3, 2, 1, 4};
            var distinctValues = GetDistinctValuesUsingWhere(values);
            Console.WriteLine("GetDistinctValuesUsingWhere No1: " + string.Join(",", distinctValues));
            Console.WriteLine("GetDistinctValuesUsingWhere No2: " + string.Join(",", distinctValues));
            distinctValues = GetDistinctValuesUsingForEach(values);
            Console.WriteLine("GetDistinctValuesUsingForEach No1: " + string.Join(",", distinctValues));
            Console.WriteLine("GetDistinctValuesUsingForEach No2: " + string.Join(",", distinctValues));
            Console.ReadLine();
        }

        private static IEnumerable<T> GetDistinctValuesUsingWhere<T>(IEnumerable<T> items)
        {
            var set=new HashSet<T>();
            return items.Where(i=> set.Add(i));
        }

        private static IEnumerable<T> GetDistinctValuesUsingForEach<T>(IEnumerable<T> items)
        {
            var set=new HashSet<T>();
            foreach (var i in items)
            {
                if (set.Add(i))
                    yield return i;
            }
        }
    }
}

results in the following output:

GetDistinctValuesUsingWhere No1: 1,2,3,4

GetDistinctValuesUsingWhere No2:

GetDistinctValuesUsingForEach No1: 1,2,3,4

GetDistinctValuesUsingForEach No2: 1,2,3,4

I don't understand why I don't get any values in the row "GetDistinctValuesUsingWhere No2".

Can anyone explain this to me?

UPDATE: After Scott's answer, I changed the example to the following:

    private static IEnumerable<T> GetDistinctValuesUsingWhere2<T>(IEnumerable<T> items)
    {
        var set = new HashSet<T>();
        var capturedVariables = new CapturedVariables<T> {set = set};

        foreach (var i in items)
            if (capturedVariables.set.Add(i))
                yield return i;
        //return Where2(items, capturedVariables);
    }

    private static IEnumerable<T> Where2<T>(IEnumerable<T> source, CapturedVariables<T> variables)
    {
        foreach (var i in source)
            if (variables.set.Add(i))
                yield return i;
    }

    private class CapturedVariables<T>
    {
        public HashSet<T> set;
    }

This prints the output 1,2,3,4 twice.

However, if I just uncomment the line

return Where2(items, capturedVariables);

and comment out the lines

foreach (var i in items) if (capturedVariables.set.Add(i)) yield return i;

in the method GetDistinctValuesUsingWhere2, I get the output 1,2,3,4 only once, even though the lines I commented out and the body of Where2 are exactly the same.

I still don't get it....

neural5torm
Urs Meili
  • Off-topic, but why not just use `Distinct` (or for objects with properties, MoreLINQ's `DistinctBy`) instead of rolling your own? Or is this merely a learning exercise? – Kenneth K. Aug 25 '17 at 18:35
  • Try adding a `.ToList()` at the end of your Where; you should get correct results. The issue is that you are returning a deferred query, so it is re-evaluated every time you enumerate it. – Kolichikov Aug 25 '17 at 18:35
  • Possible duplicate: [How to tell if an IEnumerable is subject to deferred execution ?](https://stackoverflow.com/questions/1168944/how-to-tell-if-an-ienumerablet-is-subject-to-deferred-execution) – Igor Aug 25 '17 at 18:38
  • Of course I would use Distinct in real life. This is just a simplified example of a more complex problem we recently stumbled upon. – Urs Meili Aug 25 '17 at 19:10
  • Re your last edit: Are you familiar with the concept of [closures](https://stackoverflow.com/q/595482/11683)? – GSerg Aug 25 '17 at 19:54
  • @GSerg I thought so, yes. But since my 2nd example doesn't use any delegates or lambdas, I was thinking that this concept doesn't apply here. – Urs Meili Aug 25 '17 at 20:30

2 Answers


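GetDistinctValuesUsingWhere No2 returns no results because of variable capture: the lambda passed to Where captures the HashSet, and that one set is shared by every enumeration of the returned sequence.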

Your Where-based method behaves roughly like this code:

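    private static IEnumerable<T> GetDistinctValuesUsingWhere<T>(IEnumerable<T> items)
    {
        // Runs once, when GetDistinctValuesUsingWhere is called.
        var set = new HashSet<T>();
        var capturedVariables = new CapturedVariables<T> {set = set};
        return Where(items, capturedVariables);
    }

    // Hand-written stand-in for Enumerable.Where combined with the lambda.
    private static IEnumerable<T> Where<T>(IEnumerable<T> source, CapturedVariables<T> variables)
    {
        foreach (var i in source)
        {
            if (variables.set.Add(i))
                yield return i;
        }
    }

    // Hand-written stand-in for the compiler-generated closure class.
    private class CapturedVariables<T>
    {
        public HashSet<T> set;
    }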

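So both methods use yield return under the hood, but the sequence returned by GetDistinctValuesUsingWhere reuses the same HashSet for every enumeration, whereas GetDistinctValuesUsingForEach creates a new HashSet each time it is enumerated. On the second enumeration of the Where version the set already contains every value, so set.Add returns false for each item and nothing is yielded.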
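
As a minimal sketch of two possible ways around this (the class and method names DistinctSketch, DistinctMaterialized and DistinctDeferred are made up for illustration and are not from the question): either materialize the result once with ToList(), or keep deferred execution but create the set inside an iterator method so every enumeration gets a fresh one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctSketch
{
    // Hypothetical fix 1: run the query exactly once and return the stored result.
    public static List<T> DistinctMaterialized<T>(IEnumerable<T> items)
    {
        var set = new HashSet<T>();
        return items.Where(i => set.Add(i)).ToList();
    }

    // Hypothetical fix 2: stay lazy, but create the set inside an iterator method.
    public static IEnumerable<T> DistinctDeferred<T>(IEnumerable<T> items)
    {
        var set = new HashSet<T>(); // runs at the start of every enumeration
        foreach (var i in items.Where(x => set.Add(x)))
            yield return i;
    }

    public static void Main()
    {
        var values = new[] {1, 2, 3, 3, 2, 1, 4};

        var materialized = DistinctMaterialized(values);
        Console.WriteLine(string.Join(",", materialized)); // 1,2,3,4
        Console.WriteLine(string.Join(",", materialized)); // 1,2,3,4

        var deferred = DistinctDeferred(values);
        Console.WriteLine(string.Join(",", deferred)); // 1,2,3,4
        Console.WriteLine(string.Join(",", deferred)); // 1,2,3,4
    }
}
```

The first variant trades laziness for a stable result; the second re-reads the source on every enumeration, which may or may not be what you want.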

Scott Chamberlain
  • The problem is, I don't see why or where exactly the CLR decides that it needs to keep the captured variables alive. If you just inline the code in your "Where" method, the variables won't be captured, and the example works as expected on multiple calls. (see my updated example). – Urs Meili Aug 25 '17 at 20:43
  • @umei a method with a `yield return` statement will "stay alive" at least until the end of the enumeration (the whole method will stay alive). Or in other words, the enumerable represents the whole method (the whole method is converted to a state machine). For the `Where` version, the lambda expression captures the reference to the `HashSet`, and the enumerable only represents the application of that lambda with `Where`, not the instantiation of the `HashSet`. You would need to invoke `GetDistinctValuesUsingWhere` a second time to get a new `HashSet`. – Dave Cousineau Aug 25 '17 at 21:11

Answering the updated version:

  • When GetDistinctValuesUsingWhere2() itself contains the foreach loop, it is an iterator method: the compiler turns the whole method body, including the set initialization statement, into a state machine behind the returned IEnumerable. The initialization therefore does not run during the original call to GetDistinctValuesUsingWhere2(); it runs each time you start iterating the enumerable.
  • In the other variant, where you return Where2(), GetDistinctValuesUsingWhere2() is an ordinary method, because it contains neither a yield return nor a delegate. Its body, including the set initialization statement, runs exactly once, during the original call. The IEnumerable you get back is the one produced by Where2(), which covers only the foreach loop and its already-initialized parameters. Every enumeration therefore shares the same set, and the second pass finds all values already in it.

If necessary, put some breakpoints at various points in your code: this should help you understand what I tried to explain here.
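
If you prefer counters to breakpoints, here is a small sketch of the same idea. All names (IteratorTimingSketch, Iterator, Wrapper, Inner and the two counters) are hypothetical and only serve to show when the first statement of each method runs.

```csharp
using System;
using System.Collections.Generic;

public static class IteratorTimingSketch
{
    public static int IteratorSetups; // times the iterator method's first statement ran
    public static int WrapperSetups;  // times the ordinary method's first statement ran

    // Iterator method: the body does not run at call time. It starts from
    // the top each time the returned sequence is enumerated.
    public static IEnumerable<int> Iterator(IEnumerable<int> items)
    {
        IteratorSetups++;
        foreach (var i in items)
            yield return i;
    }

    // Ordinary method: the body runs once, at call time, and merely
    // hands back the sequence produced by Inner.
    public static IEnumerable<int> Wrapper(IEnumerable<int> items)
    {
        WrapperSetups++;
        return Inner(items);
    }

    private static IEnumerable<int> Inner(IEnumerable<int> items)
    {
        foreach (var i in items)
            yield return i;
    }

    public static void Main()
    {
        var values = new[] {1, 2, 3};
        var a = Iterator(values);
        var b = Wrapper(values);
        Console.WriteLine(IteratorSetups + " " + WrapperSetups); // 0 1

        // Enumerate each sequence twice.
        foreach (var x in a) { }
        foreach (var x in a) { }
        foreach (var x in b) { }
        foreach (var x in b) { }
        Console.WriteLine(IteratorSetups + " " + WrapperSetups); // 2 1
    }
}
```

Replace the counter increments with the HashSet creation and you get the two behaviours from the question: a fresh set per enumeration in the first case, one shared set in the second.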

neural5torm