3

I've got a collection of items (ADO.NET Entity Framework), and need to return a subset as search results based on a couple different criteria. Unfortunately, the criteria overlap in such a way that I can't just take the collection Where the criteria are met (or drop Where the criteria are not met), since this would leave out or duplicate valid items that should be returned.

I decided I would do each check individually, and combine the results. I considered using AddRange, but that would result in duplicates in the results list (and my understanding is it would enumerate the collection every time - am I correct/mistaken here?). I realized Union does not insert duplicates, and defers enumeration until necessary (again, is this understanding correct?).

The search is written as follows:

IEnumerable<MyClass> Results = Enumerable.Empty<MyClass>();
IEnumerable<MyClass> Potential = db.MyClasses.Where(x => x.Y); //Precondition

int parsed_key;

//For each searchable value
foreach(var selected in SelectedValues1)
{
    IEnumerable<MyClass> matched = Potential.Where(x => x.Value1 == selected);
    Results = Results.Union(matched); //This is where the problem is
}

//Ellipsed....

foreach(var selected in SelectedValuesN) //Happens to be integer
{
    if(!int.TryParse(selected, out parsed_id))
        continue;
    IEnumerable<MyClass> matched = Potential.Where(x => x.ValueN == parsed_id);
    Results = Results.Union(matched); //This is where the problem is
}

It seems, however, that Results = Results.Union(matched) is working more like Results = matched. I've stepped through with some test data and a test search. The search asks for results where the first field is -1, 0, 1, or 3. This should return 4 results (two 0s, a 1 and a 3). The first iteration of the loops works as expected, with Results still being empty. The second iteration also works as expected, with Results containing two items. After the third iteration, however, Results contains only one item.

Have I just misunderstood how .Union works, or is there something else going on here?

yoozer8
  • 7,361
  • 7
  • 58
  • 93

5 Answers5

8

Because of deferred execution, by the time you eventually consume Results, it is the union of many Where queries all of which are based on the last value of selected.

So you have

Results = Potential.Where(selected)
    .Union(Potential.Where(selected))
    .Union(potential.Where(selected))...

and all the selected values are the same.

You need to create a var currentSelected = selected inside your loop and pass that to the query. That way each value of selected will be captured individually and you won't have this problem.

Rawling
  • 49,248
  • 7
  • 89
  • 127
  • Yes, `selected` is captured. An alternate solution is a ToList() inside that loop. – H H Aug 13 '12 at 14:10
  • @Henk Yeah, I was going to go for "make Results a list" but just calling `ToList` is all that's really needed. – Rawling Aug 13 '12 at 14:12
  • This worked! Several of the fields being searched against are integers. For those I created `int d_id` or similar right before the loop, then `TryParsed` the search term inside the loop and searched against that. Didn't realize the deferred execution would use the *current value of the search parameter* rather than what it was when I unioned it. – yoozer8 Aug 13 '12 at 14:16
  • 2
    I think he's not using .ToList because he wants to defer execution. – Jon Senchyna Aug 13 '12 at 14:16
  • Correct, @JonSenchyna. No need to execute on each and every value. – yoozer8 Aug 13 '12 at 14:18
  • @Jim At least in that case, it's obvious that all the values are going into the same `d_id` variable, even if it's not obvious how the queries will be affected. With a standard `foreach` variable it's not so obvious, and I believe in earlier versions of C# it was actually the other way around and you did get one separate variables per iteration – Rawling Aug 13 '12 at 14:18
  • @Rawling No, in previous versions you never did get different variables for each loop of a `foreach`, but you will as of C# 5.0. It's just that before C# 2.0 there were no language semantics for closures so you couldn't run into this problem. – Servy Aug 13 '12 at 14:28
  • 1
    @Servy Ah, fair enough. Good to know they're "fixing" it :) – Rawling Aug 13 '12 at 14:30
  • 3
    @Rawling Well, it's not a bug, it's just a common mistake. I'll let Eric Lippert [explain](http://blogs.msdn.com/b/ericlippert/archive/2009/11/12/closing-over-the-loop-variable-considered-harmful.aspx) in his [own words](http://blogs.msdn.com/b/ericlippert/archive/2009/11/16/closing-over-the-loop-variable-part-two.aspx). – Servy Aug 13 '12 at 14:36
1

You can do this much more simply:

Reuslts = SelectedValues.SelectMany(s => Potential.Where(x => x.Value == s));

(this may return duplicates)

Or

Results = Potential.Where(x => SelectedValues.Contains(x.Value));
SLaks
  • 868,454
  • 176
  • 1,908
  • 1,964
  • This would simplify the search for each searchable value, but several different values need to be checked. I'm not sure if it's clear in the question (doesn't look it now that I re-read it), but that for loop is repeated a few times for differnt values. Editing to clarify... – yoozer8 Aug 13 '12 at 14:07
  • @Jim: You can do that with `||` clauses. – SLaks Aug 13 '12 at 14:39
1

As pointed out by others, your LINQ expression is a closure. This means your variable selected is captured by the LINQ expression in each iteration of your foreach-loop. The same variable is used in each iteration of the foreach, so it will end up having whatever the last value was. To get around this, you will need to declare a local variable within the foreach-loop, like so:

//For each searchable value 
foreach(var selected in SelectedValues1) 
{
    var localSelected = selected;
    Results = Results.Union(Potential.Where(x => x.Value1 == localSelected));
}

It is much shorter to just use .Contains():

Results = Results.Union(Potential.Where(x => SelectedValues1.Contains(x.Value1)));

Since you need to query multiple SelectedValues collections, you could put them all inside their own collection and iterate over that as well, although you'd need some way of matching the correct field/property on your objects.

You could possibly do this by storing your lists of selected values in a Dictionary with the name of the field/property as the key. You would use Reflection to look up the correct field and perform your check. You could then shorten the code to the following:

// Store each of your searchable lists here
Dictionary<string, IEnumerable<MyClass>> DictionaryOfSelectedValues = ...;

Type t = typeof(MyType);
// For each list of searchable values
foreach(var selectedValues in DictionaryOfSelectedValues) // Returns KeyValuePair<TKey, TValue>
{
    // Try to get a property for this key
    PropertyInfo prop = t.GetProperty(selectedValues.Key);
    IEnumerable<MyClass> localSelected = selectedValues.Value;

    if( prop != null )
    {
        Results = Results.Union(Potential.Where(x =>
                localSelected.Contains(prop.GetValue(x, null))));
    }
    else // If it's not a property, check if the entry is for a field
    {
        FieldInfo field = t.GetField(selectedValues.Key);
        if( field != null )
        {
            Results = Results.Union(Potential.Where(x =>
                    localSelected.Contains(field.GetValue(x, null))));
        }
    }
}
Community
  • 1
  • 1
Jon Senchyna
  • 7,867
  • 2
  • 26
  • 46
0

No, your use of union is absoloutely correct. The only thing to keep in mind is it excludes duplicates as based on the equality operator. Do you have sample data?

Adam
  • 15,537
  • 2
  • 42
  • 63
M Afifi
  • 4,645
  • 2
  • 28
  • 48
0

Okay, I think you are are haveing a problem because Union uses deferred execution.

What happens if you do,

var unionResults = Results.Union(matched).ToList();
Results = unionResults; 
Jodrell
  • 34,946
  • 5
  • 87
  • 124