1

I'm running the following expression to get the delta between 2 collections based on the Id and also make sure the overall result includes only 'Active/Enabled' records:

//Returns the right records but all IsEnabled = false
var newRecords = allRecordsFromA.Where(a => !(allRecordsFromB.Any(b => ((b.Id == a.Id) && a.IsEnabled)))).ToList();

The above result returns all the right records in A that are not in B, but all of the records are marked IsEnabled = false. The really confusing part is I use the exact same expression on 2 other collections in another part of the application and it works. I get back all the proper records and they are all marked as .IsEnabled == true That makes no sense to me. I even copied over and it doesn't work.

However, if I change to a 2 step process I get the results I desire:

//Works
var newRecords = allRecordsFromA.Where(a => !(allRecordsFromB.Any(b => (b.Id == a.Id)))).ToList();
var results = newRecords.Where(x => x.IsEnabled).ToList();

I was thinking the order of operations in the 1st expression is wrong returning the records with the ! being applied to IsEnabled but I don't see it. What is incorrect in my 1st expression that is returning records with IsEnabled == false?

atconway
  • 20,624
  • 30
  • 159
  • 229

3 Answers3

3

I think that your parentheses are not where you think they are. Specifically, a.IsEnabled is inside the Any statement, which is then negated.

allRecordsFromB.Any(b => ((b.Id == a.Id) && a.IsEnabled))

This gives you very different logic from your second example. I'd move a.IsEnabled to the start of your Where and remove some unnecessary parentheses to clarify things.

var newRecords = allRecordsFromA.Where(a => a.IsEnabled &&
                       !allRecordsFromB.Any(b => b.Id == a.Id)).ToList();

Note that if you have large data sets, or think the following approach would be clearer, you can speed things up by making a hash set of the b.Ids. (or via some sort of join, which internally would do the same sort of thing)

var bIds = new HashSet<int>(allRecordsFromB.Select(b => b.Id));
var newRecords = allRecordsFromA.Where(a => a.IsEnabled &&
                                            !bIds.Contains(a.Id)).ToList();
Tim S.
  • 55,448
  • 7
  • 96
  • 122
  • Very solid answer and very helpful - exactly what I was looking to be explained, thank you. Can you explain why a `HashSet` is better performing and what it does? Is it because I'm just working with a set of Ids and not full blow collections with properties that aren't a part of the equation and just bloating things? Sometimes I am working with large sets so I might actually do this. – atconway May 10 '14 at 19:11
  • After plugging in your expression ReSharper offered up to be what it thought was even a simplified expression (however I know you did yours to do a straight comparison to my OP) - this is just a FYI: `var newRecords = allRecordsFromA.Where(a => a.IsEnabled && allRecordsFromB.All(b => b.Id != a.Id)).ToList();` Any comment on if you would do it this way? – atconway May 10 '14 at 19:16
  • 1
    @atconway To explain a HashSet: Imagine if you print out a sorted list of IDs instead of one that's not ordered in any particular way. Instead of needing to search every item in the list, you can find whether a number is on the list very quickly by looking for the area that's near it. That's like doing a `Contains` in a `HashSet` vs one in a list. A more detailed/acccurate description can be [found here](http://stackoverflow.com/a/4558802/781792). – Tim S. May 10 '14 at 19:52
  • 1
    And for the proposed refactoring: IMO, the original is better in this case. Checking that the IDs are equal is a common idiom, and seeing if `Any` IDs are equal is easy to understand as a `Contains`-like operation, so to me it's easier to understand `!Any(Id == Id)` than `All(Id != Id)`. In most other cases, I'd agree with ReSharper: `All(..)` is better than `!Any(..)` – Tim S. May 10 '14 at 19:55
  • eh - it got confusing with `All()` when doing the inverse to find a 'matching` set. This `allRecordsFromB.Any(b => b.Id == a.Id))` worked and returned matches. This `allRecordsFromB.All(b => b.Id == a.Id))` did not work and returned '0'. I'm trying to understand: _They (All and Any) are pretty much the same but for the obvious inversion of returning true on predicate match versus returning false on predicate mismatch._ How would I use the `All` extension method to get back matching records? – atconway May 10 '14 at 21:04
  • Like ReSharper suggested, `allRecordsFromB.All(b => b.Id != a.Id)` should work. In general, `!Any(x => p)` should be equivalent to `All(x => !p)` (where `p` is `i == j`, `!p` could be written as `i != j`) – Tim S. May 10 '14 at 21:13
2

The check whether a is enabled is inside the Any, which is Negated. So if both the conditions b.Id == a.Id (for some b in allRecordsFromB) and a.IsEnabled are true, the element will not be part of the result. If one of the two is false, it will be part of the result.

So since that is the opposite of what you want, you should move the IsEnabled check outside of the Any.

sepp2k
  • 363,768
  • 54
  • 674
  • 675
1

I think part of the problem is that your Linq statement is hard to read and that kind of defeats the purpose. You may want to consider something like this:

var collectionA = new List<Record>();
var collectionB = new List<Record>();
var results = collectionA.Intersect(collectionB).Where(x => x.IsEnabled);

You have to implement the Equals method:

public class Record
{
    public int Id { get; set; }
    public bool IsEnabled { get; set; }

    protected bool Equals(Record other)
    {
        return Id == other.Id;
    }

    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        if (ReferenceEquals(this, obj)) return true;
        if (obj.GetType() != this.GetType()) return false;
        return Equals((Record) obj);
    }

    public override int GetHashCode()
    {
        return Id;
    }
}

Or create a Comparer and pass that to the Intersect method:

public class RecordComparer : IEqualityComparer<Record>
{
    public bool Equals(Record x, Record y)
    {
        return x.Id == y.Id;
    }

    public int GetHashCode(Record obj)
    {
        return obj.GetHashCode();
    }
}
Jason Nesbitt
  • 730
  • 1
  • 6
  • 15
  • I've see this implementation but rather using `Except` and passing in the `IEqualityComparer` implementation. What is the difference by using `Intersect` instead? – atconway May 10 '14 at 19:30
  • `Except` may be the right method for you now that I think about it. My point was just that methods for Set comparison exist and I think using them makes your code easy to understand. – Jason Nesbitt May 10 '14 at 19:35