3

So in things like GroupBy() in Linq, you can provide an implementation of IEqualityComparer<T> to help with object comparisons. It seems, though, that it would be easier to simply pass in a lambda expression.

Example:

// current implementation
myCollection.GroupBy(c => c.Foo, c => c.Bar, new FooBarComparer());

// it seems easier to...
myCollection.GroupBy(c => c.Foo, c => c.Bar, (x, y) => x.Baz == y.Baz);

Given a simple implementation of IEqualityComparer<T> like this:

public class FooBarComparer : IEqualityComparer<FooBar> {
    public bool Equals(FooBar x, FooBar y) {
        return x.Baz == y.Baz;
    }

    public int GetHashCode(FooBar obj) {
        return obj.GetHashCode();
    }
}

It seems that providing a lambda expression could be just as effective. As it stands now, if I try to pass in an IEqualityComparer<T> with a Linq query to a database, it fails because SQL Server (or whatever) doesn't know anything about my class. It seems that a lambda would be able to be translated into SQL that can be used in the target database.

Is there a specific reason this is not provided as an option in Linq?

Stephen Collins
  • 3,523
  • 8
  • 40
  • 61
  • 5
    A lambda expression doesn't support GetHashCode, which is used internally for performance. – Michael Liu May 04 '15 at 19:40
  • You'd have to have a second lambda for generating a hash code, but since determining equality and hash codes should be related it makes more sense for them to be encapsulated in the same class. – juharr May 04 '15 at 19:42
  • `GetHashCode()` is used in preference to `Equals()`? I was thinking that an overload for a lambda would use that lambda for the comparison as if it were `Equals()`. – Stephen Collins May 04 '15 at 19:43
  • 1
    And if you want, you can directly group by `c.Foo.Baz`, – xanatos May 04 '15 at 19:43
  • @StephenCollins How do you expect to properly compare reference types without GetHashCode()? – David L May 04 '15 at 19:44
  • @DavidL It would be done by the SQL server for example, how it becomes the SQL server problem, that surely wouldn't use a C# GetHashCode... – xanatos May 04 '15 at 19:46
  • This question opens up the "what use are the `Queryable` method overloads that support a `IEqualityComparer<>`", like `Distinct`, `Contains`, `GroupBy`... – xanatos May 04 '15 at 19:47
  • @xanatos That was really my thought. I've always looked at the `IEqualityComparer<>` overloads as a good indicator NOT to use them with Linq 2 SQL statements since to me it implies that you ARE comparing equality in C#, not in SQL. – David L May 04 '15 at 19:48
  • @DavidL Probably these overloads are present so that you can use "stock" comparers... For example `StringComparer.OrdinalIgnoreCase`. Then a LINQ provider could recognize a certain (fixed, small) number of them (the `StringComparer` family, for example, plus clearly the `EqualityComparer.Default` that uses the default comparer) – xanatos May 04 '15 at 19:55
  • 4
    I'm voting to close this question as off-topic because it requires reading the minds of the developers of the feature. – Servy May 04 '15 at 20:05
  • Your `FooBarComparer` doesn't have a cohesive sense of identity between it's equality and it's hash code (the hash is based on the item, the equality based on one property of it), and as a result any operations using it are going to have an undefined result. – Servy May 04 '15 at 20:09

2 Answers2

0

You'd need two lambdas so that GetHashCode has an equivalent as well. Besides that this would work, yes. There are some LINQ methods that do not make use of hash codes but that use equality (Enumerable.Contains).

I guess it's just to have a standard API for equality that the whole BCL uses. You can easily convert between delegate and comparer by using a delegate-backed comparer implementation or by converting myComparer.Equals to a delegate.

For remoting expressions to the database is is not easy to remote a comparer expression. GROUP BY does not support that in SQL. It can surely be made to work but it is a niche use case (actually if the comparer expression for a GroupBy does not provide an equality relation I'm not sure how that would turn out when translated to SQL).

usr
  • 168,620
  • 35
  • 240
  • 369
0

To make an efficient GroupBy/Distinct you need two things:

  • An equality comparer
  • An hash generator, to create an hash dictionary

OR you can follow the C++ route

  • A comparator able to order the elements, so that you are able to create a tree

If you only have an equality comparer, then the difficulty of doing the GroupBy is something like O(n^2), because if you have 5 elements, you need 5 + 4 + 3 + 2 + 1 comparisons, so n * (n + 1) / 2 so 15. This is something a "good" library wouldn't ever permit you to do (and no sane SQL server would ever do!)

Now, clearly the LINQ library could analyze your equality lambda, see that it is

(x, y) => x.Baz == y.Baz

see that it is symmetrical, so that the left term and the right term are in the form

x => x.Baz

and use this to generate an hasher and select a comparer. But at this point, wouldn't it be easier to do directly

myCollection.GroupBy(c => c.Foo.Baz) 

Yes, that you can do :-)

And then,

myCollection.GroupBy(c => c.Foo.Baz, c => new { c.Foo, c.Bar })
            .Select(c => new { Key = c.First().Foo, Values = c.Select(x => x.Bar) })

That is quite similar to your intended GroupBy (the only difference is that the values are in a Values IEnumerable<>)

Now... for the use of the overloads with IEqualityComparer<T>... as I've written in the comments, I do think that they should be used with "stock" comparers, that the LINQ provider can recognize, like the various StringComparer.* (e.g. StringComparer.OrdinalIgnoreCase) and the EqualityComparer<T>.Default, that represents the "default" comparer.

xanatos
  • 109,618
  • 12
  • 197
  • 280
  • Both C# and C++ support equality comparers and ordering comparers. They're not specific to one language or another. In C# this would be `IComparer` and `IEqualityComparer`. I'm not sure what they each map to in C++, but I'm confident both exist in some form. – Servy May 04 '15 at 20:07
  • @Servy If you look at `map`/`multimap` of C++ you'll see that it is based on the `operator<` (or another ordering operator) (see for example http://stackoverflow.com/questions/6573225/what-requirements-must-stdmap-key-classes-meet-to-be-valid-keys). This because C++ doesn't have a "primitive" concept of hash code like C# – xanatos May 04 '15 at 20:09
  • The fact that you've listed two non-hash based data structures doesn't mean that C++ doesn't have any hash based data structures. That's like saying that because `SortedDictionary` isn't hash based C# doesn't use hash based data structures. Both languages support both (even if each language makes a different one "easier"). – Servy May 04 '15 at 20:11
  • 1
    @Servy For many years C++ didn't have the `unordered_map`. It has been introduced only in C++11. So when I say "OR you can follow the C++ route" I am right: for many many years the C++ route was "we don't have hash functions, we use comparators". In recent years it has changed to "we have both" – xanatos May 04 '15 at 20:13