15

I have the following code:

var foo = (from data in pivotedData.AsEnumerable()
                   select new
                   {
                     Group = data.Field<string>("Group_Number"),
                     Study = data.Field<string>("Study_Name")
                   }).Distinct();

As expected this returns distinct values. However, what I want is to return a strongly-typed collection as opposed to an anonymous type, so when I do:

var foo = (from data in pivotedData.AsEnumerable()
                   select new BarObject
                   {
                     Group = data.Field<string>("Group_Number"),
                     Study = data.Field<string>("Study_Name")
                   }).Distinct();

This does not return the distinct values; it returns them all. Is there a way to do this with actual objects?

Peter Mortensen
Darren Young
  • 9
    Implement `Equals()` and `GetHashCode()` on your type. – dlev Sep 07 '11 at 15:16
  • @dlev what should `GetHashCode` do? – BrunoLM Sep 07 '11 at 15:20
  • @BrunoLM: Read for example this answer: http://stackoverflow.com/questions/6305324/why-use-gethashcode-over-equals/6305531#6305531 GetHashCode should deliver a hashcode over all fields that Equals also compares, and is used for hashtables or dictionaries for quick lookup of objects. – Philip Daubmeier Sep 07 '11 at 15:24
  • @Bruno Distinct will attempt to put each object into a hash table (and will return only those that do not already exist.) That means that hash code must be implemented properly to ensure that equal items have the same hash. Otherwise, `Equals()` (probably) won't be called, since the objects might hash to different buckets. – dlev Sep 07 '11 at 15:25

7 Answers

12

For Distinct() (and many other LINQ features) to work, the class being compared (BarObject in your example) must implement Equals() and GetHashCode(), or you must pass a separate IEqualityComparer<T> as an argument to Distinct().

Many LINQ methods take advantage of GetHashCode() for performance because internally they will use things like a Set<T> to hold the unique items, which uses hashing for O(1) lookups. Also, GetHashCode() can quickly tell you if two objects may be equivalent and which ones are definitely not - as long as GetHashCode() is properly implemented of course.

So you should make all your classes you intend to compare in LINQ implement Equals() and GetHashCode() for completeness, or create a separate IEqualityComparer<T> implementation.
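As a minimal sketch of the first option, the class below overrides both methods. It assumes BarObject has only the two string properties shown in the question; any further properties that should count toward equality would need to be added to both Equals() and GetHashCode(). The DistinctDemo helper is hypothetical and exists only to show the effect.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// BarObject is the type name from the question; its property set is assumed.
public sealed class BarObject : IEquatable<BarObject>
{
    public string Group { get; set; }
    public string Study { get; set; }

    // Two instances are equal when both compared properties match.
    public bool Equals(BarObject other)
    {
        return other != null
            && Group == other.Group
            && Study == other.Study;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as BarObject);
    }

    // Built from the same properties that Equals compares,
    // so equal objects always produce the same hash code.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + (Group == null ? 0 : Group.GetHashCode());
            hash = hash * 23 + (Study == null ? 0 : Study.GetHashCode());
            return hash;
        }
    }
}

// Hypothetical helper, for illustration only.
public static class DistinctDemo
{
    public static int CountDistinct()
    {
        var items = new List<BarObject>
        {
            new BarObject { Group = "1", Study = "A" },
            new BarObject { Group = "1", Study = "A" }, // duplicate by value
            new BarObject { Group = "2", Study = "A" }
        };

        // With the overrides above, the duplicate is removed.
        return items.Distinct().Count();
    }
}
```

One caveat with this design: the hash code depends on mutable properties, so changing Group or Study while an instance sits in a HashSet or Dictionary would make it hard to find again.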

James Michael Hare
  • Thanks, this is what I have done. I take it that GetHashCode is more important if I am storing the objects in, for example, a dictionary? – Darren Young Sep 07 '11 at 15:41
  • @Darren: GetHashCode() is a very quick way of seeing if two objects MAY be equivalent. This is because any two equivalent objects should always have the same hash code. There are many LINQ methods that take advantage of the hash code for processing and internally use sets or dictionaries. Thus, when using LINQ, both are important. – James Michael Hare Sep 07 '11 at 15:44
  • 1
    @Darren: Just decompiled Distinct() and it indeed does use a Set internally which utilizes hashing. – James Michael Hare Sep 07 '11 at 15:46
  • 1
    @Darren: Look at the blog post, Gage has mentioned: http://blog.jordanterrell.com/post/LINQ-Distinct()-does-not-work-as-expected.aspx. ``Distinct`` *requires* ``GetHashCode`` to function correctly. – Philip Daubmeier Sep 07 '11 at 15:46
4

You need to override Equals and GetHashCode for BarObject because EqualityComparer<BarObject>.Default uses reference equality unless you have provided overrides of Equals and GetHashCode (this default comparer is what Enumerable.Distinct<BarObject>(this IEnumerable<BarObject> source) uses). Alternatively, you can pass an IEqualityComparer<BarObject> to Enumerable.Distinct<BarObject>(this IEnumerable<BarObject>, IEqualityComparer<BarObject>).
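The difference between the two queries in the question can be sketched as follows. PlainBar and DefaultComparerDemo are hypothetical names used only for this illustration; PlainBar stands in for a class that has no Equals or GetHashCode overrides.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class without Equals/GetHashCode overrides.
public class PlainBar
{
    public string Group { get; set; }
    public string Study { get; set; }
}

// Hypothetical helper, for illustration only.
public static class DefaultComparerDemo
{
    public static bool ClassInstancesEqual()
    {
        var a = new PlainBar { Group = "1", Study = "A" };
        var b = new PlainBar { Group = "1", Study = "A" };

        // The default comparer falls back to Object.Equals,
        // which is reference equality for a plain class.
        return EqualityComparer<PlainBar>.Default.Equals(a, b);
    }

    public static bool AnonymousInstancesEqual()
    {
        var a = new { Group = "1", Study = "A" };
        var b = new { Group = "1", Study = "A" };

        // Anonymous types get compiler-generated Equals and GetHashCode
        // that compare property values.
        return a.Equals(b);
    }
}
```

Here ClassInstancesEqual() returns false and AnonymousInstancesEqual() returns true, which is why Distinct() removes duplicates for the anonymous type but not for the class.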

jason
4

Either do as dlev suggested or use:

var foo = (from data in pivotedData.AsEnumerable()
               select new BarObject
               {
                 Group = data.Field<string>("Group_Number"),
                 Study = data.Field<string>("Study_Name")
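               })
               // Group on both properties; grouping on Group alone would drop
               // rows that share a Group but differ in Study.
               .GroupBy(x => new { x.Group, x.Study })
               .Select(g => g.First());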

Check this out for more info http://blog.jordanterrell.com/post/LINQ-Distinct()-does-not-work-as-expected.aspx

Gage
  • That's (in my opinion) not a very nice solution, as Distinct is much faster and is designed to do exactly what the OP wants. The blog post, however, was interesting. Looks like I was right in assuming ``Distinct`` uses a ``HashSet<>`` internally. – Philip Daubmeier Sep 07 '11 at 15:43
3

Looks like Distinct cannot compare your BarObject objects by value. Therefore it compares their references, which of course are all different from each other, even if the objects have the same contents.

So either you override the Equals method, or you supply a custom IEqualityComparer<BarObject> to Distinct. Remember to override GetHashCode when you override Equals; otherwise you will get strange results if you use your objects as keys in a dictionary or hash table, or put them in a HashSet<BarObject>. It might be (I don't know exactly) that Distinct internally uses a hash set.

Here is a collection of good practices for GetHashCode.
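One common practice is to build the hash from exactly the fields that Equals compares. The sketch below does this with System.HashCode.Combine, which assumes .NET Core 2.1 or later (it did not exist when this answer was written); on older frameworks a manual multiply-and-add combination serves the same purpose. The property set of BarObject is assumed from the question, and HashDemo is a hypothetical helper.

```csharp
using System;

// BarObject is the type name from the question; its property set is assumed.
public sealed class BarObject
{
    public string Group { get; set; }
    public string Study { get; set; }

    public override bool Equals(object obj)
    {
        var other = obj as BarObject;
        return other != null
            && Group == other.Group
            && Study == other.Study;
    }

    // Combines the same two properties that Equals compares.
    // HashCode.Combine accepts null values.
    public override int GetHashCode()
    {
        return HashCode.Combine(Group, Study);
    }
}

// Hypothetical helper, for illustration only.
public static class HashDemo
{
    public static bool EqualObjectsShareHash()
    {
        var a = new BarObject { Group = "1", Study = "A" };
        var b = new BarObject { Group = "1", Study = "A" };
        return a.Equals(b) && a.GetHashCode() == b.GetHashCode();
    }
}
```

The hash values themselves may differ between program runs, so they should not be persisted; only the guarantee that equal objects share a hash within one process matters here.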

Community
Philip Daubmeier
2

You want to use the other overload for Distinct() that takes a comparer. You can then implement your own IEqualityComparer<BarObject>.
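A sketch of that approach, leaving BarObject itself untouched. BarObjectComparer and ComparerDemo are hypothetical names, and the comparer assumes that only Group and Study should decide equality.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// BarObject is the type name from the question; its property set is assumed.
public class BarObject
{
    public string Group { get; set; }
    public string Study { get; set; }
}

// Hypothetical comparer that treats two BarObjects as equal
// when both Group and Study match.
public sealed class BarObjectComparer : IEqualityComparer<BarObject>
{
    public bool Equals(BarObject x, BarObject y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Group == y.Group && x.Study == y.Study;
    }

    // Uses the same properties as Equals so equal objects hash alike.
    public int GetHashCode(BarObject obj)
    {
        if (obj == null) return 0;
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + (obj.Group == null ? 0 : obj.Group.GetHashCode());
            hash = hash * 23 + (obj.Study == null ? 0 : obj.Study.GetHashCode());
            return hash;
        }
    }
}

// Hypothetical helper, for illustration only.
public static class ComparerDemo
{
    public static int CountDistinct()
    {
        var items = new List<BarObject>
        {
            new BarObject { Group = "1", Study = "A" },
            new BarObject { Group = "1", Study = "A" }, // duplicate by value
            new BarObject { Group = "1", Study = "B" }
        };

        // The comparer is passed to the second Distinct overload.
        return items.Distinct(new BarObjectComparer()).Count();
    }
}
```

In the question's query this would amount to ending with `.Distinct(new BarObjectComparer())`. A separate comparer is useful when the class cannot be modified, or when different queries need different notions of equality.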

i_am_jorf
1

Try this:

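// DataRowComparer.Default compares rows by their column values;
// a plain Distinct() would compare DataRow references and remove nothing.
var foo = (from data in pivotedData.AsEnumerable().Distinct(DataRowComparer.Default)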
                   select new BarObject
                   {
                     Group = data.Field<string>("Group_Number"),
                     Study = data.Field<string>("Study_Name")
                   });
Peter Mortensen
bevacqua
  • That **may** be possible, but it does not have the same semantics. If there are other fields besides 'Group' and 'Study', some duplicates might be left over. – Philip Daubmeier Sep 07 '11 at 15:44
  • Yes, there are other fields. I omitted them for brevity. I did try this approach initially. Thanks anyway. – Darren Young Sep 07 '11 at 15:46
-1

Should be as simple as:

var foo = (from data in pivotedData.AsEnumerable()
               select new
               {
                 Group = data.Field<string>("Group_Number"),
                 Study = data.Field<string>("Study_Name")
               }).Distinct().Select(x => new BarObject {
                 Group = x.Group,
                 Study = x.Study
               });
Thebigcheeze