0

Is this linq statement correct for getting unique values of 'Name' from my newSpecs collection?

distinctSpecialties is showing duplicates.

IEnumerable<SpecialtyData> distinctSpecialties = newSpecs
    .Select(s => s).Distinct()
    .OrderBy(s => s.Name);
Robert Harvey
  • 178,213
  • 47
  • 333
  • 501
Tsukasa
  • 6,342
  • 16
  • 64
  • 96

7 Answers7

4

You need to define "distinct" - by default reference types are distinct if they are separate instances (ReferenceEquals). You can override the default by either:

  • Implementing IEquatable<SpecialtyData> in SpecialtyData
  • Overriding Equals and GetHashCode in SpecialtyData
  • defining an IEqualityComparer<SpecialtyData> and passing that to Distinct

A simpler way to get the "first" distinct item by name is to use GroupBy:

IEnumerable<SpecialtyData> distinctSpecialties = newSpecs.GroupBy(s => s.Name)
                                                         .Select(g => g.First());
D Stanley
  • 149,601
  • 11
  • 178
  • 240
  • Excellent answer. Distinct’s default behavior is a common source of confusion and it’s important to know how to override it. – Jason Enochs Dec 10 '13 at 18:16
4

You're asking about getting unique values of 'Name' from my newSpecs collection, but your code returns IEnumerable<SpecialtyData>. Something is wrong: either the question or the return type.

If you need only names, you need IEnumerable<string> instead:

IEnumerable<string> distinctNames = newSpecs
    .Select(s => s.Name)
    .Distinct().OrderBy(x => x);

If you need entire SpecialtyData instances, you can either use GroupBy+First:

IEnumerable<SpecialtyData> distinctSpecialties = newSpecs
    .GroupBy(s => s.Name)
    .Select(g=> g.First())

or override Equals and GetHashCode methods within SpecialtyData class to make Distinct work.

Robert Harvey
  • 178,213
  • 47
  • 333
  • 501
MarcinJuraszek
  • 124,003
  • 15
  • 196
  • 263
2

That's because the default comparison is a reference comparison. It's not looking at the data itself, it's giving you distinct references. Either use your Select() to get some value worth comparing or implement IEqualityComparer<T> for your type and pass it to Distinct.

Below is a sample implementation of IEqualityComparer<T>;

class SpecialtyDataEqualityComparer : IEqualityComparer<SpecialtyData>
{
    public bool Equals(SpecialtyData lhs, SpecialtyData rhs)
    {
        return lhs.Name == rhs.Name;
    }

    public int GetHashCode(SpecialtyData p)
    {
        return p.Name.GetHashCode();
    }
}

If you did .Distinct(new SpecialtyDataEqualitComparer()) then you would get items with distinct names. If you want some other definition of equality then change the logic in Equals to do the comparisons you require (make related changes to GetHashCode so they are consistent, if you do equality by name you shouldn't take the hashcode for some other property).

evanmcdonnal
  • 46,131
  • 16
  • 104
  • 115
2

Instead of doing custom equality comparers and all that, there is a really nice extension method from MoreLinq - https://code.google.com/p/morelinq/ (by Jon Skeet).

Either download the whole library of MoreLinq, or just add this code below:

public static IEnumerable<TSource> DistinctBy<TSource, TKey>(
                            this IEnumerable<TSource> source, Func<TSource, TKey> selector)
{
    var set = new HashSet<TKey>();
    return source.Where(element => set.Add(selector(element)));
}

And use it like so:

IEnumerable<SpecialtyData> distinctSpecialties = newSpecs
    .Select(s => s).DistinctBy(s => s.Name)
    .OrderBy(s => s.Name);
EkoostikMartin
  • 6,831
  • 2
  • 33
  • 62
  • @RobertHarvey - I use this A LOT, get very solid performance and don't have to write equality comparers for the 599323 entities I need to run `Distinct` on. – EkoostikMartin Dec 10 '13 at 18:02
  • Tbh - i dont think you even need the select part. Just .DistinctBy. Just a thought tho, should the Func<> be Expression> to leverage server side expression tree?. – jim tollan Dec 10 '13 at 18:32
  • Oh and make source IQueryable of course :) (oh and +1 into the bargain!) – jim tollan Dec 10 '13 at 18:42
1

You can use GroupBy as below

IEnumerable<SpecialtyData> distinctSpecialties = 
               newSpecs.GroupBy(s => s.Name).Select(g=> g.First()).OrderBy(s => s.Name);
Damith
  • 62,401
  • 13
  • 102
  • 153
0

In order to use Distinct() on a custom object (SpecialtyData), you will need to override the two methods Distinct() uses to determine equality.

    public override bool Equals(object other) {

    }

    public override int GetHashCode() {

    } 

Each of these two should return values that define the SpecialtyData objects as unique. In your case it may be as simple as evaluating the 'Name' property.

Bo TX
  • 424
  • 2
  • 7
0

There's a very nice post from James Michael Hare that explains why you're getting this results, and I'll quote it here:

For Distinct() (and many other LINQ features) to work, the class being compared (SpecialtyData in your example) must implement Equals() and GetHashCode(), or alternatively provide a separate IEqualityComparer<T> as an argument to Distinct().

Many LINQ methods take advantage of GetHashCode() for performance because internally they will use things like a Set<T> to hold the unique items, which uses hashing for O(1) lookups. Also, GetHashCode() can quickly tell you if two objects may be equivalent and which ones are definitely not - as long as GetHashCode() is properly implemented of course.

So you should make all your classes you intend to compare in LINQ implement Equals() and GetHashCode() for completeness, or create a separate IEqualityComparer<T> implementation.

Original post: This code returns distinct values. However, what I want is to return a strongly typed collection as opposed to an anonymous type

Community
  • 1
  • 1
OnoSendai
  • 3,960
  • 2
  • 22
  • 46
  • 2
    Yeah, so why not mark this as a duplicate...? – BartoszKP Dec 10 '13 at 17:49
  • If you see a duplicate question you should flag/vote to close as a duplicate. Copying an entire answer is not appropriate. – Servy Dec 10 '13 at 17:50
  • @BartoszKP: Unfortunately, that question is not a duplicate. Although it has the correct answer for this question, the original question is a very specific problem that doesn't much resemble this one. – Robert Harvey Dec 10 '13 at 17:51
  • @RobertHarvey Indeed, perhaps it's rather the other question should be marked as a duplicate of this one then? – BartoszKP Dec 10 '13 at 17:54
  • @BartoszKP: That's not true either. – Robert Harvey Dec 10 '13 at 17:55
  • If there was a way to allow for direct inline mentions of specific (but not exactly duplicated) answers, I'd gladly do so, @Servy. Or, is there one and I'm not aware of? In any case, I appreciate any comments on best practices, so thank you all. – OnoSendai Dec 10 '13 at 17:59
  • @OnoSendai: You can post a link to the answer below the question. – Robert Harvey Dec 10 '13 at 17:59
  • 1
    To add to Robert's comment, you should generally prefix the link with something, i.e. "related" or "possible duplicate" or whatever, rather than just a bare link. If a question *isn't* a duplicate, then it's a sign that the answer of the other question, by itself, probably shouldn't be the answer to this question. Even if it has most of the information; if the questions aren't dups there's a *very* high probability that it means the answer should be tailored to the new question. – Servy Dec 10 '13 at 18:00
  • @RobertHarvey, will do so from now on, and with that prefix, Servy. Thanks once again. – OnoSendai Dec 10 '13 at 18:02