
I want to get the distinct values in a list, but not by the standard equality comparison.

What I want to do is something like this:

    return myList.Distinct( (x, y) => x.Url == y.Url );

I can't: there's no extension method in LINQ that does this, just one that takes an IEqualityComparer.

I can hack around it with this:

    return myList.GroupBy( x => x.Url ).Select( g => g.First() );

But that seems messy. It also doesn't quite do the same thing - I can only use it here because I have a single key.

I could also add my own:

    public static IEnumerable<T> Distinct<T>(
        this IEnumerable<T> input, Func<T,T,bool> compare )
    {
        //write my own here
    }

But that does seem rather like writing something that should be there in the first place.
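
For illustration, here is a minimal sketch of what that could look like. It assumes a hypothetical `DelegateEqualityComparer<T>` adapter and a hypothetical method name `DistinctWith`, neither of which is part of LINQ. A comparison delegate on its own gives no way to compute hash codes, so this sketch returns a constant hash and falls back to pairwise comparison, which is quadratic in the worst case:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical adapter: wraps a two-argument comparison delegate so it can be
// passed to the built-in Distinct overload that takes an IEqualityComparer<T>.
public sealed class DelegateEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _compare;

    public DelegateEqualityComparer(Func<T, T, bool> compare)
    {
        _compare = compare ?? throw new ArgumentNullException(nameof(compare));
    }

    public bool Equals(T x, T y) => _compare(x, y);

    // The delegate supplies no hashing rule, so every element gets the same
    // hash code. Results stay correct, but each element is compared pairwise.
    public int GetHashCode(T obj) => 0;
}

public static class DelegateDistinctExtensions
{
    // Hypothetical name, chosen so it does not collide with Enumerable.Distinct.
    public static IEnumerable<T> DistinctWith<T>(
        this IEnumerable<T> input, Func<T, T, bool> compare)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        return input.Distinct(new DelegateEqualityComparer<T>(compare));
    }
}
```

With something like this in place, `myList.DistinctWith((x, y) => x.Url == y.Url)` would compile, at the cost of the slower comparison.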

Anyone know why this method isn't there?

Am I missing something?

Keith

4 Answers


It's annoying, certainly. It's also part of my "MoreLINQ" project which I must pay some attention to at some point :) There are plenty of other operations which make sense when acting on a projection, but returning the original - MaxBy and MinBy spring to mind.

As you say, it's easy to write - although I prefer the name "DistinctBy" to match OrderBy etc. Here's my implementation if you're interested:

    public static IEnumerable<TSource> DistinctBy<TSource, TKey>
        (this IEnumerable<TSource> source,
         Func<TSource, TKey> keySelector)
    {
        return source.DistinctBy(keySelector,
                                 EqualityComparer<TKey>.Default);
    }

    public static IEnumerable<TSource> DistinctBy<TSource, TKey>
        (this IEnumerable<TSource> source,
         Func<TSource, TKey> keySelector,
         IEqualityComparer<TKey> comparer)
    {
        if (source == null)
        {
            throw new ArgumentNullException("source");
        }
        if (keySelector == null)
        {
            throw new ArgumentNullException("keySelector");
        }
        if (comparer == null)
        {
            throw new ArgumentNullException("comparer");
        }
        return DistinctByImpl(source, keySelector, comparer);
    }

    private static IEnumerable<TSource> DistinctByImpl<TSource, TKey>
        (IEnumerable<TSource> source,
         Func<TSource, TKey> keySelector,
         IEqualityComparer<TKey> comparer)
    {
        HashSet<TKey> knownKeys = new HashSet<TKey>(comparer);
        foreach (TSource element in source)
        {
            if (knownKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
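
A condensed, self-contained sketch of the same idea with a usage example may help here. The method name `DistinctByKey`, the optional `comparer` parameter, and the sample URLs are assumptions made for illustration (the different name avoids any clash with a framework-provided `DistinctBy`). It shows how passing a comparer changes which keys count as equal:

```csharp
using System;
using System.Collections.Generic;

public static class DistinctByKeySketch
{
    // Hypothetical condensed variant of the approach above.
    public static IEnumerable<TSource> DistinctByKey<TSource, TKey>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer = null)
    {
        // Validate eagerly, then hand off to the lazy iterator.
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        return Iterator(source, keySelector, comparer ?? EqualityComparer<TKey>.Default);
    }

    private static IEnumerable<TSource> Iterator<TSource, TKey>(
        IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        IEqualityComparer<TKey> comparer)
    {
        var knownKeys = new HashSet<TKey>(comparer);
        foreach (var element in source)
        {
            // HashSet.Add returns false when the key has been seen before.
            if (knownKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
}

public static class DistinctByKeyDemo
{
    // Sample data is made up; the two "a.example" URLs differ only in case.
    public static List<string> Run()
    {
        var urls = new[] { "http://a.example/", "HTTP://A.EXAMPLE/", "http://b.example/" };
        return new List<string>(urls.DistinctByKey(u => u, StringComparer.OrdinalIgnoreCase));
    }
}
```

Splitting validation from the iterator, as the code above does, means argument errors surface when the method is called rather than when the result is first enumerated.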
Jon Skeet
  • Thanks for the swift answer - I might use that! Any idea why they skipped all these ...By(Predicate) methods? – Keith Feb 06 '09 at 12:10
  • Not really, I'm afraid. I'll blog about the MoreLINQ project when I've got a significant set of features... basically it'll be an open source project with extensions to LINQ to Objects, and probably Push LINQ too. – Jon Skeet Feb 06 '09 at 12:16
  • If I had to guess, I'd guess for parity with the IQueryable options, and what is realistic (without getting sick) in TSQL. So DISTINCT(table.column) is fine, but you'd need a handy key and some more complex TSQL for DistinctBy... – Marc Gravell Feb 06 '09 at 12:26
  • That's a good point Marc - if you'd posted it as an answer I'd have voted it up. – Keith Feb 06 '09 at 15:11

> But that seems messy.

It's not messy, it's correct.

  • If you want Distinct Programmers by FirstName and there are four Amys, which one do you want?
  • If you Group programmers By FirstName and take the First one, then it is clear what you want to do in the case of four Amys.

> I can only use it here because I have a single key.

You can do a multiple key "distinct" with the same pattern:

    return myList
      .GroupBy( x => new { x.Url, x.Age } )
      .Select( g => g.First() );
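
As a sketch of why the composite key works: anonymous types compare by the values of their properties, so two elements with the same `Url` and `Age` land in the same group, and `First()` keeps the one that appeared earliest in the source. The `Page` type and its `Title` property below are hypothetical, added only to make the example self-contained:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type for illustration.
public sealed class Page
{
    public string Url { get; set; }
    public int Age { get; set; }
    public string Title { get; set; }
}

public static class GroupByDistinctDemo
{
    // Keeps the first element seen for each (Url, Age) combination.
    public static List<Page> DistinctByUrlAndAge(IEnumerable<Page> pages)
    {
        return pages
            .GroupBy(p => new { p.Url, p.Age })  // anonymous types use value equality
            .Select(g => g.First())              // groups keep source order
            .ToList();
    }
}
```

Given three pages where the first and third share both `Url` and `Age`, this returns the first and second pages and drops the third.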
Amy B

Jon, your solution is pretty good. One minor change, though: I don't think we need EqualityComparer<TKey>.Default in there. Here is my solution (of course, the starting point was Jon Skeet's solution):

    public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        //TODO All arg checks
        HashSet<TKey> keys = new HashSet<TKey>();
        foreach (T item in source)
        {
            TKey key = keySelector(item);
            if (!keys.Contains(key))
            {
                keys.Add(key);
                yield return item;
            }
        }
    }
SVC
  • I'm not sure why this would be better than Jon's solution. `new HashSet<TKey>()` will use `EqualityComparer<TKey>.Default` anyway, and by doing it your way you lose the ability to override it (for instance if `TKey` is `string` and you want case insensitivity). Also, Jon uses the `HashSet.Add` method, while you use `HashSet.Contains` and then `HashSet.Add`: two operations. Admittedly you'd need a massive set to notice the difference, but why make it slower? – Keith Jun 21 '12 at 08:36

Building on Amy B's answer, I've written a small DistinctBy extension method that allows a predicate to be passed:

    /// <summary>
    /// Distinct method that accepts a predicate.
    /// </summary>
    /// <typeparam name="TSource">The type of the source elements.</typeparam>
    /// <typeparam name="TKey">The type of the key returned by the predicate.</typeparam>
    /// <param name="source">The source.</param>
    /// <param name="predicate">The predicate.</param>
    /// <returns>IEnumerable&lt;TSource&gt;.</returns>
    /// <exception cref="System.ArgumentNullException">source or predicate is null</exception>
    public static IEnumerable<TSource> DistinctBy<TSource, TKey>
        (this IEnumerable<TSource> source,
         Func<TSource, TKey> predicate)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        if (predicate == null)
            throw new ArgumentNullException("predicate");

        // Group by the key, then keep the first element of each group.
        return source
            .GroupBy(predicate)
            .Select(x => x.First());
    }

You can now pass a predicate to group the list by:

    var distinct = myList.DistinctBy(x => x.Id);

Or group by multiple properties:

    var distinct = myList.DistinctBy(x => new { x.Id, x.Title });
Cerbrus