58

Short question:

Is there a simple way in LINQ to Objects to get a distinct list of objects from a list, based on a key property of the objects?

Long question:

I am trying to do a Distinct() operation on a list of objects that have a key as one of their properties.

class GalleryImage {
   public int Key { get;set; }
   public string Caption { get;set; }
   public string Filename { get; set; }
   public string[] Tags { get; set; }
}

I have a list of Gallery objects that contain GalleryImage[].

Because of the way the webservice works [sic] I have duplicates of the GalleryImage object. I thought it would be a simple matter to use Distinct() to get a distinct list.

This is the LINQ query I want to use:

var allImages = Galleries.SelectMany(x => x.Images);
var distinctImages = allImages.Distinct<GalleryImage>(new 
                     EqualityComparer<GalleryImage>((a, b) => a.id == b.id));

The problem is that EqualityComparer is an abstract class.

I don't want to:

  • implement IEquatable on GalleryImage because it is generated
  • have to write a separate class to implement IEqualityComparer as shown here

Is there a concrete implementation of EqualityComparer somewhere that I'm missing?

I would have thought there would be an easy way to get 'distinct' objects from a set based on a key.

Simon_Weaver

9 Answers

45

(There are two solutions here - see the end for the second one):

My MiscUtil library has a ProjectionEqualityComparer class (and two supporting classes to make use of type inference).

Here's an example of using it:

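// ProjectionEqualityComparer implements IEqualityComparer<T>, so declare the variable as the interface.
IEqualityComparer<GalleryImage> comparer = 
    ProjectionEqualityComparer<GalleryImage>.Create(x => x.id);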

Here's the code (comments removed)

// Helper class for construction
public static class ProjectionEqualityComparer
{
    public static ProjectionEqualityComparer<TSource, TKey>
        Create<TSource, TKey>(Func<TSource, TKey> projection)
    {
        return new ProjectionEqualityComparer<TSource, TKey>(projection);
    }

    public static ProjectionEqualityComparer<TSource, TKey>
        Create<TSource, TKey> (TSource ignored,
                               Func<TSource, TKey> projection)
    {
        return new ProjectionEqualityComparer<TSource, TKey>(projection);
    }
}

public static class ProjectionEqualityComparer<TSource>
{
    public static ProjectionEqualityComparer<TSource, TKey>
        Create<TKey>(Func<TSource, TKey> projection)
    {
        return new ProjectionEqualityComparer<TSource, TKey>(projection);
    }
}

public class ProjectionEqualityComparer<TSource, TKey>
    : IEqualityComparer<TSource>
{
    readonly Func<TSource, TKey> projection;
    readonly IEqualityComparer<TKey> comparer;

    public ProjectionEqualityComparer(Func<TSource, TKey> projection)
        : this(projection, null)
    {
    }

    public ProjectionEqualityComparer(
        Func<TSource, TKey> projection,
        IEqualityComparer<TKey> comparer)
    {
        projection.ThrowIfNull("projection");
        this.comparer = comparer ?? EqualityComparer<TKey>.Default;
        this.projection = projection;
    }

    public bool Equals(TSource x, TSource y)
    {
        if (x == null && y == null)
        {
            return true;
        }
        if (x == null || y == null)
        {
            return false;
        }
        return comparer.Equals(projection(x), projection(y));
    }

    public int GetHashCode(TSource obj)
    {
        if (obj == null)
        {
            throw new ArgumentNullException("obj");
        }
        return comparer.GetHashCode(projection(obj));
    }
}

Second solution

To do this just for Distinct, you can use the DistinctBy extension in MoreLINQ:

    public static IEnumerable<TSource> DistinctBy<TSource, TKey>
        (this IEnumerable<TSource> source,
         Func<TSource, TKey> keySelector)
    {
        return source.DistinctBy(keySelector, null);
    }

    public static IEnumerable<TSource> DistinctBy<TSource, TKey>
        (this IEnumerable<TSource> source,
         Func<TSource, TKey> keySelector,
         IEqualityComparer<TKey> comparer)
    {
        source.ThrowIfNull("source");
        keySelector.ThrowIfNull("keySelector");
        return DistinctByImpl(source, keySelector, comparer);
    }

    private static IEnumerable<TSource> DistinctByImpl<TSource, TKey>
        (IEnumerable<TSource> source,
         Func<TSource, TKey> keySelector,
         IEqualityComparer<TKey> comparer)
    {
        HashSet<TKey> knownKeys = new HashSet<TKey>(comparer);
        foreach (TSource element in source)
        {
            if (knownKeys.Add(keySelector(element)))
            {
                yield return element;
            }
        }
    }
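
For reference, .NET 6 and later include a built-in Enumerable.DistinctBy with the same shape as the MoreLINQ method above, so no extra library is needed there. A minimal usage sketch, assuming a simplified GalleryImage with only Key and Caption; the DistinctByDemo class and the sample data are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the generated GalleryImage class (assumption).
public class GalleryImage
{
    public int Key { get; set; }
    public string Caption { get; set; } = "";
}

public static class DistinctByDemo
{
    // Returns one image per distinct Key, using the built-in DistinctBy (.NET 6+).
    public static List<GalleryImage> DistinctImages(IEnumerable<GalleryImage> images) =>
        images.DistinctBy(image => image.Key).ToList();

    public static void Main()
    {
        var allImages = new[]
        {
            new GalleryImage { Key = 1, Caption = "first" },
            new GalleryImage { Key = 2, Caption = "second" },
            new GalleryImage { Key = 1, Caption = "duplicate of first" },
        };

        // Two images remain, one for each distinct key.
        foreach (var image in DistinctImages(allImages))
        {
            Console.WriteLine($"{image.Key}: {image.Caption}");
        }
    }
}
```

On older frameworks the MoreLINQ version is called the same way: source.DistinctBy(x => x.Key).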

In both cases, ThrowIfNull looks like this:

public static void ThrowIfNull<T>(this T data, string name) where T : class
{
    if (data == null)
    {
        throw new ArgumentNullException(name);
    }
}
Jon Skeet
  • What is the point of the second static Create method with the `TSource ignored` parameter? `public static ProjectionEqualityComparer<TSource, TKey> Create<TSource, TKey>(TSource ignored, Func<TSource, TKey> projection) { return new ProjectionEqualityComparer<TSource, TKey>(projection); }` – flesh Apr 02 '11 at 09:15
  • @flesh: It allows type inference to kick in when you may not be able to specify the type explicitly - e.g. for anonymous types. – Jon Skeet Apr 02 '11 at 10:02
  • ThrowIfNull is missing from the source code in the answer and will not compile. This seems to work: `public static T ThrowIfNull<T>(this T value, string variableName) where T : class { if (value == null) { throw new NullReferenceException(string.Format("Value is Null: {0}", variableName)); } return value; }` – Daryl Jul 21 '14 at 21:02
  • @Daryl: Yes, that's almost it - except that it should throw `ArgumentNullException`. Will add the version that's in MiscUtil... – Jon Skeet Jul 21 '14 at 21:10
7

Building on Charlie Flowers' answer, you can create your own extension method to do what you want which internally uses grouping:

    public static IEnumerable<T> Distinct<T, U>(
        this IEnumerable<T> seq, Func<T, U> getKey)
    {
        return
            from item in seq
            group item by getKey(item) into gp
            select gp.First();
    }

You could also create a generic class implementing IEqualityComparer<T>, but it sounds like you'd like to avoid this:

    public class KeyEqualityComparer<T,U> : IEqualityComparer<T>
    {
        private Func<T,U> GetKey { get; set; }

        public KeyEqualityComparer(Func<T,U> getKey) {
            GetKey = getKey;
        }

        public bool Equals(T x, T y)
        {
            return GetKey(x).Equals(GetKey(y));
        }

        public int GetHashCode(T obj)
        {
            return GetKey(obj).GetHashCode();
        }
    }
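
One caveat with calling Equals and GetHashCode directly on the key: both throw a NullReferenceException when the key itself is null (for example, a null caption). A hedged variant, here called NullSafeKeyEqualityComparer (a hypothetical name, not from any library), delegates to EqualityComparer<TKey>.Default instead, which handles null keys. The Photo class and sample data are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical variant of KeyEqualityComparer that tolerates null keys.
// It still assumes the items themselves are non-null.
public class NullSafeKeyEqualityComparer<T, TKey> : IEqualityComparer<T>
{
    private readonly Func<T, TKey> getKey;
    private readonly IEqualityComparer<TKey> keyComparer = EqualityComparer<TKey>.Default;

    public NullSafeKeyEqualityComparer(Func<T, TKey> getKey)
    {
        this.getKey = getKey ?? throw new ArgumentNullException(nameof(getKey));
    }

    // The default comparer treats two null keys as equal.
    public bool Equals(T x, T y) => keyComparer.Equals(getKey(x), getKey(y));

    public int GetHashCode(T obj)
    {
        TKey key = getKey(obj);
        // All null keys share the hash 0, which is consistent with Equals.
        return key == null ? 0 : keyComparer.GetHashCode(key);
    }
}

// Made-up sample type with a nullable key.
public class Photo
{
    public string Caption { get; set; }
}

public static class NullSafeDemo
{
    public static int CountDistinctByCaption(IEnumerable<Photo> photos) =>
        photos.Distinct(new NullSafeKeyEqualityComparer<Photo, string>(p => p.Caption)).Count();

    public static void Main()
    {
        var photos = new[]
        {
            new Photo { Caption = "beach" },
            new Photo { Caption = null },
            new Photo { Caption = "beach" },
            new Photo { Caption = null },
        };

        // Two distinct captions: "beach" and null.
        Console.WriteLine(CountDistinctByCaption(photos)); // prints 2
    }
}
```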
kvb
6

This is the best I can come up with for the problem at hand. I'm still curious whether there's a nice way to create an EqualityComparer on the fly, though.

Galleries.SelectMany(x => x.Images).ToLookup(x => x.id).Select(x => x.First());

This creates a lookup table and takes the first item from each group.

Note: this is the same as what @charlie suggested, but using ILookup, which I think is what a group must be anyway.
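
To illustrate the relationship between the two: GroupBy and ToLookup both bucket items by key, the main difference being that ToLookup runs immediately while GroupBy is deferred until enumerated. A small sketch comparing them, where FirstPerKeyDemo and the tuple data are made up and stand in for the gallery images:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstPerKeyDemo
{
    // Made-up sample data: (Key, Caption) pairs standing in for GalleryImage.
    public static readonly (int Key, string Caption)[] Images =
    {
        (1, "first"), (2, "second"), (1, "duplicate"),
    };

    // Builds the lookup eagerly, then takes the first item of each group.
    public static List<string> ViaLookup() =>
        Images.ToLookup(x => x.Key).Select(g => g.First().Caption).ToList();

    // Same result, but the grouping is deferred until ToList enumerates it.
    public static List<string> ViaGroupBy() =>
        Images.GroupBy(x => x.Key).Select(g => g.First().Caption).ToList();

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", ViaLookup()));  // first, second
        Console.WriteLine(string.Join(", ", ViaGroupBy())); // first, second
    }
}
```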

Simon_Weaver
  • I agree that it feels like the framework is lacking something. I don't know if it is IEqualityComparer though ... it really needs both methods. It feels like there should be an easier way of using Distinct: an overload that takes a predicate. – Charlie Flowers Apr 04 '09 at 05:47
  • Not a predicate. I mean an overload of Distinct that would take T and let you select the object that you want to use for distinctiveness. – Charlie Flowers Apr 04 '09 at 05:54
  • @charlie - right, that's what I actually thought I WAS going to get with the existing Distinct(..). I'd just never used it in this context before, and of course it turned out not to be what I expected – Simon_Weaver Apr 04 '09 at 11:01
  • No extra libraries or classes - this is a clean solution. – BrianLegg Jan 26 '22 at 15:12
  • @BrianLegg thanks for making me feel old ;-) – Simon_Weaver Jan 26 '22 at 19:13
4

This idea is being debated here. While I hope the .NET Core team adopts a method for generating an IEqualityComparer<T> from a lambda, I'd suggest you vote and comment on that idea, and use the following in the meantime:

Usage:

IEqualityComparer<Contact> comp1 = EqualityComparerImpl<Contact>.Create(c => c.Name);
var comp2 = EqualityComparerImpl<Contact>.Create(c => c.Name, c => c.Age);

class Contact { public string Name { get; set; } public int Age { get; set; } }

Code:

public class EqualityComparerImpl<T> : IEqualityComparer<T>
{
  public static EqualityComparerImpl<T> Create(
    params Expression<Func<T, object>>[] properties) =>
    new EqualityComparerImpl<T>(properties);

  PropertyInfo[] _properties;
  EqualityComparerImpl(Expression<Func<T, object>>[] properties)
  {
    if (properties == null)
      throw new ArgumentNullException(nameof(properties));

    if (properties.Length == 0)
      throw new ArgumentOutOfRangeException(nameof(properties));

    var length = properties.Length;
    var extractions = new PropertyInfo[length];
    for (int i = 0; i < length; i++)
    {
      var property = properties[i];
      extractions[i] = ExtractProperty(property);
    }
    _properties = extractions;
  }

  public bool Equals(T x, T y)
  {
    if (ReferenceEquals(x, y))
      //covers both are null
      return true;
    if (x == null || y == null)
      return false;
    for (int i = 0; i < _properties.Length; i++)
    {
      var property = _properties[i];
      if (!Equals(property.GetValue(x), property.GetValue(y)))
        return false;
    }
    return true;
  }

  public int GetHashCode(T obj)
  {
    if (obj == null)
      return 0;

    var hashes = _properties
        .Select(pi => pi.GetValue(obj)?.GetHashCode() ?? 0).ToArray();
    return Combine(hashes);
  }

  static int Combine(int[] hashes)
  {
    int result = 0;
    foreach (var hash in hashes)
    {
      uint rol5 = ((uint)result << 5) | ((uint)result >> 27);
      result = ((int)rol5 + result) ^ hash;
    }
    return result;
  }

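  static PropertyInfo ExtractProperty(Expression<Func<T, object>> property)
  {
    if (property.NodeType != ExpressionType.Lambda)
      throw NotSupported();

    var body = property.Body;

    // Value-type properties are boxed to object, which wraps the access in a Convert node.
    if (body.NodeType == ExpressionType.Convert && body is UnaryExpression unary)
      body = unary.Operand;

    // Throwing directly (rather than through a void helper) lets the compiler
    // see that 'member' and 'pi' are definitely assigned below.
    if (!(body is MemberExpression member))
      throw NotSupported();

    if (!(member.Member is PropertyInfo pi))
      throw NotSupported();

    return pi;

    NotSupportedException NotSupported() =>
      new NotSupportedException($"The expression '{property}' isn't supported.");
  }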
}
Shimmy Weitzhandler
  • Awesome to see this has been resolved and added to the .NET 8 milestone! https://github.com/dotnet/runtime/pull/75212 – Jeff Camera Mar 06 '23 at 15:41
4

What about a throwaway generic IEqualityComparer class?

public class ThrowAwayEqualityComparer<T> : IEqualityComparer<T>
{
  Func<T, T, bool> comparer;

  public ThrowAwayEqualityComparer(Func<T, T, bool> comparer)   
  {
    this.comparer = comparer;
  }

  public bool Equals(T a, T b)
  {
    return comparer(a, b);
  }

  public int GetHashCode(T a)
  {
    return a.GetHashCode();
  }
}

So now you can use Distinct with a custom comparer.

var distinctImages = allImages.Distinct(
   new ThrowAwayEqualityComparer<GalleryImage>((a, b) => a.Key == b.Key));

You might be able to drop the <GalleryImage>, but I'm not sure whether the compiler can infer the type (I don't have access to one right now).

And in an additional extension method:

public static class IEnumerableExtensions
{
  public static IEnumerable<TValue> Distinct<TValue>(this IEnumerable<TValue> @this, Func<TValue, TValue, bool> comparer)
  {
    return @this.Distinct(new ThrowAwayEqualityComparer<TValue>(comparer));
  }

  private class ThrowAwayEqualityComparer...
}
JJGAP
Samuel
  • Pretty good. Then you could also implement the overload of Distinct that I wished for. – Charlie Flowers Apr 04 '09 at 05:53
  • Yes, you could easily do that and get what you wanted. – Samuel Apr 04 '09 at 05:54
  • But aren't you still implementing IEqualityComparer? It sounded like you didn't want to do that. – Abhijeet Patel Apr 04 '09 at 06:10
  • Note that this won't necessarily work; there's no guarantee that the GetHashCode implementation you've supplied will be consistent with the Equals method. This could then give wrong results. – kvb Apr 04 '09 at 06:45
  • @abhijeet - sure this is still implementing IEqualityComparer, but this is meant for 'generic' use. it could be hidden away in a utility class and switched out for the framework version if microsoft ever added one. – Simon_Weaver Apr 04 '09 at 11:03
  • Sure, that's true. But T's GetHashCode also may not match the Func method passed in to the constructor, in which case your class won't work properly. This will often be the case in practice (e.g. the class uses the default hash code implementation and you extract a key for comparison). – kvb Apr 04 '09 at 20:24
  • I'd say since he is using IEnumerable, you don't really need to concern yourself with GetHashCode(). It's really only used for a hash table. – Samuel Apr 04 '09 at 20:42
  • It's perfectly reasonable for methods such as IEnumerable.Distinct to use the GetHashCode() function to bin items before doing a presumably more expensive equality check. Your implementation does not fulfill IEqualityComparer's contract. – kvb Apr 05 '09 at 00:28
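
The hash-code concern raised in these comments can be shown with a small sketch. It assumes a made-up Image class that does not override GetHashCode, and a hypothetical DelegateEqualityComparer that takes the hash function as a second delegate so the two hashing choices can be compared side by side:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up sample type; it does not override Equals or GetHashCode.
public class Image
{
    public int Key { get; set; }
}

// Hypothetical comparer: same shape as ThrowAwayEqualityComparer, but the
// hash function is supplied by the caller as well.
public class DelegateEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> equals;
    private readonly Func<T, int> getHashCode;

    public DelegateEqualityComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
    {
        this.equals = equals;
        this.getHashCode = getHashCode;
    }

    public bool Equals(T a, T b) => equals(a, b);
    public int GetHashCode(T a) => getHashCode(a);
}

public static class HashContractDemo
{
    // Two separate instances that share the same key.
    static readonly Image[] Images = { new Image { Key = 1 }, new Image { Key = 1 } };

    // Hash comes from the object's default GetHashCode, as in the answer above.
    public static int CountWithDefaultHash() =>
        Images.Distinct(new DelegateEqualityComparer<Image>(
            (a, b) => a.Key == b.Key, a => a.GetHashCode())).Count();

    // Hash is derived from the same key that the equality delegate compares.
    public static int CountWithKeyHash() =>
        Images.Distinct(new DelegateEqualityComparer<Image>(
            (a, b) => a.Key == b.Key, a => a.Key.GetHashCode())).Count();

    public static void Main()
    {
        // Usually 2: the instances hash differently, so Equals is never consulted.
        Console.WriteLine(CountWithDefaultHash());
        // 1: equal keys produce equal hashes, so the duplicate is found.
        Console.WriteLine(CountWithKeyHash());
    }
}
```

The first count is "usually" 2 rather than always, because two default hash codes can collide by chance, in which case the duplicate would be caught after all.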
4

This is currently in .NET 8 preview 2. The OP would use

var distinctImages = allImages.Distinct<GalleryImage>(
    EqualityComparer<GalleryImage>.Create(
        (a, b) => a.id == b.id,
        a => a.id.GetHashCode())); // Distinct hashes its items, so supply a matching hash delegate too

in place of their suggested

var distinctImages = allImages.Distinct<GalleryImage>(
    new EqualityComparer<GalleryImage>((a, b) => a.id == b.id));

More info:

https://github.com/dotnet/runtime/pull/75212

rory.ap
3

You could group by the key value and then select the top item from each group. Would that work for you?

Charlie Flowers
  • yes i'm just looking at that actually - via the ToLookup(). maybe inefficient and slow but ok for this task. posting my statement above/below – Simon_Weaver Apr 04 '09 at 05:34
1

Here's an interesting article that extends LINQ for this purpose... http://www.singingeels.com/Articles/Extending_LINQ__Specifying_a_Property_in_the_Distinct_Function.aspx

The default Distinct compares objects using their GetHashCode and Equals methods, so to make your objects work with Distinct you could override both (overriding GetHashCode alone is not enough). But you mentioned that you are retrieving your objects from a web service, so you may not be able to do that in this case.

markt
0

implement IEquatable on GalleryImage because it is generated

A different approach would be to generate GalleryImage as a partial class, and then add a second file that declares IEquatable<GalleryImage> and implements Equals and GetHashCode.
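
A minimal sketch of that layout, assuming the generator really does emit GalleryImage as a partial class and that Key alone identifies an image; both parts are shown in one listing here, and PartialDemo is a made-up name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the generated file (assumption: the generator marks it partial).
public partial class GalleryImage
{
    public int Key { get; set; }
    public string Caption { get; set; }
}

// Hand-written file; regenerating the class above does not overwrite it.
public partial class GalleryImage : IEquatable<GalleryImage>
{
    // Two images are considered equal when their keys match.
    public bool Equals(GalleryImage other) => other != null && Key == other.Key;

    public override bool Equals(object obj) => Equals(obj as GalleryImage);

    // Must agree with Equals: equal keys give equal hash codes.
    public override int GetHashCode() => Key.GetHashCode();
}

public static class PartialDemo
{
    // Plain Distinct() now works, because the default comparer picks up IEquatable<T>.
    public static int CountDistinct(IEnumerable<GalleryImage> images) =>
        images.Distinct().Count();

    public static void Main()
    {
        var images = new[]
        {
            new GalleryImage { Key = 1, Caption = "a" },
            new GalleryImage { Key = 1, Caption = "b" },
            new GalleryImage { Key = 2, Caption = "c" },
        };

        Console.WriteLine(CountDistinct(images)); // prints 2
    }
}
```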

Richard