194

I have a collection:

List<Car> cars = new List<Car>();

Cars are uniquely identified by their property CarCode.

I have three cars in the collection, and two with identical CarCodes.

How can I use LINQ to convert this collection to Cars with unique CarCodes?

wonea
user278618
  • Related / possible duplicate of: [LINQ's Distinct() on a particular property](https://stackoverflow.com/q/489258/3258851) – Marc.2377 Dec 13 '18 at 19:17

9 Answers

348

You can use grouping, and get the first car from each group:

List<Car> distinct =
  cars
  .GroupBy(car => car.CarCode)
  .Select(g => g.First())
  .ToList();
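
For readers who want to try the grouping approach end to end, here is a minimal, self-contained sketch. The `Car` class, its `Name` property, the `GroupByDemo` class, and the sample data are assumptions made for illustration; the question only says that `Car` has a `CarCode` property.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Car type: only CarCode is mentioned in the question.
public class Car
{
    public string CarCode { get; set; }
    public string Name { get; set; }
}

public static class GroupByDemo
{
    // Keeps the first car seen for each CarCode.
    public static List<Car> DistinctByCode(IEnumerable<Car> cars)
    {
        return cars
            .GroupBy(car => car.CarCode)
            .Select(g => g.First())
            .ToList();
    }

    public static void Main()
    {
        // Three cars, two of which share a CarCode (assumed sample data).
        var cars = new List<Car>
        {
            new Car { CarCode = "A1", Name = "first" },
            new Car { CarCode = "B2", Name = "second" },
            new Car { CarCode = "A1", Name = "third" },
        };

        foreach (var car in DistinctByCode(cars))
            Console.WriteLine($"{car.CarCode} {car.Name}");
    }
}
```

In LINQ to Objects, `GroupBy` yields groups in the order their keys first appear and keeps the source order inside each group, so `g.First()` is the earliest car with that code.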
Guffa
134

Use MoreLINQ, which has a DistinctBy method :)

IEnumerable<Car> distinctCars = cars.DistinctBy(car => car.CarCode);

(This is only for LINQ to Objects, mind you.)

Jon Skeet
  • 4
    Just providing the link: http://code.google.com/p/morelinq/source/browse/MoreLinq/?r=d4396b9ff63932be0ab07c36452a481d20f96307 – Diogo Jul 17 '13 at 16:48
  • 1
    Hi Jon, two questions if I may. 1) Why don't you add the library to NuGet? 2) What about LINQ to SQL\EF\NH? How can we implement that? Do we have to use Guffa's version (which is your version if `NO_HASHSET` is true...)? Thank you very much! – gdoron Oct 17 '13 at 12:57
  • 2
    @gdoron: 1) It's in NuGet already: http://www.nuget.org/packages/morelinq 2) I doubt that LINQ to SQL etc are flexible enough to allow that. – Jon Skeet Oct 17 '13 at 12:58
  • Ohh, it's prerelease... that's why I couldn't find it. 2) Well, I'm afraid that if I add the lib to my project, someone will use it with `IQueryable`, try to `DistinctBy` it, and thus query the whole God damn table... Isn't it error prone? Thanks again for your extremely quick response! – gdoron Oct 17 '13 at 13:04
  • @gdoron: No, 2.0 is prerelease - 1.0 isn't. As for whether it's error-prone... well, that's true of LINQ in general, in that you could always pass an `IQueryable` to something expecting `IEnumerable`. – Jon Skeet Oct 17 '13 at 13:14
  • I tried using MoreLinq for the same task, particularly DistinctBy. It wasn't very efficient at all. The MoreLinq/DistinctBy query made the method take 2.5 minutes to execute as per Chrome's network tab, whereas the GroupBy approach took 1.5 seconds. I had to convert to Queryable using the AsQueryable method; maybe that had some influence. – Gustavo Guevara Jun 05 '14 at 17:57
  • Would you consider it a bad habit to include the MoreLinq features under the `System.Linq` namespace, so no need to add additional usings to each file that wants to access those features? – Shimmy Weitzhandler Jan 24 '15 at 21:46
  • 3
    @Shimmy: I'd personally feel nervous about writing code under `System` as that gives a false impression of it being "official". But your tastes may vary, of course :) – Jon Skeet Jan 24 '15 at 21:55
  • @gdoron: Yes, it is. I'll edit the LINQ. – Jon Skeet Jan 20 '16 at 21:01
69

Same approach as Guffa but as an extension method:

public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> items, Func<T, TKey> property)
{
    return items.GroupBy(property).Select(x => x.First());
}

Used as:

var uniqueCars = cars.DistinctBy(x => x.CarCode);
  • 3
    Perfect. This same method is also provided on the Microsoft.Ajax.Utilities library. – Savage Oct 06 '18 at 10:16
  • 1
    Note that in .NET 6 and .NET 7 Preview a more production ready version is available. See : https://github.com/dotnet/runtime/blob/ebba1d4acb7abea5ba15e1f7f69d1d1311465d16/src/libraries/System.Linq/src/System/Linq/Distinct.cs This includes both deferred execution and proper error handling. Code above is simplistic compared to the different scenarios that might occur including error conditions. – Tore Aurstad Apr 16 '22 at 14:19
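
As a rough illustration of the built-in method that comment refers to: on .NET 6 or later, `Enumerable.DistinctBy` is available from `System.Linq` without any extra library. The `Car` record, the `BuiltInDistinctByDemo` class, and the sample data below are assumptions for the sketch, not part of the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Car type for illustration; requires .NET 6+ / C# 9+.
public record Car(string CarCode, string Name);

public static class BuiltInDistinctByDemo
{
    // Uses System.Linq.Enumerable.DistinctBy (available since .NET 6).
    public static List<Car> UniqueByCode(IEnumerable<Car> cars)
    {
        return cars.DistinctBy(car => car.CarCode).ToList();
    }

    public static void Main()
    {
        var cars = new List<Car>
        {
            new Car("A1", "first"),
            new Car("B2", "second"),
            new Car("A1", "third"),
        };

        // Two cars remain: one per distinct CarCode.
        Console.WriteLine(UniqueByCode(cars).Count);
    }
}
```

If a project targets .NET 6 or later and also defines its own `DistinctBy` extension with the same signature, the call can become ambiguous, so it is probably simplest to keep only one of them in scope.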
33

You can implement an IEqualityComparer and use that in your Distinct extension.

class CarEqualityComparer : IEqualityComparer<Car>
{
    #region IEqualityComparer<Car> Members

    public bool Equals(Car x, Car y)
    {
        return x.CarCode.Equals(y.CarCode);
    }

    public int GetHashCode(Car obj)
    {
        return obj.CarCode.GetHashCode();
    }

    #endregion
}

And then

var uniqueCars = cars.Distinct(new CarEqualityComparer());
Anthony Pegram
  • How can we use this without writing: new CarEqualityComparer() ? – Parsa Feb 18 '17 at 11:39
  • 3
    @Parsa You can create an IEqualityComparer wrapper type that accepts lambdas. This would make it generalized: `cars.Distinct(new GenericEqualityComparer((a,b) => a.CarCode == b.CarCode, x => x.CarCode.GetHashCode()))`. I've used such in the past as it sometimes adds value when performing a one-off Distinct. – user2864740 May 11 '18 at 22:41
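
The comment above does not show the wrapper itself, so here is one possible sketch of it. `GenericEqualityComparer<T>` is not a framework type; its shape, the `Car` class, and the `LambdaComparerDemo` helper are all assumptions. Note that the type argument has to be written out (`GenericEqualityComparer<Car>`), because C# does not infer generic type arguments for constructors.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Car type: only CarCode is mentioned in the question.
public class Car
{
    public string CarCode { get; set; }
}

// Assumed wrapper: adapts two lambdas to the IEqualityComparer<T> interface.
public sealed class GenericEqualityComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, T, bool> _equals;
    private readonly Func<T, int> _getHashCode;

    public GenericEqualityComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
    {
        _equals = equals ?? throw new ArgumentNullException(nameof(equals));
        _getHashCode = getHashCode ?? throw new ArgumentNullException(nameof(getHashCode));
    }

    public bool Equals(T x, T y) => _equals(x, y);

    public int GetHashCode(T obj) => _getHashCode(obj);
}

public static class LambdaComparerDemo
{
    public static List<Car> UniqueByCode(IEnumerable<Car> cars)
    {
        // The hash lambda must agree with the equality lambda:
        // cars that compare equal must produce the same hash code.
        var comparer = new GenericEqualityComparer<Car>(
            (a, b) => a.CarCode == b.CarCode,
            x => x.CarCode.GetHashCode());

        return cars.Distinct(comparer).ToList();
    }
}
```

This sketch assumes no `Car` in the list is null and that `CarCode` is never null; a production version would need to handle both.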
12

Another extension method for Linq-to-Objects, without using GroupBy:

    /// <summary>
    /// Returns the set of items, made distinct by the selected value.
    /// </summary>
    /// <typeparam name="TSource">The type of the source.</typeparam>
    /// <typeparam name="TResult">The type of the result.</typeparam>
    /// <param name="source">The source collection.</param>
    /// <param name="selector">A function that selects a value to determine unique results.</param>
    /// <returns>IEnumerable&lt;TSource&gt;.</returns>
    public static IEnumerable<TSource> Distinct<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        HashSet<TResult> set = new HashSet<TResult>();

        foreach(var item in source)
        {
            var selectedValue = selector(item);

            if (set.Add(selectedValue))
                yield return item;
        }
    }
Luke Puplett
7

I think the best option in terms of performance (or in any terms) is to use Distinct with the IEqualityComparer interface.

However, implementing a new comparer for each class is cumbersome and produces boilerplate code.

So here is an extension method that builds a new IEqualityComparer on the fly for any class from a key-selector delegate (no reflection is involved).

Usage:

var filtered = taskList.DistinctBy(t => t.TaskExternalId).ToArray();

Extension Method Code

public static class LinqExtensions
{
    public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> items, Func<T, TKey> property)
    {
        GeneralPropertyComparer<T, TKey> comparer = new GeneralPropertyComparer<T,TKey>(property);
        return items.Distinct(comparer);
    }   
}
public class GeneralPropertyComparer<T,TKey> : IEqualityComparer<T>
{
    private Func<T, TKey> expr { get; set; }
    public GeneralPropertyComparer (Func<T, TKey> expr)
    {
        this.expr = expr;
    }
    public bool Equals(T left, T right)
    {
        var leftProp = expr.Invoke(left);
        var rightProp = expr.Invoke(right);
        if (leftProp == null && rightProp == null)
            return true;
        else if (leftProp == null ^ rightProp == null)
            return false;
        else
            return leftProp.Equals(rightProp);
    }
    public int GetHashCode(T obj)
    {
        var prop = expr.Invoke(obj);
        return (prop==null)? 0:prop.GetHashCode();
    }
}
Anestis Kivranoglou
1

You can't effectively use Distinct on a collection of objects (without additional work). I will explain why.

The documentation says:

It uses the default equality comparer, EqualityComparer<T>.Default, to compare values.

For objects, that means it uses the default equality members to compare them (source): Distinct first checks hash codes and then calls Equals. Since your objects don't override GetHashCode() and Equals(), the comparison falls back to reference equality, and because every object in the list is a separate reference, no two are considered equal and nothing is removed.
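
A small sketch may make the difference concrete. `PlainCar`, `ComparableCar`, and `DefaultEqualityDemo` are hypothetical types introduced only for this illustration: the first relies on the default reference equality, the second overrides `Equals` and `GetHashCode` based on `CarCode`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// No equality overrides: Distinct() compares references.
public class PlainCar
{
    public string CarCode { get; set; }
}

// Equality based on CarCode: Distinct() treats equal codes as duplicates.
public class ComparableCar : IEquatable<ComparableCar>
{
    public string CarCode { get; set; }

    public bool Equals(ComparableCar other) =>
        other != null && CarCode == other.CarCode;

    public override bool Equals(object obj) => Equals(obj as ComparableCar);

    public override int GetHashCode() =>
        CarCode == null ? 0 : CarCode.GetHashCode();
}

public static class DefaultEqualityDemo
{
    public static int CountPlain()
    {
        var cars = new List<PlainCar>
        {
            new PlainCar { CarCode = "A1" },
            new PlainCar { CarCode = "B2" },
            new PlainCar { CarCode = "A1" },
        };
        return cars.Distinct().Count(); // 3: every instance is a different reference
    }

    public static int CountComparable()
    {
        var cars = new List<ComparableCar>
        {
            new ComparableCar { CarCode = "A1" },
            new ComparableCar { CarCode = "B2" },
            new ComparableCar { CarCode = "A1" },
        };
        return cars.Distinct().Count(); // 2: the two "A1" cars compare equal
    }
}
```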

Patrick Hofman
0

Another way to accomplish the same thing...

List<Car> distinctBy = cars
    .Select(car => car.CarCode)
    .Distinct()
    .Select(code => cars.First(car => car.CarCode == code))
    .ToList();

It's possible to create an extension method to do this in a more generic way. It would be interesting if someone could evaluate the performance of this 'DistinctBy' against the GroupBy approach.

JwJosefy
  • 1
    The second `Select` would be an O(n*m) operation, so that won't scale well. It could perform better if there are a lot of duplicates, i.e. if the result of the first `Select` is a very small part of the original collection. – Guffa Dec 24 '13 at 11:21
0

You can check out my PowerfulExtensions library. It's currently at a very early stage, but you can already use methods like Distinct, Union, Intersect, and Except on any number of properties.

This is how you use it:

using PowerfulExtensions.Linq;
...
var distinct = myArray.Distinct(x => x.A, x => x.B);
Andrzej Gis
  • If I have a list of objects where I want to delete all objects with the same IDs, will it be `myList.Distinct(x => x.ID)`? – Thomas Oct 18 '17 at 09:08