10

I have a List<T> where T is a custom object. None of my objects are equal, but some might have an equal property. Is there a fast way to remove the duplicates by comparing that property? It doesn't matter which of the duplicates stays in the list.

vcsjones
Pantelis

4 Answers

19

You can use List<T>.RemoveAll to do this efficiently.

For example, if you wanted to remove all elements where the Foo property had a value of 42, you could do:

theList.RemoveAll(i => i.Foo == 42);

If you're trying to reduce the list to items that are distinct by a property, i.e., keep only one item per distinct Foo value, I would recommend doing something like:

HashSet<int> elements = new HashSet<int>(); // Type of property

theList.RemoveAll(i => !elements.Add(i.Foo));

HashSet<T>.Add returns false when the value is already in the set, so this keeps the first item seen for each Foo value and removes every later item that repeats one.
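For anyone who wants to try this end to end, here is a minimal sketch of the same approach as a complete program. The Item class, its Name property, and the sample values are made up for illustration; only Foo and theList come from the snippets above. It also relies on RemoveAll calling the predicate in list order, which is how List<T> behaves in practice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical item type, used only for illustration.
class Item
{
    public string Name { get; set; }
    public int Foo { get; set; }
}

static class Program
{
    static void Main()
    {
        var theList = new List<Item>
        {
            new Item { Name = "a", Foo = 1 },
            new Item { Name = "b", Foo = 2 },
            new Item { Name = "c", Foo = 1 }, // same Foo as "a"
            new Item { Name = "d", Foo = 2 }, // same Foo as "b"
        };

        // Tracks the Foo values seen so far.
        var seen = new HashSet<int>();

        // Add returns false for a value that is already in the set,
        // so the predicate is true only for repeated Foo values.
        theList.RemoveAll(i => !seen.Add(i.Foo));

        foreach (var item in theList)
            Console.WriteLine($"{item.Name}: {item.Foo}");
        // prints "a: 1" then "b: 2"
    }
}
```

The list is modified in place, and the HashSet<int> should be a fresh instance for each call, since it remembers everything it has seen.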

Reed Copsey
  • Brilliant use of a local `HashSet`! – Jon Senchyna Aug 20 '12 at 18:10
  • 3
    @JonSenchyna It's a bit ugly IMO (since the Predicate has side effects), but it's efficient, at least. – Reed Copsey Aug 20 '12 at 18:10
  • Correct me if I'm wrong, but this will not keep "distinct" properties, but rather remove the subsequent occurrences of duplicated properties. The resulting set of this call will depend on the order of elements, which differs confusingly from the default `Distinct`. (This may be what the OP intended, but it's an important thing to note.) – dlras2 Aug 20 '12 at 18:11
  • @DanRasmussen That's actually how `Enumerable.Distinct` works - its implementation is very similar to the above - it uses a Set to track "found" items, and doesn't include subsequent found items. – Reed Copsey Aug 20 '12 at 18:12
  • @DanRasmussen If you use the above, with the hash set being the item itself, you'll actually get the same answer as `Enumerable.Distinct` (though that's an implementation detail of `Distinct`, and not guaranteed by the API) – Reed Copsey Aug 20 '12 at 18:14
  • @ReedCopsey The difference is that, because this is only comparing a *subset* of equality, it's order-dependent. I.e., the first of each matching element is kept. As you said, this works the same as `Enumerable.Distinct`, but since that compares the entire equality of the object, which instance it keeps is indeterminable. – dlras2 Aug 20 '12 at 18:15
  • 1
    In this particular case, the person asking the question explicitly stated it didn't matter which of the objects with a duplicate property was retained, so order-dependency also doesn't matter. – Joel Mueller Aug 20 '12 at 18:16
  • @ReedCopsey Sorry, I wasn't trying to point out an error in your answer or anything like that (I upvoted it, in fact.) I was just trying to point out a little gotcha that subsequent viewers may not realize at first. – dlras2 Aug 20 '12 at 18:19
  • @ReedCopsey Thank you very much. Marked as the answer, as it is the shortest snippet of all presented. – Pantelis Aug 21 '12 at 08:01
8

Group the objects based on the property value, then pick the first item in each group. Like this:

var distinctObjects = objects
    .GroupBy(x => x.Property)
    .Select(g => g.First());
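
Note that the query above builds a new sequence and leaves the original list untouched. A small runnable sketch of the same idea follows; the tuple element type and the sample values are assumptions made for illustration, standing in for the custom objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Program
{
    static void Main()
    {
        // Hypothetical data: (Name, Property) tuples stand in for the custom objects.
        var objects = new List<(string Name, int Property)>
        {
            ("a", 1), ("b", 2), ("c", 1),
        };

        // GroupBy is lazy and does not modify the source list;
        // ToList materializes the result so it can replace the original.
        objects = objects
            .GroupBy(x => x.Property)
            .Select(g => g.First())
            .ToList();

        Console.WriteLine(string.Join(", ", objects.Select(x => x.Name)));
        // prints "a, b"
    }
}
```

GroupBy yields groups in the order their keys first appear in the source, so the first item with each Property value is the one that survives.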
Christoffer Lette
2

You can create a new class that implements IEqualityComparer<T> by comparing the property. Then you can use LINQ's Distinct method, passing an instance of that comparer, to get an IEnumerable<T> that contains only the unique elements.
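
A rough sketch of what such a comparer could look like is below. The Item class, its Foo property, and the FooComparer name are hypothetical, chosen only to illustrate the idea.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type, used only for illustration.
class Item
{
    public string Name { get; set; }
    public int Foo { get; set; }
}

// Treats two items as equal when their Foo values match.
class FooComparer : IEqualityComparer<Item>
{
    public bool Equals(Item x, Item y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Foo == y.Foo;
    }

    // Must agree with Equals: items with equal Foo get equal hash codes.
    public int GetHashCode(Item obj) => obj.Foo.GetHashCode();
}

static class Program
{
    static void Main()
    {
        var theList = new List<Item>
        {
            new Item { Name = "a", Foo = 1 },
            new Item { Name = "b", Foo = 2 },
            new Item { Name = "c", Foo = 1 }, // same Foo as "a"
        };

        // Distinct returns a new sequence; the original list is unchanged.
        List<Item> unique = theList.Distinct(new FooComparer()).ToList();

        Console.WriteLine(unique.Count); // prints 2
    }
}
```

The documentation for Distinct does not promise which of the equal items is returned, so this fits only when, as in the question, it doesn't matter which duplicate stays.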

Alex Gelman
  • This will work just fine, but Reed's answer is going to perform better, and doesn't require you to implement an interface for the sake of a single method call. – Joel Mueller Aug 20 '12 at 18:08
  • Thank you, that's the way I have been doing it so far. Sorry, I forgot to mention it in my question. I was looking for something easier. – Pantelis Aug 21 '12 at 08:03
0

You can also use the Algorithms.RemoveDuplicates method from the Power Collections library, available at http://powercollections.codeplex.com/. That library has many other useful utilities for collections.

Eugen