Remove duplicates with lambda leaving last item (from dupes) alive

Question

I'm trying to refactor some old nested-for-loop code that removes duplicates from a collection of Items. If properties X, Y and Z of an item match those of a previously inserted Item, only the last item inserted should be kept in the collection:

 private void RemoveDuplicates()
 {       
   //Remove duplicated items.       
   int endloop = Items.Count;
   for (int i = 0; i < endloop - 1; i++)
   {
     var item = Items[i];
     for (int j = i + 1; j < endloop; j++)
     {
      if (!item.HasSamePropertiesThan(Items[j]))
      {
        continue;
      }

      AllItems.Remove(item);
      break;
     }
   }       
 }

where HasSamePropertiesThan() is an extension method for Item that does something similar to:

public static bool HasSamePropertiesThan(this Item i1, Item i2)
{
  // Two items count as duplicates when X, Y and Z all match.
  return string.Equals(i1.X, i2.X, StringComparison.InvariantCulture)
    && string.Equals(i1.Y, i2.Y, StringComparison.InvariantCulture)
    && string.Equals(i1.Z, i2.Z, StringComparison.InvariantCulture);
}

so if I have a collection like:

[0]A
[1]A
[2]A
[3]B
[4]A
[5]A

I want to be able to delete all duplicates, leaving only [3]B and [5]A alive.

so far, I've managed to craft these lambdas:

var query = items.GroupBy(i => new {i.X, i.Y, i.Z}).Select(i => i.Last());  // Retrieves entities to not delete
var dupes = Items.Except(query);
dupes.ToList().ForEach(d => Items.Remove(d));

based on these examples:

Remove duplicates in the list using linq

Delete duplicates using Lambda

These don't seem to work quite right (the wrong items are removed, and some items that should have been removed are left in the collection). What am I doing wrong?

What exactly is it doing wrong? Is it throwing an exception? Or are the results incorrect? — Zach Spencer, Jul 09 '14 at 19:43
It seems like `query.ToList()` from your code above will do the trick. Why isn't that working for you? — Mike Hixson, Jul 09 '14 at 19:44
Why do you need to remove items? `query.ToList()` has the items you want in it. — Mike Hixson, Jul 09 '14 at 19:56

score 2 · Accepted Answer · answered Jul 09 '14 at 20:06

A quick question: is the result of "query" supposed to be the result you are looking for? As I read it, you take the list of items, run a query over the elements found, and at the end remove items from the original list based on that result.

Correct me if I'm wrong, but isn't that the same as doing something like this:

items = items.GroupBy(i => new {i.X, i.Y, i.Z}).Select(i => i.Last()).ToList();

If "query" is not returning the right elements, then the problem is in how you are building the query, or you probably need to order the list before applying it.
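As a rough sketch of that one-liner on the question's sample data (A, A, A, B, A, A): the `Item` class here is a hypothetical stand-in, and `Id` is an invented property used only to show which copy survives. Note that the anonymous-type key compares strings ordinally, which is not quite the same as the `StringComparison.InvariantCulture` rule in `HasSamePropertiesThan`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's Item; Id only marks the original position.
public class Item
{
    public int Id { get; set; }
    public string X { get; set; }
    public string Y { get; set; }
    public string Z { get; set; }
}

public static class Dedup
{
    // Groups by (X, Y, Z) and keeps the last element of each group.
    // Elements inside a group keep their source order, so Last() is the
    // most recently inserted duplicate.
    public static List<Item> KeepLast(IEnumerable<Item> items)
    {
        return items
            .GroupBy(i => new { i.X, i.Y, i.Z })
            .Select(g => g.Last())
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var keys = new[] { "A", "A", "A", "B", "A", "A" };
        var items = keys
            .Select((k, index) => new Item { Id = index, X = k, Y = k, Z = k })
            .ToList();

        var kept = Dedup.KeepLast(items);

        // Groups come out in order of each key's first appearance,
        // so the surviving A (Id 5) is listed before B (Id 3).
        Console.WriteLine(string.Join(", ", kept.Select(i => i.Id))); // prints "5, 3"
    }
}
```

If the original ordering of the survivors matters, the result would need an extra sort on whatever position information is available, since the groups follow first appearance of the key rather than the position of the kept item.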

score 0 · Answer 2 · answered Jul 09 '14 at 19:58


You could either use a HashSet or, using LINQ, do something like this:

var dups = new string[]{"A","A","B","B"};
var nonDupe = dups.Distinct().ToArray();
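For the HashSet route with the question's "keep the last one" requirement, one possible sketch is to walk the list backwards and remove anything whose (X, Y, Z) was already seen. The `Item` class, its `Id` property, `ItemXyzComparer` and `InPlaceDedup` are all hypothetical names made up for this illustration; the comparer is meant to mirror `HasSamePropertiesThan` and assumes X, Y and Z are never null.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the question's Item; Id only marks the original position.
public class Item
{
    public int Id { get; set; }
    public string X { get; set; }
    public string Y { get; set; }
    public string Z { get; set; }
}

// Mirrors HasSamePropertiesThan: X, Y and Z compared with invariant-culture rules.
public sealed class ItemXyzComparer : IEqualityComparer<Item>
{
    private static readonly StringComparer Cmp = StringComparer.InvariantCulture;

    public bool Equals(Item a, Item b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a is null || b is null) return false;
        return Cmp.Equals(a.X, b.X) && Cmp.Equals(a.Y, b.Y) && Cmp.Equals(a.Z, b.Z);
    }

    public int GetHashCode(Item item)
    {
        // Assumption: X, Y and Z are non-null (StringComparer.GetHashCode throws on null).
        unchecked
        {
            int h = 17;
            h = h * 31 + Cmp.GetHashCode(item.X);
            h = h * 31 + Cmp.GetHashCode(item.Y);
            h = h * 31 + Cmp.GetHashCode(item.Z);
            return h;
        }
    }
}

public static class InPlaceDedup
{
    // Walks from the end so the first time a key is seen is its LAST occurrence;
    // every earlier item with the same key is removed in place.
    public static void RemoveEarlierDuplicates(List<Item> items)
    {
        var seen = new HashSet<Item>(new ItemXyzComparer());
        for (int i = items.Count - 1; i >= 0; i--)
        {
            if (!seen.Add(items[i]))
            {
                items.RemoveAt(i);
            }
        }
    }
}
```

On the sample A, A, A, B, A, A this should leave [3]B and [5]A in their original relative order. Iterating backwards also keeps `RemoveAt` from shifting indexes that have not been visited yet.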


Zach Spencer


There are some properties of 'Item' that I'm not showing in the example (so technically [0]A is not completely equal to [1]A), because I depend on the order of appearance in the collection to distinguish the 'outdated' objects from the 'new' ones. That said, I think Distinct() would leave both [0]A and [1]A (at least in my example), since they're not completely equal. And AFAIK Distinct() takes the first Item that matches the selection criteria and ignores the rest of the items in the collection, which is the opposite of what I want: the last matching Item for that criteria. Right? – safejrz Jul 09 '14 at 20:40
