13

I got a list:

var list = new List<List<int>>();

which could contain

list[0] = {1, 2, 3, 4}
list[1] = {3, 1, 2, 4}
list[2] = {2, 1, 7, 3}

How can I detect the duplicate between [0] and [1] and remove one of them? Code is c-sharp.

In reality it's not a int, but that shouldn't change the question.

Jim Carragher
  • 303
  • 1
  • 3
  • 8
  • 1
    Will the lists always have the same length? Will they always contain unique elements? If two lists are determined to be "equal" does it matter which one of them is removed? – Mark Byers Dec 23 '10 at 10:31
  • @Jim: I would change the sentence: **"How can I detect the duplicate between [0] and [1] and remove one of them?"** in **"How can I detect that [0] and [1] are equal and remove one of the two?"** because the former it's a little bit ambiguous IMO... – digEmAll Dec 23 '10 at 10:56
  • @Jim: obviously only if my second sentence correctly represents the concept you want to express... ;) – digEmAll Dec 23 '10 at 11:00
  • But are they equal? They just contain the same values. I'm quite a noob on the area, but with different order, they can't be equal - right? – Jim Carragher Dec 23 '10 at 11:24
  • They're equal from your point of view. Otherwise how can you say you want to perform a `Distinct`? Anyway, I suggested you to change your sentence because it sounds like you need to find values that are unique between list[0] and [1] ... – digEmAll Dec 23 '10 at 11:38
  • possible duplicate of [Comparing two collections for equality irrespective of the order of items in them](http://stackoverflow.com/questions/50098/comparing-two-collections-for-equality-irrespective-of-the-order-of-items-in-the) – nawfal Nov 08 '13 at 20:29

4 Answers4

15

You could write your own implementation of IEqualityComparer<List<int>>. For GetHashCode() it would simply return the XOR of all the hash codes of the elements in the list. For Equals() it would create a new HashSet<int> from the first list, and call HashSet<T>.SetEquals on it, passing in the second list. This assumes there will be no duplicate elements, mind you. (Otherwise { 1, 1, 2 } will be equal to { 1, 2, 2 } but have a different hash code.)

Once you've got that far, you can use Distinct:

var distinct = list.Distinct(new CustomEqualityComparer());

As an alternative approach, could you use HashSet<T> as your collection type to start with? Then it's really easy:

var distinct = sets.Distinct(HashSet<int>.CreateSetComparer());

If you need lists as the input but can cope with sets as the output:

var distinct = list.Select(x => new HashSet<int>(x))
                   .Distinct(HashSet<int>.CreateSetComparer());
Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
7

Here's the euqality comparer Jon Skeet is talking about (his advice regarding working with HashSets to begin with is also spot on, of course):

    public class EnumerableComparer<T> : IEqualityComparer<IEnumerable<T>> 
                                          where T : IComparable<T>
    {
        public bool Equals(IEnumerable<T> first, IEnumerable<T> second)
        {
            if (first == second)
                return true;
            if ((first == null) || (second == null))
                return false;

            return new HashSet<T>(first).SetEquals(second);
        }

        public int GetHashCode(IEnumerable<T> enumerable)
        {
            return enumerable.OrderBy(x => x)
              .Aggregate(17, (current, val) => current*23 + val.GetHashCode());
        }
    }

So you'd do something like:

list.Distinct(new EnumerableComparer());

If the elements are not guaranteed to be unique - Use the IEqualityComparer I posted here: Comparing two collections for equality irrespective of the order of items in them

(In previous edits, I mistakingly posted an IEqulityComparer that compares between two lists of lists - could be very useful when dealing with partitions, but that's a different topic)

Community
  • 1
  • 1
Ohad Schneider
  • 36,600
  • 15
  • 168
  • 198
  • Thanks! Though I was answering a different question before (albeit with a pretty good solution :) ) – Ohad Schneider Dec 23 '10 at 10:44
  • Note that that isn't quite the hash code algorithm I was giving - it's likely to give a better hash code, but it's slightly more expensive as it requires ordering the collection first. I assume you've got a ToHashSet extension method somewhere, too :) – Jon Skeet Dec 23 '10 at 11:25
  • Well I got the above hash code from you as well, so no worries :) Regarding the extension method - you are right of course. I changed my answer for simplicity's sake, even though it's very easy to implement - or just download from, say: http://windowsclient.net/blogs/damonwildercarr/archive/2008/09/10/expose-new-linq-operations-from-the-screaming-hashset-lt-t-gt-collection.aspx – Ohad Schneider Dec 23 '10 at 20:36
1
list[1] = list[1].Except(list[0]).ToList();

This is the solution in the assumption that we need to remove the duplicate ints from the arrays list[0] and list[1]. Other answers are dealing with the case of removing arrays which contain the same set of ints.

Vlad
  • 35,022
  • 6
  • 77
  • 199
  • You need list[0].except(list[1]) too – Ohad Schneider Dec 23 '10 at 10:31
  • @ohadsc: no, the OP wants to "detect the duplicate and remove one of them". The elements which were previously duplicates remain now only in `list[0]`. – Vlad Dec 23 '10 at 10:33
  • @Jim: what do you mean by _all_ cases? – Vlad Dec 23 '10 at 10:36
  • @Vlad: look at the other answers, OP needs to remove `list[0]` or `list[1]` from `list` because they contain exactly the same elements (just in diff order). He doesn't want to remove elements that are equals between `list[0]` and `list[1]`... basically he doesn't want to touch sublist at all, he just want to remove equal sublists keeping just one of them. – digEmAll Dec 23 '10 at 10:39
  • @digEmAll: I understood the "duplicate between [0] and [1]" as "duplicate items in list[0] and list[1]". How would you interpret this expression? – Vlad Dec 23 '10 at 10:45
  • @Vlad: I agree with you, infact initially I understood the same thing as you. But (also looking to the other answers) I guessed the OP was just not explaining himself well... – digEmAll Dec 23 '10 at 10:49
  • @digEmAll: I would better rely on whatever the OP says, not on the others' answers. Anyway, it makes no sense to change my answer now, because others have already answered to the question according to their interpretation. – Vlad Dec 23 '10 at 10:51
1
boolean compareLists(list1, list2) {
  // Early rejection
  if (list1.size != list2.size) {
    return false;
  }

  // Sort lists and compare each item
  sorted1 = sort(list1.clone());
  sorted2 = sort(list2.clone());
  for (int i=0; i<list1.size; ++i) {
    if (sorted1[i]!=sorted2[i]) {
      return false;
    }
  }
  return true;
}
Vincent Mimoun-Prat
  • 28,208
  • 16
  • 81
  • 124