Remove everything which is duplicate in a List>

Question

I hope you can help me out on this one. I have a List < List < double[] > > and I want to remove everything which is duplicate in such list. That is:

1) Within the List < double[] > there are some of the double[] which are duplicate.I want to keep only the non-duplicate doubles[] within the List < double[] >. See lists 1 and 5 in the picture.

2) Within List < List < double[] > > there are some of the List < double[] > which are duplicate. I want to keep only the non-repeated lists. See lists 0 & 2 and lists 1 & 3.

The desired output is designated in the picture:

enter image description here

I have tried the following but it doesn't work.

public static List<List<double[]>> CleanListOfListsOfDoubleArray(List<List<double[]>> input)
{
    var output = new List<List<double[]>>();

    for (int i = 0; i < input.Count; i++)
    {
        var temp= input[i].Distinct().ToList();
        output.Add(temp);
    }
    return output.Distinct().ToList();
}

Can you please help me on this?

This might help: http://stackoverflow.com/questions/25886267/return-distinct-list-of-object-array-where-number-of-array-items-is-non-specific — sr28, Jun 24 '15 at 14:19
Possible duplicate: [IEqualityComparer for SequenceEqual](http://stackoverflow.com/q/14675720/1838048) — Oliver, Jun 24 '15 at 14:23

score 3 · Accepted Answer · edited May 23 '17 at 12:29

3

Your code (excluding the ToList collectors) seems logically equivalent to:

return input.Select(t => t.Distinct()).Distinct();

You're trying to use Distinct on collections. That's reasonable, since you are expecting to get distinct collections.

The problem is that you have left Distinct without logic to compare these collections. Without specifying that logic, Distinct can't compare collections properly (by equality of each individual member).

There is another overload of Distinct that takes an IEqualityComparer<T> as an argument. To use it, you'll have to implement such a comparer first. A reasonable implementation (adapted from Cédric Bignon's answer) could look like this:

public class ArrayComparer<T> : IEqualityComparer<T[]>
{
    public bool Equals(T[] x, T[] y)
    {
        return ReferenceEquals(x, y) || (x != null && y != null && x.SequenceEqual(y));
    }

    public int GetHashCode(T[] obj)
    {
        return 0;
    }
}

public class ListOfArrayComparer<T> : IEqualityComparer<List<T[]>>
{
    public bool Equals(List<T[]> x, List<T[]> y)
    {
        return ReferenceEquals(x, y) || (x != null && y != null && x.SequenceEqual(y, new ArrayComparer<T>()));
    }

    public int GetHashCode(List<T[]> obj)
    {
        return 0;
    }
}

Your code should then look like this:

    public static List<List<double[]>> CleanListOfListsOfDoubleArray(List<List<double[]>> input)
    {
        var output = new List<List<double[]>>();

        for (int i = 0; i < input.Count; i++)
        {
            var temp = input[i].Distinct(new ArrayComparer<double>()).ToList();
            output.Add(temp);
        }
        return output.Distinct(new ListOfArrayComparer<double>()).ToList();
    }

Or even just:

    public static List<List<double[]>> CleanListOfListsOfDoubleArray(List<List<double[]>> input)
    {
        var output = input.Select(t => t.Distinct(new ArrayComparer<double>()).ToList()).ToList();

        return output.Distinct(new ListOfArrayComparer<double>()).ToList();
    }

Keep in mind that this would be a lot less complicated if you used more specific types for describing your problem.

If, for example, instead of double[], you used a more specific pair type (like Tuple<double, double>), you would only need to implement one comparer (the first Distinct call could be left with its default behavior, if I remember correctly).

If, instead of the List<double> you had a specialized PairCollection that implements its own equality method, you wouldn't need the second equality comparer either (your original code would work as it already is, most probably).

So, to avoid problems like this in the future, try to declare specialized types for your problem (instead of relying on the generic lists and arrays and nesting them like here).

edited May 23 '17 at 12:29

Community

1
1

answered Jun 24 '15 at 14:17

Theodoros Chatzigiannakis

28,773
8
68
104

1

how to use it in `Distinct` ? – M.kazem Akhgary Jun 24 '15 at 14:30
1

@M.kazemAkhgary You create an instance of this class and pass it to `Distinct`. – Theodoros Chatzigiannakis Jun 24 '15 at 14:32
You should fix your hash code method. As you say, it's bad, and needs to be fixed. – Servy Jun 24 '15 at 14:33
@Servy The comment wasn't mine (as I mentioned, it's from another user's answer) but I've edited to make it XOR the hash codes instead of accumulating them. – Theodoros Chatzigiannakis Jun 24 '15 at 14:35
1

@TheodorosChatzigiannakis That's no better in any way. It has all of the same problems, most notably, sequences with the same items in a different order will all have an identical hash. – Servy Jun 24 '15 at 14:35
@Servy Obviously I'm not used to implementing `GetHashCode()`. Can you check if this is better now? If not, feel free to add the necessary fixes. – Theodoros Chatzigiannakis Jun 24 '15 at 14:39
Now it doesn't even compile. You haven't declared `i`. if you intend it to be the index however, it will in now way address that problem. The whole point of XOR and addition is that they're both reflective, so the order that you apply all of them is irrelevant. – Servy Jun 24 '15 at 14:42
@Servy Yes, I did mean to use the other `Select`, thanks for noticing! I see your point about reflectiveness of xor and addition. I've edited with a snippet that compiles and produces different hashes for differently ordered sequences. Is there anything else I'm not seeing? – Theodoros Chatzigiannakis Jun 24 '15 at 14:53
You're now using less and less of each hash code that you get, until you get to the 32nd item at which point you're literally using *none* of the bits of the hash. But even before then, say, the 16th item in the list, you're only using 16 bits of its hash, so half of the information is lost. Since you're using so much less of the hash, you're *dramatically* increasing the odds of collisions of items that don't even have the same hash. If anything this is quite a bit worse than what you were doing before. – Servy Jun 24 '15 at 14:54
@Servy Can you propose a possible implementation? – Theodoros Chatzigiannakis Jun 24 '15 at 14:55
You can simply [ask google](https://www.google.com/webhp?hl=en#hl=en&q=implementing+get+hash+code) if you don't know how to implement `GetHashCode`. It's very much a solved problem. – Servy Jun 24 '15 at 14:57
Hi Theodoros and Servy, many thanks for your answers. Sorry if it sounds like a basic question but... how would you use this in my input list? I am not sure how I should call this class to get the desired output... – user3641829 Jun 24 '15 at 15:09
@Servy Well, in that case (and since I don't often work with hash codes and can't pretend to be an expert on that issue), I'll leave it to the reader to google an implementation that's good enough for their purposes. I believe even an implementation with extremely high collisions will work (since the hash code isn't supposed to act as a unique key, anyway). You might want to point these issues out in the original answer that contained this hash function, however (as it could mislead other people who are looking for an implementation for collections as well). – Theodoros Chatzigiannakis Jun 24 '15 at 15:14
@user3641829 I've included the transformations that can be applied to your code, see the edit. – Theodoros Chatzigiannakis Jun 24 '15 at 15:48
@Theodoros Many thanks for this. I have tried and it eliminates the duplicates doubles[] within the List successfully. However it does not eliminate the duplicated List within List>. Any idea why this is happening? – user3641829 Jun 24 '15 at 16:01
@user3641829 Try it now. – Theodoros Chatzigiannakis Jun 24 '15 at 16:10
@Theodoros, It works perfectly! Many thanks for all your help!! – user3641829 Jun 24 '15 at 16:24

Remove everything which is duplicate in a List>

1 Answers1