Removing duplicate lines from a list based on specific columns

Question

I believe this is similar to another question, but I was not able to apply the same solution.

I have a list with several columns:

    public struct InfoForGraph
    {
        public float a { get; set; }
        public double b { get; set; }
        public double c { get; set; }
        public double d { get; set; }
        public double e { get; set; }
        public double f { get; set; }
        public double g { get; set; }
        public double h { get; set; }
        public double i { get; set; }
        public double j { get; set; }
    }

I would like to remove duplicate lines from this list, but only if specific fields match. If I call Distinct on the whole list, these lines are not removed, because the other fields differ. Also, I don't care which of the repeated lines survives; I just want to keep one of them.

Input:

2.67|1.84|420|400|1608039|808|3117|1|2|3|4
2.68|1.84|420|401|1608039|808|3269|1|2|3|4

Output expected:

2.67|1.84|420|400|1608039|808|3117|1|2|3|4

So, if columns 1, 2, 5, 6, 8, 9 and 10 have the same value, I should keep only the first row (deleting the 2nd, 3rd, and so on, where all these fields match).

Any ideas?

Sach · Accepted Answer · 2017-08-09T20:47:56.330

For simplicity, I narrowed down your condition to say that two objects are equal if InfoForGraph.b and InfoForGraph.c are equal. You get the idea and can change the comparer to include whichever fields you need.

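    using System.Collections.Generic;

    // Treats two InfoForGraph values as equal when their b and c fields match.
    public class InfoComparer : IEqualityComparer<InfoForGraph>
    {
        public bool Equals(InfoForGraph x, InfoForGraph y)
        {
            return x.b == y.b && x.c == y.c;
        }

        // Combines exactly the fields compared in Equals, so that
        // values considered equal always produce the same hash code.
        public int GetHashCode(InfoForGraph obj)
        {
            unchecked // let the arithmetic wrap on overflow instead of throwing
            {
                int hash = 17;
                hash = hash * 23 + obj.b.GetHashCode();
                hash = hash * 23 + obj.c.GetHashCode();
                return hash;
            }
        }
    }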

Then pass an instance of the comparer to Distinct():

    // Requires: using System.Linq;
    var unique = list.Distinct(new InfoComparer());
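To see the comparer working end to end, here is a minimal sketch. It assumes a cut-down InfoForGraph with only four fields and uses made-up sample rows (not the question's data); `DedupDemo`, `SampleRows`, `WithComparer` and `WithGroupBy` are hypothetical names introduced for illustration. It also shows an alternative that is not part of the answer above: grouping on the key columns with GroupBy and taking the first row of each group, which may be preferable if you want the "keep the first occurrence" behaviour spelled out explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cut-down version of the question's struct (assumption: four fields are enough here).
public struct InfoForGraph
{
    public float a { get; set; }
    public double b { get; set; }
    public double c { get; set; }
    public double d { get; set; }
}

// Same rule as the answer's comparer: rows are equal when b and c match.
public class InfoComparer : IEqualityComparer<InfoForGraph>
{
    public bool Equals(InfoForGraph x, InfoForGraph y)
    {
        return x.b == y.b && x.c == y.c;
    }

    public int GetHashCode(InfoForGraph obj)
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + obj.b.GetHashCode();
            hash = hash * 23 + obj.c.GetHashCode();
            return hash;
        }
    }
}

public static class DedupDemo
{
    // Made-up rows: the first two share b and c, the third does not.
    public static List<InfoForGraph> SampleRows()
    {
        return new List<InfoForGraph>
        {
            new InfoForGraph { a = 2.67f, b = 1.84, c = 420, d = 400 },
            new InfoForGraph { a = 2.68f, b = 1.84, c = 420, d = 401 },
            new InfoForGraph { a = 2.69f, b = 1.90, c = 420, d = 402 },
        };
    }

    // The answer's approach: Distinct with a custom equality comparer.
    public static List<InfoForGraph> WithComparer(IEnumerable<InfoForGraph> rows)
    {
        return rows.Distinct(new InfoComparer()).ToList();
    }

    // Alternative: group on the key columns and keep the first row of each group.
    // Anonymous types compare by value, so no custom comparer is needed.
    public static List<InfoForGraph> WithGroupBy(IEnumerable<InfoForGraph> rows)
    {
        return rows.GroupBy(r => new { r.b, r.c })
                   .Select(g => g.First())
                   .ToList();
    }

    public static void Main()
    {
        foreach (var row in WithGroupBy(SampleRows()))
            Console.WriteLine($"{row.a}|{row.b}|{row.c}|{row.d}");
    }
}
```

With these sample rows both methods return two rows. To match the question's real condition, list every key column in the comparer (in both Equals and GetHashCode) or in the anonymous type passed to GroupBy.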

edited Aug 09 '17 at 20:47

answered Aug 09 '17 at 19:23

I should mention that to be complete you should check for null objects and other such anomalies/errors in `GetHashCode()` as well as `Equals()`. – Sach Aug 09 '17 at 19:26
Wouldn't two objects which are supposed to be equal get different hashcodes? For example: {b=1, c=2} and {b=1, c=3} – Maor Veitsman Aug 09 '17 at 20:28
@MaorVeitsman I simplified OP's conditions, so in my examples two objects are equal if `Obj1.b == Obj2.b && Obj1.c == Obj2.c`. So under these conditions your example objects are not deemed equal. – Sach Aug 09 '17 at 20:44
What formula would you use if the 'or' statement is used? – Maor Veitsman Aug 09 '17 at 20:56
@MaorVeitsman that's a fantastic question. I might have to think about it. **Edit:** Do you know an answer? – Sach Aug 09 '17 at 21:02
@MaorVeitsman I asked this in a [separate SO question](https://stackoverflow.com/a/45601130/302248), and it turns out you cannot really use OR in an equality comparer, as it violates a fundamental assumption of the comparer: that equality is transitive. – Sach Aug 09 '17 at 22:08
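The transitivity problem from the last comment can be seen with three values. This is only an illustrative sketch: `OrEqualityDemo` and `OrEquals` are hypothetical names, and the rule "equal if b matches OR c matches" is the one asked about in the comments, not something the answer uses.

```csharp
using System;

public static class OrEqualityDemo
{
    // Hypothetical "OR" rule: two rows count as equal if b matches or c matches.
    public static bool OrEquals(double b1, double c1, double b2, double c2)
    {
        return b1 == b2 || c1 == c2;
    }

    public static void Main()
    {
        // X = (b=1, c=2), Y = (b=1, c=3), Z = (b=9, c=3)
        bool xy = OrEquals(1, 2, 1, 3); // true: b matches
        bool yz = OrEquals(1, 3, 9, 3); // true: c matches
        bool xz = OrEquals(1, 2, 9, 3); // false: neither field matches

        // Prints "X~Y: True, Y~Z: True, X~Z: False"
        Console.WriteLine($"X~Y: {xy}, Y~Z: {yz}, X~Z: {xz}");
    }
}
```

X equals Y and Y equals Z, yet X does not equal Z, so the rule is not a valid equality. A hash code would also have to agree for X and Y and for Y and Z, which forces X and Z to share a hash even though they are not considered equal.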
