1

I want to remove duplicate data from a list of a custom structure using LINQ.

Here is my custom structure:

[image: the custom structure, with the duplicate rows highlighted in yellow]

As you can see, the rows highlighted in yellow are considered duplicates and need to be removed.

My first idea was to use IEqualityComparer, but it doesn't seem to work well.

If From of A equals To of B and To of A equals From of B, the two rows are considered duplicates.

The row with the smaller From is kept; the row with the larger From is removed.

For example, the row at index 5 is kept, but the row at index 6 is removed.

Does anyone know how to solve this with LINQ?

jeffrey chan
  • Why doesn't `IEqualityComparer` seem to work well? It's the proper method for this problem. – Gert Arnold Dec 22 '18 at 23:36
  • If I use IEqualityComparer, I need to add an extra property to the custom structure, and I also need to add a class. – jeffrey chan Dec 23 '18 at 04:16
  • You don't need an "exact" property. You do need a class, but isn't that the idea of using an equality comparer (not sure if I understand). See my answer. – Gert Arnold Dec 23 '18 at 10:16

3 Answers

1

You can filter your data with LINQ to find all the dupes and then remove them.

My solution below may not be the smartest approach, but try it out.

This should match your custom data (it would have helped if you had specified it in your question):

public class CustomStructure
{
    public int From { get; set; }
    public int To { get; set; }
    public int Sum { get { return From + To; } }
}

Somewhere else, where you work with the data:

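List<CustomStructure> customlist = GetCustomData();

// Materialize the dupes first: removing items from customlist while lazily
// enumerating a query over it would throw an InvalidOperationException.
List<CustomStructure> dupes = customlist
    .Where(x => customlist.Any(y => x.From == y.To && x.To == y.From && x.From > y.From))
    .ToList();

foreach (CustomStructure dupe in dupes)
{
    customlist.Remove(dupe);
}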

I don't have my Visual Studio here, so this all was written without checks; hope it works.

Nicolas
  • the condition `x.From > y.From` should include only the dupe record that has a higher `From` value - just add some debugging printout to identify which records were returned as dupes and then you can check the validation logic manually against them – Nicolas Dec 22 '18 at 19:35
  • Yeah, I know what you mean, and it seems to work, but it's not a fluent way to solve it. Thanks anyway, I got your idea. – jeffrey chan Dec 22 '18 at 19:39
1

I'm going to use Range instead of "custom structure":

class Range
{
    public Range(int from, int to)
    {
        From = from;
        To = to;
    }

    public int From { get; }
    public int To { get; }
}

> using IEqualityComparer, but it seems can't work well.

Maybe because "equality" can't trivially be defined by equating one (or both) Range properties? But you (almost) perfectly define equality...

x.From == y.To && x.To == y.From

I think this should be extended with the following alternative (combined with ||):

x.From == y.From && x.To == y.To

It seems reasonable that two ranges having equal To and From are equal.

This would be enough to implement an IEqualityComparer's Equals method.
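Combining the two conditions above, a minimal sketch of that equality test could look like this. `RangeEquality` and `AreEquivalent` are names made up for illustration, and `Range` is repeated only so the snippet compiles on its own:

```csharp
// Range as defined above, repeated so the sketch is self-contained.
class Range
{
    public Range(int from, int to) { From = from; To = to; }
    public int From { get; }
    public int To { get; }
}

static class RangeEquality
{
    // Hypothetical helper: true when the two ranges are identical or mirrored.
    public static bool AreEquivalent(Range x, Range y)
    {
        if (x == null || y == null) return false;

        bool same = x.From == y.From && x.To == y.To;
        bool mirrored = x.From == y.To && x.To == y.From;
        return same || mirrored;
    }
}
```

With this, `AreEquivalent(new Range(1, 2), new Range(2, 1))` is true, while two unrelated ranges compare as different.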

However, the challenge of implementing GetHashCode is always that it must be consistent with the Equals method (objects that Equals considers equal must produce identical hashes), while being computed from the properties of a single object instance.

The first impulse is to base the hash on From + To. But that would make range(8,5) equal to range(7,6). This can be solved by also bringing From - To into the equation. Two ranges are equal when From + To is equal and when the absolute difference From - To is equal:

x.From + x.To == y.From + y.To
    && Math.Abs(x.From - x.To) == Math.Abs(y.From - y.To);
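To see why both terms are needed, here is a small sketch that computes the two quantities for a single range. `SumAndDifference` and `Key` are hypothetical names used only for this illustration:

```csharp
using System;

static class SumAndDifference
{
    // Hypothetical helper: the two per-instance values the comparison is built on.
    public static (int Sum, int AbsDiff) Key(int from, int to)
        => (from + to, Math.Abs(from - to));
}
```

`Key(8, 5)` and `Key(5, 8)` both give (13, 3), so the mirrored ranges match. `Key(7, 6)` gives (13, 1): the sum is the same, but the absolute difference tells it apart.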

Each side of these equations depends only on the properties of a single instance, so now we can implement GetHashCode. Following best practices (and helped by ReSharper):

public int GetHashCode(Range obj)
{
    var hashCode = -1781160927;
    hashCode = hashCode * -1521134295 + (obj.From + obj.To).GetHashCode();
    hashCode = hashCode * -1521134295 + (Math.Abs(obj.From - obj.To)).GetHashCode();
    return hashCode;
}

And the complete comparer:

class RangeEqualityComparer : IEqualityComparer<Range>
{
    public bool Equals(Range x, Range y)
    {
        return y != null
               && x != null
               && x.From + x.To == y.From + y.To
               && Math.Abs(x.From - x.To) == Math.Abs(y.From - y.To);
    }

    public int GetHashCode(Range obj)
    {
        var hashCode = -1781160927;
        hashCode = hashCode * -1521134295 + (obj.From + obj.To).GetHashCode();
        hashCode = hashCode * -1521134295 + (Math.Abs(obj.From - obj.To)).GetHashCode();
        return hashCode;
    }
}

Now you get distinct ranges by...

ranges.OrderBy(r => r.From).Distinct(new RangeEqualityComparer())

The ordering defines which range of "equal" ranges will appear in the end result.
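For a quick check, here is a self-contained sketch that runs the comparer over some made-up rows (the question's actual data isn't available, so `SampleRanges` is purely an assumption, as are the `DistinctDemo` and `DistinctRanges` names). The `unchecked` block is an addition so the hash arithmetic also works in projects compiled with overflow checking:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Range
{
    public Range(int from, int to) { From = from; To = to; }
    public int From { get; }
    public int To { get; }
}

class RangeEqualityComparer : IEqualityComparer<Range>
{
    public bool Equals(Range x, Range y)
    {
        return y != null
               && x != null
               && x.From + x.To == y.From + y.To
               && Math.Abs(x.From - x.To) == Math.Abs(y.From - y.To);
    }

    public int GetHashCode(Range obj)
    {
        unchecked // the multiplications are expected to wrap around
        {
            var hashCode = -1781160927;
            hashCode = hashCode * -1521134295 + (obj.From + obj.To).GetHashCode();
            hashCode = hashCode * -1521134295 + Math.Abs(obj.From - obj.To).GetHashCode();
            return hashCode;
        }
    }
}

static class DistinctDemo
{
    // Made-up sample rows, not the data from the question.
    public static List<Range> SampleRanges() => new List<Range>
    {
        new Range(1, 2),
        new Range(5, 3),
        new Range(3, 5), // mirror of (5, 3)
        new Range(2, 1), // mirror of (1, 2)
        new Range(4, 4),
    };

    // Sort by From first so the smaller From is the one Distinct sees first.
    public static List<Range> DistinctRanges(IEnumerable<Range> ranges) =>
        ranges.OrderBy(r => r.From)
              .Distinct(new RangeEqualityComparer())
              .ToList();
}
```

`DistinctDemo.DistinctRanges(DistinctDemo.SampleRanges())` leaves (1, 2), (3, 5) and (4, 4). This relies on Distinct yielding the first occurrence of each set of equal items, which is how LINQ to Objects behaves in practice, although the documentation describes the result as unordered.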

Gert Arnold
  • select from, to, from < to ? (from + | + to) : ( to + | + from) as key distinct by key I think it's a better, don't you think? – jeffrey chan Dec 23 '18 at 14:16
  • Sorry, I have no clue what you're saying here. – Gert Arnold Dec 23 '18 at 15:56
  • If you're asking if the solution with a `Where` clause is better, I'd say no, because it has to evaluate the `Where` condition for each element in the list (`list.Where(x => list.Any(...`) which means O(N²) vs close to O(N) for the comparer. – Gert Arnold Dec 23 '18 at 21:53
  • Sorry for the late reply, I had to work overtime. What I am saying is: add a property that identifies the instance by combining From and To; then you can group by it and finally take the distinct values. – jeffrey chan Dec 24 '18 at 15:01
  • There is no single property that sufficiently identifies equality or inequality because there are more combinations of From + To that yield the same sum. Not even if you'd use the hashcode calculation as a GetHashCode method for the Range class. These hashes can collide (= happen to be equal but shouldn't be). That's why you need a comparer. A comparer always executes `Equals` when hashes are equal. You also mention grouping. For grouping you can use the same comparer. No need to write complex LINQ statements all the time. – Gert Arnold Dec 25 '18 at 09:51
0

Thanks to @Nicolas for the idea. I think this is the best solution I can get.

listRow.Where(x => listRow.Any(y => x.From < y.From && x.From == y.To && x.To == y.From)).ToList();
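
One thing to watch: as written, this query returns only the rows that have a mirrored counterpart with a larger From, so rows without any mirror are not in the result. If the goal is the whole list minus the larger-From duplicates, a possible variant negates the condition from Nicolas's answer instead. This is only a sketch; `MirrorFilter` and `WithoutMirroredDuplicates` are hypothetical names, and `CustomStructure` is repeated so the snippet compiles on its own:

```csharp
using System.Collections.Generic;
using System.Linq;

class CustomStructure
{
    public int From { get; set; }
    public int To { get; set; }
}

static class MirrorFilter
{
    // Hypothetical helper: keeps every row except those that have a
    // mirrored counterpart with a smaller From.
    public static List<CustomStructure> WithoutMirroredDuplicates(List<CustomStructure> listRow) =>
        listRow.Where(x => !listRow.Any(y => x.From > y.From && x.From == y.To && x.To == y.From))
               .ToList();
}
```

For rows (1, 2), (2, 1), (4, 4) and (3, 7), this keeps (1, 2), (4, 4) and (3, 7) and drops (2, 1).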
jeffrey chan