I have a class

class Foo{
   public string valueOne;
   public string valueTwo;
}

I have multiple Foo objects in a List. These objects are already distinct. However, I also want to remove an object if its valueTwo is null and there is another object with the same valueOne. It is guaranteed that there is at most one other "duplicate" with the same value in valueOne. There can be one or more objects with the same valueOne but a different value in valueTwo.

Eg:

var foo1 = new Foo{
   valueOne = "Equal Value",
   valueTwo = "Some value"
};

var foo2 = new Foo{
   valueOne = "Equal Value",
   valueTwo = "Different value"
};

var foo3 = new Foo{
   valueOne = "Equal Value",
   valueTwo = null
};

var foo4 = new Foo{
   valueOne = "SomeOtherValue",
   valueTwo = "whatever"
};

var foo5 = new Foo{
   valueOne = "I have null value, but should still be in list",
   valueTwo = null
};

And after removing the duplicates with null values, I should have:

foo1:
valueOne = "Equal Value"
valueTwo = "Some value"

foo2:
valueOne = "Equal Value"
valueTwo = "Different value"

foo4:
valueOne = "SomeOtherValue"
valueTwo = "whatever"

foo5:
valueOne = "I have null value, but should still be in list"
valueTwo = null

So basically: a Foo object with valueTwo == null can stay in the list only if there isn't another object with the same valueOne.

I made it work by iterating over the list with a nested loop, and storing the indexes of the duplicates, and then looping over the indexes and using RemoveAt(index) on the list. However, this is really long and ugly code. Is there a way to accomplish this with Linq? I'm new to C# and Linq, hope this works though!

Thanks for any help!

EDIT: Just to make it clearer, I already call GroupBy on the list, making it distinct.

EDIT 2: Damnit, I wasn't clear in my explanation. There can be one or more objects with the same valueOne, but different value in valueTwo. It is NOT guaranteed that there will always only be maximum one other "duplicate" with the same value in valueOne.

It is guaranteed, though, that for each distinct valueOne there is at most one Foo object with that valueOne and null as valueTwo. If that makes sense..

Ryan Sangha
  • Try the following: `List<Foo> foos = new List<Foo>() { foo1, foo2, foo3, foo4 }; List<Foo> results = foos.GroupBy(x => x.valueOne).Select(x => x.OrderByDescending(y => y.valueTwo).First()).ToList();` – jdweng Nov 30 '20 at 15:17
  • What if `valueOne`-s are equal, but `valueTwo`-s are both not `null`. Should these both items be in the result collection? – E. Shcherbo Nov 30 '20 at 15:18
  • https://stackoverflow.com/a/1606686/2004122 – Steven Ackley Nov 30 '20 at 15:21
  • Probably the cleanest option ^^ – Steven Ackley Nov 30 '20 at 15:22
  • @E.Shcherbo yeah, they should both stay in the list – Ryan Sangha Nov 30 '20 at 15:25
  • I personally like the solution proposed by @StevenAckley. It's clean and easy enough, because when you implement `IEqualityComparer` you can focus only on two items in the collection, namely items which are being compared and the framework does all other work for you. – E. Shcherbo Nov 30 '20 at 15:30
  • Thanks @jdweng, it worked! But yeah, I should try Stevens suggestion, as it is cleaner. Thanks lads! Much appreciated – Ryan Sangha Nov 30 '20 at 15:33
  • @StevenAckley how would you make sure the Foo with ValueTwo == null was the one dropped by the Distinct when using a custom comparer? – MikeJ Nov 30 '20 at 15:39
  • @StevenAckley Couldn't get the custom comparer to work.. wondering the same as MikeJ – Ryan Sangha Nov 30 '20 at 15:48
  • Why create a custom class when the built-in LINQ methods do the job? Steven's code does a Distinct, which runs in (N^2)/2, while the GroupBy runs in Log (N), which is faster. – jdweng Nov 30 '20 at 16:07
  • @jdweng because it would be more clear. Unfortunately it can't be done, Distinct returns the 1st value from the sequence and weeds out any subsequent duplicates. A custom comparer therefore won't work. Your answer works due to a side effect of how null values are sorted which makes it not very clear. But it's the best answer. – MikeJ Nov 30 '20 at 16:17
  • @jdweng where are you getting your Big O numbers? Are distinct and groupby not the same? https://stackoverflow.com/questions/3226663/the-big-o-of-distinct-method-with-a-custom-iequalitycomparer Are you also accounting for the OrderByDescending of each group? – MikeJ Nov 30 '20 at 16:19
  • Not sure what distinct is using. If it is a hash then two are the same. – jdweng Nov 30 '20 at 16:30

2 Answers

1

It sounds like you may want your Foo class to implement IEquatable<T>, add in your custom comparison logic and then you can use .Distinct on it, as you like. You can find out more info about it here

Updated: Here is an implementation using a custom IEquatable. You should be able to tweak the logic to suit your needs if I am missing anything from your edge cases.

Dan Csharpster
  • I just don't understand how to implement logic to drop the one with null value when compared. Tried StevenAckley's suggestion which I guess is the same (using a custom comparer) but I can't make out the logic for the comparison.. – Ryan Sangha Nov 30 '20 at 16:08
  • Gotcha. I think I understand the logic now, but I wanted to verify one of the cases. If valueOne is the same and valueTwo is different, then are those considered different or the same? Basically, are you just comparing off of valueOne then or is the null valueTwo a special case? – Dan Csharpster Nov 30 '20 at 17:03
  • They are considered different. Foo("val1", "something") and Foo("val1", "somethingElse") are considered different, and should stay in the list. However, if there also is Foo("val1", null), then that object should be removed. If that makes sense hehe – Ryan Sangha Nov 30 '20 at 17:24
  • Okay, cool. I've updated my answer with a working example. – Dan Csharpster Dec 01 '20 at 00:21
  • Does this rely on always having a hash collision? It doesn't work for me if I give it an actual working GetHashCode implementation. Also it would be better if you put this logic in IEqualityComparer instead of IEquatable. You've tied specialized logic into the equality for the type. – MikeJ Dec 02 '20 at 18:31
  • Can you share your GetHashCode implementation? You could implement this with IEqualityComparer, although per MS docs "We recommend that you derive from the EqualityComparer<T> class instead of implementing the IEqualityComparer<T> interface, because the EqualityComparer<T> class tests for equality using the IEquatable<T>.Equals method instead of the Object.Equals method.". What is wrong with having custom logic for a custom type? This seems to meet the questioner's needs. – Dan Csharpster Dec 03 '20 at 13:16
1

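A custom comparer by itself won't work, because Distinct keeps the first element it sees among equal items rather than the one you would prefer. Combining GroupBy with Distinct does work. Start with a comparer that compares by value: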

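using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;
using System.Linq;

// Plain value comparer: two Foo objects are equal when both members match.
// Assumes Foo exposes public ValueOne and ValueTwo members.
class FooCompare : IEqualityComparer<Foo>
{
    public bool Equals([AllowNull] Foo x, [AllowNull] Foo y)
    {
        // Two nulls are equal; a null never equals a non-null object.
        if (null == x && null == y)
            return true;
        if (null == x || null == y)
            return false;

        return x.ValueOne == y.ValueOne && x.ValueTwo == y.ValueTwo;
    }

    public int GetHashCode([DisallowNull] Foo obj)
    {
        // Hash the same members that Equals compares.
        return HashCode.Combine(obj.ValueOne, obj.ValueTwo);
    }
}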

Here's a console main with the test data to show how it works.

    static void Main(string[] _)
    {
        var data = new Foo[] {
            new Foo { ValueOne = "Equal Value", ValueTwo = null },
            new Foo { ValueOne = "Equal Value", ValueTwo = "Different" },
            new Foo { ValueOne = "Equal Value", ValueTwo = "not same" },
            new Foo { ValueOne = "SomeOtherValue", ValueTwo = "Different" },
            new Foo { ValueOne = "SomeOtherValue", ValueTwo = null },
            new Foo { ValueOne = "I have null value, but should still be in list", 
                        ValueTwo = null },
            };

        var check = new Foo[] {
            new Foo { ValueOne = "Equal Value", ValueTwo = "Different" },
            new Foo { ValueOne = "Equal Value", ValueTwo = "not same" },
            new Foo { ValueOne = "SomeOtherValue", ValueTwo = "Different" },
            new Foo { ValueOne = "I have null value, but should still be in list",
                        ValueTwo = null },
            }.OrderBy(x => x.ValueOne);

        var compare = new FooCompare();
        var result2 = data.GroupBy(x => x.ValueOne)
                            .SelectMany(g =>
                            {
                                var d = g.Distinct(compare);

                                var nonNull = d.Where(x => x.ValueTwo != null);

                                if (nonNull.Any())
                                    return nonNull;

                                return d;
                            })
                            .OrderBy(x => x.ValueOne)
                            .ToArray();

        if (!Enumerable.SequenceEqual(check, result2, new FooCompare()))
            Console.WriteLine("Failed");
        else
            Console.WriteLine("Success");

        // This one will not work: First() keeps only one item per group,
        // so one of the two non-null "Equal Value" items is lost.
        var result = data.GroupBy(x => x.ValueOne)
                            .Select(g => g.OrderByDescending(y => y.ValueTwo).First())
                            .ToArray();

        if (!Enumerable.SequenceEqual(check, result, new FooCompare()))
            Console.WriteLine("Failed");
        else
            Console.WriteLine("Success");
    }

The second query is the original GroupBy suggestion from the comments, included to show that it does not work for this data.
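Since the question says the list is already distinct, the same filtering can probably be written without the comparer. The following is only a minimal sketch of that idea, not the code from this answer: `FooFilter` and `DropRedundantNulls` are made-up names, and `Foo` is assumed to expose public `ValueOne` and `ValueTwo` properties as in the test program above. It uses the five objects from the question as input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's class (assumption: public properties).
class Foo
{
    public string ValueOne { get; set; }
    public string ValueTwo { get; set; }
}

static class FooFilter
{
    // Hypothetical helper. Within each ValueOne group, keep only the items
    // with a non-null ValueTwo if any exist; otherwise keep the group as is.
    // Assumes the input is already distinct, as stated in the question.
    public static List<Foo> DropRedundantNulls(IEnumerable<Foo> items)
    {
        return items
            .GroupBy(f => f.ValueOne)
            .SelectMany(g => g.Any(f => f.ValueTwo != null)
                ? g.Where(f => f.ValueTwo != null)
                : g.AsEnumerable())
            .ToList();
    }
}

class Program
{
    static void Main()
    {
        var foos = new List<Foo>
        {
            new Foo { ValueOne = "Equal Value", ValueTwo = "Some value" },
            new Foo { ValueOne = "Equal Value", ValueTwo = "Different value" },
            new Foo { ValueOne = "Equal Value", ValueTwo = null },
            new Foo { ValueOne = "SomeOtherValue", ValueTwo = "whatever" },
            new Foo { ValueOne = "I have null value, but should still be in list",
                      ValueTwo = null },
        };

        // Keeps foo1, foo2, foo4 and foo5; only the ("Equal Value", null) item is dropped.
        foreach (var foo in FooFilter.DropRedundantNulls(foos))
            Console.WriteLine($"{foo.ValueOne} / {foo.ValueTwo ?? "null"}");
    }
}
```

GroupBy yields groups in order of first appearance and keeps the element order inside each group, so the surviving items stay in their original relative order. If the input might still contain exact duplicates, a Distinct with a value comparer such as FooCompare would be needed inside each group, as in the answer's code.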

MikeJ
  • Great, thanks for the write up! You certainly solved my problem, but I didn't write my explanation correctly. TryGetFirstNonNull and jdweng's suggestion works perfectly when there is only one extra object with same valueOne eg: Foo("val1", "something") and Foo("val1", null). However, I forgot to add that there can also in the same list be: Foo("val1", "somethingElse"). As in the list would look like this: Foo("val1", "something"), Foo("val1", "somethingElse") and Foo("val1", null). – Ryan Sangha Nov 30 '20 at 17:28
  • @ryansan I've updated the answer. I think this accomplishes what you're after. Seems like there should be a more simple way to do it, but if there is it didn't jump out at me. – MikeJ Nov 30 '20 at 18:06
  • Thanks, I marked Dan's answer, just because it was a little bit simpler than yours. But thanks anyways, really appreciate that you took time to help me:) – Ryan Sangha Dec 01 '20 at 21:41
  • @ryansan I'm not sure Dan's answer works. At any rate, you should not use IEquatable for this. You shouldn't tie specialized comparison logic to the type. Use IEqualityComparer instead. If his logic is relying on forcing hash collisions then you shouldn't use it as it won't be efficient unless you have small amounts of data. – MikeJ Dec 02 '20 at 18:53