0

I have a situation where I need a collection to be GroupBy on a HashSet<myClass> where myClass overrides Equals(myClass), Equals(object), GetHashCode(), ==, and !=.

When I perform the GroupBy() the results are however not grouped. The same occurs for Distinct(). It is created in a large LINQ query which calls ToHashSet() on values of myClass. The result is then used where the resulting HashSet itself is the key to a Dictionary<HashSet<myClass>, someOtherCollection>.

I have distilled the problem down to the simplest case, where two HashSet<myClass>, myHashSet1 and myHashSet2, both contain only the same single element. If I call myHashSet1.Equals(myHashSet2) it returns false, while myHashSet1.SetEquals(myHashSet2) returns true.

What am doing wrong here? What can I do to make GroupBy group HashSets when all elements match?

Possibly one step along the way is HashSet<T>.CreateSetComparer() cannot specify IEqualityComparer<T>, is there an alternative? which explains how to override a default IEqualityComparer for HashSet. But IF this is part of the answer, the critical remaining questions becomes how do I let GroupBy know to use this equality comparer?

I assume I should be feeding it when I call ToHashSet() , maybe ToHashSet(myHashSetEqualityComparer<myClass>), but it only takes a ToHashSet(IEqualityComparer<myClass>), not a ToHashSet(IEqualityComparer<HashSet<myClass>>)

Here's the code of myClass distilled to the essentials:

public class myClass : myBaseClass, IEquatable<myClass>
{   
    public string Prop1 { get; set; }
    public string Prop2 { get; set; }
    public Guid Prop3 { get; set; }

    public override bool Equals(myClass other)
    {    
      if (Equals(other, null)) return false;

      return (Prop1 == other.Prop1 && Prop2 == other.Prop2 && Prop3 == other.Prop3);
    }

    public override bool Equals(object obj)
    {
      if (Equals(obj, null) || !(obj is myClass))
        return false;

     return Equals((myClass)obj);
    }

    public static bool operator ==(myClass left, myClass right)
    {
     if (Object.Equals(left, null))
        return (Object.Equals(right, null)) ? true : false;
     else
        return left.Equals(right);
    }

    public static bool operator !=(myClass left, myClass right)
    {
     return !(left == right);
    }

    public override int GetHashCode()
    {
     return Prop3.GetHashCode() + 31 * (Prop2.GetHashCode() +
         31 * Prop1.GetHashCode());
    }
 }

Per request in comment this is what I am doing:

var myGroupedResult = myUngroupedCollection.
       GroupBy(x => x.Value).
       ToDictionary(x => x.Key, x => x.ToList());
// myUngroupedCollection is an IEnumerable<KeyValuePair<someClass, HashSet<myClass>>>,
// produced by LINQ 
// myGroupedResult is a Dictionary<HashSet<myClass>, List<someClass>>

I expect the result to produce a dictionary where the keys are HashSet<myClass> and the values are List<someClass>. If I have 5 distinct hashsets each with 10 occurrences of someClass, I expect a Dictionary with 5 keys, each with a value that is a List with 10 elements. Instead I get a Dictionary with 50 keys each with a value being a List that has 1 element.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
Stefan de Kok
  • 2,018
  • 2
  • 15
  • 19
  • I believe, that `myHashSet1.Equals(myHashSet2)` performs reference equality, unless you override it. And `false` is expected value here. And usage of `SetEquals` from linked duplicate also seems to be correct – Pavel Anikhouski Jun 19 '20 at 22:35
  • @PavelAnikhouski, probably true. Which means I have at least 2 possible approaches to finding an answer: 1 - override HashSet.Equals to match or return HashSet.SetEquals so GroupBy will use that, or 2 - pass an IEqualityComparer> to GroupBy directly. My question is how do I do either of those? – Stefan de Kok Jun 19 '20 at 22:57
  • a third option may be to prevent ToHashSet() in the LINQ query from creating multiple identical HashSet with different reference, since they are in fact the same instance. Again, the question: how? – Stefan de Kok Jun 19 '20 at 23:04
  • Implementing an `IEqualityComparer` is generally preferable to implementing the `IEquatable` interface, because it is more flexible. For example you can have multiple `IEqualityComparer`s with different semantics for each class, in case it is meaningful to do so (equality by name, equality by ID etc). On the contrary the `IEquatable` is able to represent only one kind of equality. – Theodor Zoulias Jun 20 '20 at 10:21
  • The link provided when this question was closed showed what I was missing. Providing the IEqualityComparer to GroupBy directly fixed my issue. Thanks @Alexei Levenkov – Stefan de Kok Jun 20 '20 at 16:11
  • @TheodorZoulias, in my case all classes have only a single equality semantic, so I never needed an IEqualityComparer. Today was a first, and not because I needed a different way to determine equality, but to override the default behavior which ignored the defined equality mechanism, which I think is nonsensical. If an item implements IEquatable, it should be used by default in all operations that need to verify equality, such as GroupBy, Distinct, etc. – Stefan de Kok Jun 20 '20 at 16:18
  • 1
    Stefan I looked deeper to your question, and it is not clear to me what you are asking. It seems that you have some problem related with `HashSet`s and `GroupBy`/`Distinct` Linq methods, but there is no `HashSet` or `GroupBy` or `Distinct`in the code you provided. Could you update your question with a minimal example showing the difference between expected and actual behavior? – Theodor Zoulias Jun 21 '20 at 01:51
  • @StefandeKok Good to hear that you got the problem solved! Could you write an answer to your own question, as it could be useful to other users? – jpa Jun 21 '20 at 07:07
  • @jpa. Just did. But for some reason it was downvoted right away. – Stefan de Kok Jun 22 '20 at 15:28
  • @Servy, you closed this question twice now with a link to a related question that does NOT solve this problem. In this question it was explicitly stated that it does not. Unfortunately the link @AlexeiLevenkov provided, which DID lead to a solution is now no longer visible. Please reopen. Note the devil in the detail: your link refers to an `IEqualityComparer` where the solution is actually in an `IEqualityComparer>` – Stefan de Kok Jun 22 '20 at 15:57
  • @StefandeKok You literally copy pasted the answer from the duplicate as an answer. Clearly it *does* answer the question. That you said in the question that it doesn't when *you posted an answer to just copy it* indicates that that wasn't the case. The duplicate *does* provide an `IEqualityComparer>`, not n `IEqualityComparer`.. – Servy Jun 22 '20 at 16:02
  • @Servy, I see what you are saying. I did take the IEqualityComparer from that post for completeness, but my problem was how to apply it. I did not even know if it was the right path or if the ultimate solution would even use that. So, yes one half of the answer was from that link and I included it so anyone else with the same problem would have both keys to the puzzle.The critical missing piece was the part that follows in my answer. – Stefan de Kok Jun 22 '20 at 16:11
  • 1
    @StefandeKok There's lots of info on how to provide a custom comparer to GroupBy, I've added another duplicate if you feel that's needed, even though the documentation of GroupBy covers how to do that. Given your question was discussing adding custom comparers to try to use GroupBy, it didn't seem like you were struggling to pass a custom comparer to GroupBy. – Servy Jun 22 '20 at 16:41
  • @Servy, I was struggling figuring out what I didn't know yet. Even though you provided a partial solution I did not yet know if that was another of many dead ends I had encountered or if it could be along the way. I still see multiple potential alternatives, I just do not know if any of them have a light at the end of the tunnel. My preferred alternative would be that the `ToHashSet` that is the precursor to all of this does not create multiple instances if all the elements are the same in the first place. Then a `ReferenceEquals` on the HashSet would just work. – Stefan de Kok Jun 22 '20 at 17:45
  • @StefandeKok Well the morale of the story here is that when you're given the solution to your problem, rather than stating it doesn't solve your problem without even trying it, you should *actually try the solution you were given* to determine if it actually works. If it doesn't work, you then have the benefit of being able to explain what problems it has, rather than just repeating the question over and over and yelling at people who give you the correct answer. – Servy Jun 22 '20 at 17:49
  • @Servy, I did actually try fanatically and could not get it to work. And I never yelled. If I may give a friendly piece of unsolicited advise: please be less trigger happy with closing questions. It shuts off an avenue to finding a solution if the one you perceive to be the solution happens to not be it, or in this case not the whole story. Instead provide the link in a comment or answer first to allow the OP to determine if it indeed worked and ask follow-up questions if needed. I still strongly feel this question is not a duplicate. – Stefan de Kok Jun 22 '20 at 18:39
  • @StefandeKok When you write out in bold and all caps, that's consider yelling when communicating via text. And you weren't giving "friendly advice" you were just saying it's not a duplicate even though *it was clearly a duplicate* as evidenced by you just copy-pasting the answer and saying it's the answer. Clearly closing the question *didn't* prevent you from finding the solution, given that the duplicate *was the answer*. That you like copy-pasting duplicate answers over and over instead of closing as duplicates doesn't make that the appropriate process, because it's not. – Servy Jun 22 '20 at 18:42
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/216451/discussion-between-stefan-de-kok-and-servy). – Stefan de Kok Jun 22 '20 at 18:43

1 Answers1

1

I was able to solve my issue. Posting an answer here in case anyone else runs into the same issue.

The solution has two steps. First create a generic IEqualityComparer<HashSet<T>> (from the link in the question):

   public class HashSetEqualityComparerBySetEquals<T> : IEqualityComparer<HashSet<T>>
   {
      public bool Equals(HashSet<T> x, HashSet<T> y)
      {
         if (ReferenceEquals(x, null))
            return false;

         return x.SetEquals(y);
      }

      public int GetHashCode(HashSet<T> set)
      {
         int hashCode = 0;

         if (set != null)
         {
            foreach (T t in set)
            {
               hashCode = hashCode ^
                   (set.Comparer.GetHashCode(t) & 0x7FFFFFFF);
            }
         }

         return hashCode;
      }    
   } 

Then provide it in the GroupBy() (hint came from here: GroupBy on complex object (e.g. List<T>), which works on List, but not as-is on HashSet, which needs an additional elementSelector as second parameter):

HashSetEqualityComparerBySetEquals<myClass> comparer = new HashSetEqualityComparerBySetEquals<myClass>();

var myGroupedResult = myUngroupedCollection.
       GroupBy(x => x.Value, x => x.Key, comparer).
       ToDictionary(x => x.Key, x => x.ToList());

// myUngroupedCollection is an IEnumerable<KeyValuePair<someClass, HashSet<myClass>>> produced by LINQ but could be a Dictionary or another collection.
// myGroupedResult is a Dictionary<HashSet<myClass>, List<someClass>>

The same IEqualityComparer can also be used when performing other LINQ operations that check for equality, such as Distinct() and FirstOrDefault():

var thisWorksAsExpected = myGroupedResult.FirstOrDefault(x => comparer.Equals(x.Key, aHashSetWithSameElements));
var thisAlsoWorks = myGroupedResult.FirstOrDefault(x => x.Key.SetEquals(aHashSetWithSameElements));

var thisDoesNotWork = myGroupedResult.FirstOrDefault(x => x.Key == aHashSetWithSameElements);
// thisDoesNotWork returns null sometimes even when all elements match
Community
  • 1
  • 1
Stefan de Kok
  • 2,018
  • 2
  • 15
  • 19