2

A Python developer doing some C# (.NET 4.6, Visual Studio 2015 Professional) work here. I am trying to check whether two HashSets are equal.

I have two HashSet<List<float>> which I am trying to compare using

thisList.SetEquals(otherList);

However, this returns false on my data. Using the sample from the MSDN HashSet's examples does work as expected. However, in the samples they use HashSet<int> whereas I use HashSet<List<float>>.

As I could not find a way to print the HashSet contents in the Immediate Window in Visual Studio (ToString returns "System.Collections.Generic.HashSet`1[System.Collections.Generic.List`1[System.Single]]"), I use Json.NET's JsonConvert.SerializeObject(thisList); to dump the data into a .json file on disk.

The two files (one for each HashSet's contents) are:

[[10.0,15.0],[20.0,25.0]] and [[10.0,15.0],[20.0,25.0]]
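
For quick inspection without a JSON round trip, nested `string.Join` calls can render the set as text. This is only a minimal sketch: the `SetDump` class and its `Format` method are hypothetical names invented for illustration, and the invariant culture is an assumption made so the decimal separator does not vary by machine.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class SetDump
{
    // Renders each inner list as "[a, b]" and the whole set as "[[...], [...]]".
    public static string Format(IEnumerable<List<float>> lists)
    {
        var rendered = lists.Select(list =>
            "[" + string.Join(", ", list.Select(v => v.ToString(CultureInfo.InvariantCulture))) + "]");
        return "[" + string.Join(", ", rendered) + "]";
    }

    public static void Main()
    {
        var thisList = new HashSet<List<float>>
        {
            new List<float> { 10f, 15f },
            new List<float> { 20f, 25f }
        };
        // HashSet does not guarantee an enumeration order, so the two
        // inner lists may appear in either order.
        Console.WriteLine(Format(thisList));
    }
}
```

Because a set has no defined order, two dumps can differ textually even when the sets hold the same elements, so a dump like this is for reading, not for comparing.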

Inspecting the HashSets in Visual Studio while debugging looks like this:

-       thisList    Count = 2   System.Collections.Generic.HashSet<System.Collections.Generic.List<float>>
-       [0] Count = 2   System.Collections.Generic.List<float>
        [0] 10  float
        [1] 15  float
+       Raw View        
-       [1] Count = 2   System.Collections.Generic.List<float>
        [0] 20  float
        [1] 25  float
+       Raw View        
+       Raw View        
-       otherList   Count = 2   System.Collections.Generic.HashSet<System.Collections.Generic.List<float>>
-       [0] Count = 2   System.Collections.Generic.List<float>
        [0] 20  float
        [1] 25  float
+       Raw View        
-       [1] Count = 2   System.Collections.Generic.List<float>
        [0] 10  float
        [1] 15  float
+       Raw View        
+       Raw View        

Each HashSet contains two lists (order is not of relevance, since it is a set) and each list has identical values (with the same order). They should be considered equal.

What should I do to make these HashSets be considered equal by thisList.SetEquals(otherList);?

EDIT:

Printing coord.ToString("G17") on each float:

10
15
20
25
20
25
10
15
Alex Tereshenkov
  • 2
    In short, because list equality is going to be based on ReferenceEquals and most likely you have two different list objects in each collection (even if perhaps with the same values). And that's before getting into the problem of comparing floats with strict equality. But what are you trying to accomplish with a HashSet of lists? This feels like a bit of an X-Y problem – lc. Jun 01 '18 at 06:11
  • 4
    I love the smell of X/Y on a friday afternoon – TheGeneral Jun 01 '18 at 06:12
  • @lc, thanks for the comment. I have a toy `PointCollection` class with `X` and `Y` properties for the points. I am implementing an `Equals` method to check whether two collections are equal and this would be when each of the class instances have the same number of points each of which having the same `X` and `Y` coordinates. Should I be using the tuples instead of the lists in a HashSet? Because in Python, `set([(10.0, 15.0), (20.0, 25.0)]) == set([(20.0, 25.0), (10.0, 15.0)])` is `True`. – Alex Tereshenkov Jun 01 '18 at 06:25
  • @LasseVågsætherKarlsen, updated the question body. – Alex Tereshenkov Jun 01 '18 at 06:28
  • 1
    It's literally an XY problem :) – MineR Jun 01 '18 at 06:30
  • @AlexTereshenkov, because you have a List instead of a Vec2 class with appropriate equality members, you are doing a reference comparison. Make a Vec2 class/struct and override the equality members. – MineR Jun 01 '18 at 06:34
  • @MineR: From where do you have it that `.SetEquals` does reference comparison? – Lasse V. Karlsen Jun 01 '18 at 06:36
  • Because what's in the HashSet is a List and the equality members for a List are the same as object. – MineR Jun 01 '18 at 06:38
  • Ah, I missed the list level, sorry, never mind any of my comments. I thought I saw `HashSet<float>`. – Lasse V. Karlsen Jun 01 '18 at 06:38
  • @MineR, ah, so you suggest I compare two `HashSet<Point>` where each `Point` would be considered equal to another `Point` where their XY would be identical (I will override `Equals` method in the `Point` class)? Did I get it right? – Alex Tereshenkov Jun 01 '18 at 06:45
  • Yes - you need to override Equals and GetHashCode... in your point class. – MineR Jun 01 '18 at 06:46
  • You should provide the `IEqualityComparer` type object while initializing `HashSet`. See the posted answer. – user1672994 Jun 01 '18 at 06:55
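
On the tuple question raised in the comments: `System.Tuple<T1, T2>` overrides `Equals` and `GetHashCode` to compare its items, so a set of tuples behaves much like the Python example. A minimal sketch, assuming the points can be stored as `Tuple<float, float>`; the `TupleSetDemo` class name is made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class TupleSetDemo
{
    // Builds the two sets from the question, with each point stored as a tuple.
    public static bool CompareTupleSets()
    {
        var first = new HashSet<Tuple<float, float>>
        {
            Tuple.Create(10f, 15f),
            Tuple.Create(20f, 25f)
        };
        var second = new HashSet<Tuple<float, float>>
        {
            Tuple.Create(20f, 25f),
            Tuple.Create(10f, 15f)
        };
        // Tuples compare item by item, so insertion order does not matter here.
        return first.SetEquals(second);
    }

    public static void Main()
    {
        Console.WriteLine(CompareTupleSets()); // True
    }
}
```

Tuples expose the coordinates only as `Item1` and `Item2`, which is one reason a dedicated point type may still read better.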

2 Answers

3

Because you are using List in your HashSet, it is comparing the two lists as references instead of considering the values in the Lists.

Instead of using a List to represent an X and Y, use a Vector2 or Point class. This is more or less what the struct should look like:

public struct Point
{
    public double X { get; }
    public double Y { get; }

    public Point(double x, double y)
    {
        X = x;
        Y = y;
    }

    // Strongly typed comparison: no boxing or casting needed.
    public bool Equals(Point other)
    {
        return X.Equals(other.X) && Y.Equals(other.Y);
    }

    // Called when the point is compared as a plain object.
    public override bool Equals(object obj)
    {
        if (ReferenceEquals(null, obj)) return false;
        return obj is Point && Equals((Point) obj);
    }

    // Equal points must return equal hash codes, or HashSet cannot find them.
    public override int GetHashCode()
    {
        unchecked
        {
            return (X.GetHashCode() * 397) ^ Y.GetHashCode();
        }
    }
}
MineR
  • Thanks! I already had the `Point` class with `Equals (Point other)` overloaded, however, I needed to add `Equals(object obj)` and `GetHashCode`. It's all working now as expected. Mind writing a couple of words on why do you need `Equals(object obj)`and what is the 397 (I guess a random seed value for hashing?) – Alex Tereshenkov Jun 01 '18 at 07:39
  • You don't actually need the Equals(Point other) - it is just somewhat nicer/faster when you are comparing two points because it doesn't have to do the if statement/casting. With GetHashCode, the point is to come up with as unique a number as possible while still ensuring that anything that can possibly be equal will have the same number. That code is auto generated by Resharper (coding tool) - I assume they have some reason for 397, which is probably explained somewhere on SO. – MineR Jun 01 '18 at 07:43
  • https://stackoverflow.com/questions/102742/why-is-397-used-for-resharper-gethashcode-override – MineR Jun 01 '18 at 07:44
  • FWIW you might not need to reinvent the wheel if you can use [`System.Drawing.PointF`](https://msdn.microsoft.com/en-us/library/system.drawing.pointf(v=vs.110).aspx) – lc. Jun 01 '18 at 08:21
  • @lc. I thought that too initially, but PointF doesn't override GetHashCode and is not immutable – MineR Jun 01 '18 at 08:31
  • @MineR Eww, [you're totally right](https://referencesource.microsoft.com/#System.Drawing/commonui/System/Drawing/Advanced/PointF.cs,b4963c55b18ead87). I didn't realize that - seems like a rather "interesting" design decision. – lc. Jun 01 '18 at 08:36
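
As background to the `Equals(object obj)` question above: a `HashSet<T>` created without an explicit comparer uses `EqualityComparer<T>.Default`, which calls `IEquatable<T>.Equals` when `T` implements that interface and otherwise falls back to the `Equals(object)` and `GetHashCode` overrides. The sketch below is a hypothetical variant of the point type that declares `IEquatable<Point>`, which is an addition not present in the answer; `PointSetDemo` is likewise an illustrative name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of the answer's struct that also declares IEquatable<Point>.
public struct Point : IEquatable<Point>
{
    public double X { get; }
    public double Y { get; }

    public Point(double x, double y) { X = x; Y = y; }

    public bool Equals(Point other) { return X.Equals(other.X) && Y.Equals(other.Y); }

    public override bool Equals(object obj) { return obj is Point && Equals((Point)obj); }

    public override int GetHashCode()
    {
        unchecked { return (X.GetHashCode() * 397) ^ Y.GetHashCode(); }
    }
}

public static class PointSetDemo
{
    // Same points, different insertion order.
    public static bool CompareSets()
    {
        var first = new HashSet<Point> { new Point(10, 15), new Point(20, 25) };
        var second = new HashSet<Point> { new Point(20, 25), new Point(10, 15) };
        return first.SetEquals(second);
    }

    public static void Main()
    {
        // Equal points must produce equal hash codes.
        Console.WriteLine(new Point(10, 15).GetHashCode() == new Point(10, 15).GetHashCode()); // True
        Console.WriteLine(CompareSets()); // True
    }
}
```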
2

You are checking a HashSet<List<float>> for equality against another HashSet<List<float>>. The question is why this returns false.

Before we talk about HashSet<List<float>>, consider what happens if I compare one List<float> with another List<float> using the code below. What would the output be?

    List<float> list = new List<float>() { 10.0f, 15.0f};
    List<float> anotherList = new List<float>() { 10.0f, 15.0f};

    Console.WriteLine(list.Equals(anotherList));

The output of this will be

False

This is because Equals here compares the references of the two objects, which are not equal.
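
To make the contrast concrete, the sketch below puts the reference-based `Equals` next to LINQ's `SequenceEqual`, which compares the elements one by one. The `ListEqualityDemo` class and `Compare` method are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListEqualityDemo
{
    // Returns the three comparison results in the order they are printed below.
    public static bool[] Compare()
    {
        var list = new List<float> { 10.0f, 15.0f };
        var anotherList = new List<float> { 10.0f, 15.0f };
        var sameList = list; // second reference to the same object

        return new[]
        {
            list.Equals(anotherList),       // different objects, same values
            list.Equals(sameList),          // same object
            list.SequenceEqual(anotherList) // element-by-element comparison
        };
    }

    public static void Main()
    {
        foreach (bool result in Compare())
            Console.WriteLine(result); // False, True, True
    }
}
```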

Now to the solution to your problem

You should provide an EqualityComparer when initializing the HashSet, one that compares elements of type T the way you need.

    HashSet<List<float>> anotherHashSet1 = new HashSet<List<float>>(new FloatListComparer());
    anotherHashSet1.Add(list);

    HashSet<List<float>> anotherHashSet2 = new HashSet<List<float>>();
    anotherHashSet2.Add(anotherList);

    Console.WriteLine(anotherHashSet1.SetEquals(anotherHashSet2));

The output of the above code is

True

The EqualityComparer I've written looks as follows.

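public class FloatListComparer : EqualityComparer<List<float>>
{
    // Lists are equal when they hold the same values in the same order.
    public override bool Equals(List<float> list1, List<float> list2)
    {
        if (ReferenceEquals(list1, list2)) return true;
        if (list1 == null || list2 == null) return false;
        return list1.SequenceEqual(list2);
    }

    // Hash the elements so that equal lists produce equal hash codes.
    public override int GetHashCode(List<float> s)
    {
        if (s == null) return 0;
        unchecked
        {
            int hash = 17;
            foreach (float value in s) hash = hash * 31 + value.GetHashCode();
            return hash;
        }
    }
}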

Now the question is why SetEquals was not working

If you check the implementation of SetEquals here, you will find that it calls the default comparer of T, which for List<float> compares object references. When a comparer is provided, SetEquals uses that one instead.

Check the live fiddle here.
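
One consequence worth seeing in isolation: the comparer that matters is the one belonging to the set on which `SetEquals` is called. The sketch below is an illustration under assumptions, not the code from the fiddle; `SequenceComparer` and `ComparerDemo` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer for this sketch: lists are equal when their items match in order.
public sealed class SequenceComparer : IEqualityComparer<List<float>>
{
    public bool Equals(List<float> a, List<float> b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a == null || b == null) return false;
        return a.SequenceEqual(b);
    }

    public int GetHashCode(List<float> list)
    {
        if (list == null) return 0;
        unchecked
        {
            int hash = 17;
            foreach (float value in list) hash = hash * 31 + value.GetHashCode();
            return hash;
        }
    }
}

public static class ComparerDemo
{
    // Returns { withComparer.SetEquals(withDefault), withDefault.SetEquals(withComparer) }.
    public static bool[] Run()
    {
        var withComparer = new HashSet<List<float>>(new SequenceComparer())
        {
            new List<float> { 10f, 15f }
        };
        var withDefault = new HashSet<List<float>>
        {
            new List<float> { 10f, 15f }
        };
        return new[]
        {
            withComparer.SetEquals(withDefault), // uses SequenceComparer
            withDefault.SetEquals(withComparer)  // uses the default, reference-based comparer
        };
    }

    public static void Main()
    {
        bool[] results = Run();
        Console.WriteLine(results[0]); // True
        Console.WriteLine(results[1]); // False
    }
}
```

Giving both sets the same comparer instance avoids this asymmetry.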

user1672994
  • This is a good alternative, but you shouldn't sort the lists as what's in the lists are X,Y coordinates... – MineR Jun 01 '18 at 06:58
  • @MineR - SO has asked the question for `HashSet<List<float>>` so I've provided the implementation based on that. See the line in his question --- `I have two HashSet<List<float>> which I am trying to compare using`. Also, there are many ways to compare two lists; I've provided one of the approaches. – user1672994 Jun 01 '18 at 06:59
  • This solution treats the lists as lists, which is not really what was asked for, taking the entire question in consideration. He wants to treat the data as sets all the way down. Semantically, SetEquals should treat [1,1,1] == [1] – Tewr Jun 01 '18 at 07:03
  • Yes, the SetEquals method ignores duplicate entries and treats [1,1,1] == [1]. – user1672994 Jun 01 '18 at 07:18