0

I have a weird issue. I want to implement an extension to List with a function to merge another list into it excluding the duplicate values:

    <Extension()>
    Public Sub AddUnique(Of T)(ByVal self As IList(Of T), ByVal items As IEnumerable(Of T))
        For Each item In items
            If Not self.Contains(item) Then self.Add(item)
        Next
    End Sub

Now, I have a class that I'll be creating objects from, and adding them to a list:

    Class DocInfo
        Public Property title As String
        Public Property fullPath As String

        Sub New(title As String, fullPath As String)
            Me.title = title
            Me.fullPath = fullPath
        End Sub
    End Class

Then, I have a list as a global variable:

Public docsInfo As New List(Of DocInfo)

And then I have a button handler that adds new items to that list:

Private Sub AddToList_Button_Click(sender As Object, e As RoutedEventArgs)
    Dim candidateItems As New List(Of DocInfo)
    For Each doc In selectedDocs
        candidateItems.Add(New DocInfo(doc.GetTitle(), doc.GetPathName()))
    Next

    docsInfo.AddUnique(candidateItems)
End Sub

(The doc and selectedDocs variables are outside of the scope of this question.)

Now, the important bit - GetTitle() and GetPathName() return the same strings on every button click (I have the same docs selected between clicks). Meaning that DocInfo objects that are added to the candidateItems, and then added to docsInfo, are identical. Nevertheless, the extension function AddUnique fails, resulting in duplicates in the list.

Puzzled, I ran GetHashCode() on these duplicate DocsInfo class objects:

    For Each docInfo In docsInfo
        Console.WriteLine(docInfo.title)
        Console.WriteLine(docInfo.fullPath)
        Console.WriteLine(docInfo.GetHashCode())
    Next

And this is the output:

Assem1^Test assembly.SLDASM
C:\Users\Justinas\AppData\Local\Temp\swx5396\VC~~\Test assembly\Assem1^Test assembly.SLDASM
7759225

Assem1^Test assembly.SLDASM
C:\Users\Justinas\AppData\Local\Temp\swx5396\VC~~\Test assembly\Assem1^Test assembly.SLDASM
14797678

With each button click, I am getting identical DocsInfo objects (title and fullPath properties have the same values), yet their hashes are different every time, and every comparison I can think of, fails to acknowledge that these objects are for all intents and purposes idendical.

Why is this happening? And how can I fix the AddUnique extension function to work as intended?

  • Why not just update the comparison in AddUnique to look at title and/or fullpath property? – Hursey Sep 13 '21 at 21:32
  • @Hursey Because I want to do lots of comparisons like this for many different custom classes, and I don't want to have to maintain the AddUnique function to handle every single type... If I add a new property to DocInfo class (or any other class), I would have to remember to update the AddUnique function to check for that property. Easy to forget, and later difficult to diagnose. – Justinas Rubinovas Sep 13 '21 at 21:34
  • I'm probably going to get corrected on this, but my understanding on this by passing the lists in byval means you are in effect creating a new instance of your object. In that case the hash for each instance will be different even though the contain the same data – Hursey Sep 13 '21 at 21:39
  • @Hursey No, `ByVal` with a reference type means that the reference is passed by value---the comparison is that if `ByRef`, then changing it to point to another object in the routine will also change it in the parent. It's tricky to grok. – Craig Sep 13 '21 at 21:42
  • Okay, that makes sense. But still, I don't understand why the comparison fails. Even direct comparison of these objects (object1.Equals(object2)) returns False, even though all the properties match. – Justinas Rubinovas Sep 13 '21 at 21:43
  • The issue is that you're trying to treat a reference type as if it were a value type. By default, different `Class`es in VB will be treated as distinct objects regardless of their contents. If you want different behavior, you need to either make them `Structure`s (I think `String` acts like a value type and hashes the same for the same content even though it's a reference type) or override `GetHashCode` and `Equals` and make your class act like a value type. – Craig Sep 13 '21 at 21:44
  • This also goes for `Object.Equals`, the default implementation for reference types (which can be overridden) is reference equality. – Craig Sep 13 '21 at 21:44
  • @Craig - well, I don't really want to change my classes into something different. I just want to have a generic method (like this AddUnique) to compare them for equality in their properties. – Justinas Rubinovas Sep 13 '21 at 21:45
  • Could always extend your AddUnique with an comparison function parameter – Hursey Sep 13 '21 at 21:47
  • Well, I want to implement that AddUnique to work with any class, without having to specify what properties to compare in each of these classes. That would be a hell to maintain. – Justinas Rubinovas Sep 13 '21 at 21:48
  • The next thing I would suggest then is a base class with a unique id property set in the constructor, then all your custom classes inherit this. Allowing for you to do your comparison on a known value – Hursey Sep 13 '21 at 21:52
  • Or possibly in your custom class override the Equals method to compare on actual property values. I think that would work with the List.Contains() – Hursey Sep 13 '21 at 21:54
  • Yeah, I suggested ID approach. But then I would hit a wall when generating these ID's: how do I know that I need to create an object with an ID of another, already existing object, if I can't compare these objects and find out if they are identical? As for the overriding: yes, I could do that, but then again, I would have to maintain these comparison overrides in every class, every time I want to add or remove a property in that class. This is exactly what I want to avoid. – Justinas Rubinovas Sep 13 '21 at 21:55
  • For equality comparisons, implement `IEquatable`. For sorting, implement `IComparable(Of YouClass)`. The former let's you determine whether two reference type objects are the same based on criteria you provide. – Jimi Sep 13 '21 at 22:16
  • @Jimi - but then I would have to write and maintain comparison function in each class that I want to compare, right? I want to avoid this at all costs, because I will have to come back and modify this function every time I add or remove a property in my class. – Justinas Rubinovas Sep 13 '21 at 22:18
  • That's how you determine whether two class objects are the same or not. You can generate a GUID (or similar), so you don't need to care about property values (btw, you don't need to compare all properties, you need to determine the criteria that make an object equal to another, you're the only one that can do it, build your classes accordingly), or make GetHashCode() return something unique (still, you define the criteria). – Jimi Sep 13 '21 at 22:22
  • @Jimi Well, basically the criteria is very simple: if all the values of properties of two objects are identical, then the objects should be considered identical. If at least one property is different, then the objects are different. This criteria applies to all the classes that I need to compare. So if I were to implement IEquatable, I would have to write comparisons for every single property, for every single class. And then maintain it. I want to automate this, so that my comparison function would automatically find out what these properties are in any class, and compare their values. – Justinas Rubinovas Sep 13 '21 at 22:26
  • Look I'm not trying to be rude here, but you've been given the reason why what your trying to achieve doesn't work and some viable alternatives. In short there isn't a silver bullet, and will take you a little work to implement a solution. While I'm all for usability and reducing maintenance overheads, don't fall into the trap of making things more complicated than they need to be for issues that may or may not ever come to pass – Hursey Sep 13 '21 at 22:36
  • @Hursey I understand. For now, I'll be using reflection, as it seems to be the best solution so far in terms of usability and maintenance overhead. If I encounter performance problems, I will probably have to try other solutions. Thank you very much for your help! – Justinas Rubinovas Sep 13 '21 at 22:38
  • I don't know what class models you're designing, for what use, I can say that I never found myself in a condition that makes it difficult to either design or maintain class objects that need to equate or compare to themselves or to object. Since your use case is actually unknown, I cannot give practical examples. If you really need to compare all properties (let's assume this is the case), you can create a `GetHashCode()` method with high entropy ([examples](https://stackoverflow.com/q/263400/7444103)). -- Note that Reflection is not as slow as sometimes advertised. – Jimi Sep 13 '21 at 22:48

1 Answers1

1

This behavior is because of the difference in .NET between "Reference" types and "Value" types. The fundamental philosophy of these is that for "Reference" types, object identity takes precedence over contents (that is, two different object instances with the same contents are still considered distinct), while for "Value" types, the contents are the only thing that matters.

In VB, Class denotes a reference type while Structure denotes a value type. Their respective behaviors are what you would expect, then: by default, Equals on a Class is equivalent to ReferenceEquals, checking to see if the references are the same, and GetHashCode returns a value based on the object identity. Equals on a Structure does member-wise value equality, and GetHashCode returns a value based on the hash codes of the members.

There are a couple of different options for overriding the default behavior, with differing impacts and levels of intrusiveness.

You can change Class to Structure. If you do so, I would strongly recommend to eliminate any mutable behavior on them (i.e. make all fields and properties ReadOnly), because mutable Structures can be extremely hard to reason about correctly. If you really do have immutable data, though, this is the easiest to maintain because .NET will already do what you want, you don't have to maintain your own Equals or GetHashCode override.

You can override GetHashCode and Equals on your Class to act like the Structure versions. This won't change anything else about your class, but it will make it act like a value type for the purposes of containers and sequences. If you're worried about maintenance, an alternative would be to do something reflection-based, though this shouldn't be used for anything that will be high-throughput because reflection is generally not particularly performant.

I believe the hashing and ordering containers take optional constructor parameters that will let you provide a class for overriding the behavior of the contents without altering the Class itself. You could do something like this. I'd recommend to look at the MSDN docs for HashSet.

Craig
  • 2,248
  • 1
  • 19
  • 23
  • I do not have particular performance requirements for this, because I won't be comparing thousands of objects, but I do need my objects to be mutable, because I will need to change their properties in runtime... So I assume the most optimal way that doesn't require maintenance and specific implementations for every class is reflection? Are there any decent examples of this? – Justinas Rubinovas Sep 13 '21 at 22:09
  • 1
    @JustinasRubinovas I don't know of any specifically, but the general strategy would be iterating over the fields and/or properties of the type, getting the values, calling `GetHashCode` on them, and combining the hash codes. I think .NET has even added a convenience method for combining hash codes, though I don't remember what it is (might be .NET Core or .NET 5 only); typically it involves multiplying by a constant and then using `Xor`. Similarly for `Equals`, iterating, getting values, calling `Equals` against current instance. – Craig Sep 14 '21 at 12:54