
While reading Jeffrey Richter's CLR via C#, 4th edition (Microsoft Press), I came across a passage where the author states that although Object.Equals currently checks for identity equality, Microsoft should have implemented the method like this:

```csharp
public class Object {
    public virtual Boolean Equals(Object obj) {
        // The given object to compare to can't be null
        if (obj == null) return false;

        // If objects are different types, they can't be equal.
        if (this.GetType() != obj.GetType()) return false;

        // If objects are same type, return true if all of their fields match
        // Because System.Object defines no fields, the fields match
        return true;
    }
}
```

This strikes me as very odd: would every non-null object of the same type be equal by default? So unless Equals is overridden, all instances of a type would be equal (e.g. all your locking objects would be equal) and would have to return the same hash code. And assuming that == on Object still checks reference equality, this would mean that (a == b) != a.Equals(b), which would also be strange.
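To make the concern concrete, here is a minimal sketch that emulates the proposed default. Since System.Object itself cannot be changed, it assumes a hypothetical base class `RichterObject` carrying the book's Equals, and a hypothetical `LockToken` type; the GetHashCode override is an assumption needed to keep hashing consistent with that Equals.

```csharp
#nullable enable
using System;

// Two unrelated lock tokens of the same type.
var lockA = new LockToken();
var lockB = new LockToken();

// Richter-style default: same runtime type, so Equals reports True...
Console.WriteLine(lockA.Equals(lockB));                         // True
// ...while == on a class without an operator overload still compares references.
Console.WriteLine(lockA == lockB);                              // False
// Every instance of the type also shares one hash code.
Console.WriteLine(lockA.GetHashCode() == lockB.GetHashCode());  // True

// Hypothetical base class emulating the Equals proposed in the book
// (System.Object itself cannot be changed, so this stands in for it).
class RichterObject
{
    public override bool Equals(object? obj)
    {
        // The given object to compare to can't be null.
        if (obj is null) return false;
        // Objects of different types can't be equal.
        if (GetType() != obj.GetType()) return false;
        // This class declares no fields, so "all fields match".
        return true;
    }

    // Equal objects must have equal hash codes, so the only consistent
    // choice here is a single hash code per type.
    public override int GetHashCode() => GetType().GetHashCode();
}

// Hypothetical type used only as a lock token.
class LockToken : RichterObject { }
```

With this default, two unrelated lock tokens are Equals-equal and share a hash code, while `==` still reports them as different.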

I think treating things as equal only if they are the exact same thing (identity) is a better idea than making everything equal unless overridden. But this is the 4th edition of a well-known book published by Microsoft Press, so there must be some merit to the idea. I read the rest of the text but could not help wondering: why would the author suggest this? What am I missing here? What is the great advantage of Richter's implementation over the current Object.Equals implementation?

Daniel A.A. Pelsmaeker
  • While its implementation is not clear, doesn't the comment explain it properly? Objects are equal only if the fields match. – Karthik T Feb 06 '13 at 02:16
  • +1 for a fantastic question. I have read the book and honestly I also did not quite see the upside of implementing it this way. – Simon Whitehead Feb 06 '13 at 02:20

4 Answers


The current default Equals() does what is known as a reference compare (sometimes loosely called a shallow compare), and doesn't check any further if the references differ.

I think this is perfectly acceptable for a base implementation. I certainly wouldn't think that it is wrong or incomplete.
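A quick sketch of the current behavior, using a hypothetical `PlainPoint` class that overrides nothing: the inherited Object.Equals compares references only, regardless of field values.

```csharp
using System;

var p1 = new PlainPoint { X = 1, Y = 2 };
var p2 = new PlainPoint { X = 1, Y = 2 };
var p3 = p1;

// Same field values, different instances: the inherited Object.Equals says False.
Console.WriteLine(p1.Equals(p2));                 // False
// Same instance: True, without looking at any fields.
Console.WriteLine(p1.Equals(p3));                 // True
// Object.ReferenceEquals always performs the identity check, even if Equals is overridden.
Console.WriteLine(object.ReferenceEquals(p1, p3)); // True

// Hypothetical class that does not override Equals or GetHashCode.
class PlainPoint
{
    public int X;
    public int Y;
}
```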

Richter's example¹, which you quote, is also perfectly legitimate for the base System.Object. The issue with his implementation is that it arguably should be declared abstract²: with his method you will end up with an unreliable Equals() on derived objects if you do not override it (because Equals() is supposed to do a deep compare). Having to override this method on all derived objects would be a lot of work, therefore the Microsoft way is better as a default. So in essence you are correct: Richter's example is odd. It is better to default to not equal rather than the other way round (defaulting to true would lead to some rather interesting behavior if people forgot to override it).
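The "declare it abstract" alternative can be approximated today with a hypothetical intermediate base class. System.Object could not simply make Equals abstract without itself becoming abstract (which would rule out `new object()`), so this sketch assumes a `ValueObject` base that re-declares Equals and GetHashCode as `abstract override`, forcing each concrete subclass (here a hypothetical `Money` type) to supply its own comparison.

```csharp
#nullable enable
using System;

var a = new Money(10m, "EUR");
var b = new Money(10m, "EUR");
Console.WriteLine(a.Equals(b));                     // True: value comparison supplied by Money
Console.WriteLine(a.Equals(new Money(10m, "USD"))); // False

// Hypothetical base class: Equals/GetHashCode are re-declared abstract,
// so every concrete subclass must decide what equality means for it.
abstract class ValueObject
{
    public abstract override bool Equals(object? obj);
    public abstract override int GetHashCode();
}

// Hypothetical subclass; omitting either override is a compile error (CS0534).
sealed class Money : ValueObject
{
    public decimal Amount { get; }
    public string Currency { get; }

    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }

    public override bool Equals(object? obj) =>
        obj is Money other && Amount == other.Amount && Currency == other.Currency;

    public override int GetHashCode() => HashCode.Combine(Amount, Currency);
}
```

The trade-off is the one described above: nothing compiles until every concrete type answers the equality question itself.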

(Just for easy reference, here is the default implementation as published in the book)

[Image: the default Object.Equals implementation as published in the book]

1: Richter is a smart man who knows his stuff, and I wouldn't generally argue with anything he says. You have to understand that the MS engineers would have had to think long and hard about a lot of things, knowing that they didn't have the flexibility of being able to get it wrong and then just fix it later. No matter how right they are, people will always second-guess them at a later date and offer alternative opinions. That doesn't mean the original is wrong or the alternative is wrong; it simply means there was an alternative.

2: Which of course means that there would be no base implementation, which is good because it would have been unreliable.

slugster

Jeffrey Richter is talking about value equality as opposed to identity equality.

Specifically you ask:

> So unless overridden: all instances of a type are equal?

The answer is yes, but... As in: yes, but it is (almost) always supposed to be overridden.

Thus, for most classes it should be overridden to do an attribute-by-attribute comparison to determine equality. For some other classes that are truly identity-based (like locks), it should be overridden to use the same technique as it uses today.

The key, though, is that it must be overridden in almost every case, and this alone is difficult, clumsy and mistake-prone enough that it is probably why Microsoft did not use this approach.
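A sketch of what "override in almost every case" would look like under Richter's default, assuming a hypothetical `RichterObject` base with the book's semantics: a value-like `Person` compares attribute by attribute, while an identity-based `SyncLock` has to opt back into today's reference semantics with `ReferenceEquals` and `RuntimeHelpers.GetHashCode`. All three type names are illustrative.

```csharp
#nullable enable
using System;
using System.Runtime.CompilerServices;

var l1 = new SyncLock();
var l2 = new SyncLock();
Console.WriteLine(l1.Equals(l2)); // False: identity restored by the override

var p1 = new Person("Ada", 1815);
var p2 = new Person("Ada", 1815);
Console.WriteLine(p1.Equals(p2)); // True: attribute-by-attribute comparison

// Hypothetical stand-in for Richter's proposed default (same type => equal).
class RichterObject
{
    public override bool Equals(object? obj) => obj is not null && GetType() == obj.GetType();
    public override int GetHashCode() => GetType().GetHashCode();
}

// Value-like class: override to compare attribute by attribute.
sealed class Person : RichterObject
{
    public string Name { get; }
    public int BirthYear { get; }
    public Person(string name, int birthYear) { Name = name; BirthYear = birthYear; }

    public override bool Equals(object? obj) =>
        obj is Person other && Name == other.Name && BirthYear == other.BirthYear;
    public override int GetHashCode() => HashCode.Combine(Name, BirthYear);
}

// Identity-based class: override to get back today's reference semantics.
sealed class SyncLock : RichterObject
{
    public override bool Equals(object? obj) => object.ReferenceEquals(this, obj);
    // RuntimeHelpers.GetHashCode returns the identity-based hash, ignoring overrides.
    public override int GetHashCode() => RuntimeHelpers.GetHashCode(this);
}
```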


What is the advantage of Value-Equality over Identity-Equality? It's that if two different objects have the same values/contents, then they can be considered "equal" for purposes of comparison in cases like the Keys of a Dictionary object.
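For the Dictionary case, a sketch with two hypothetical key types: a C# 9 `record`, whose compiler-generated Equals/GetHashCode compare contents, and a plain class that keeps Object's identity equality.

```csharp
using System;
using System.Collections.Generic;

var byRecord = new Dictionary<GridKey, string> { [new GridKey(3, 4)] = "treasure" };
// A different instance with the same contents finds the entry.
Console.WriteLine(byRecord.ContainsKey(new GridKey(3, 4))); // True

var byClass = new Dictionary<GridKeyClass, string> { [new GridKeyClass(3, 4)] = "treasure" };
// With inherited reference equality, an equal-looking key is a different key.
Console.WriteLine(byClass.ContainsKey(new GridKeyClass(3, 4))); // False

// Hypothetical key types. The record gets compiler-generated value-based
// Equals/GetHashCode; the plain class keeps Object's identity semantics.
record GridKey(int Row, int Column);

class GridKeyClass
{
    public int Row { get; }
    public int Column { get; }
    public GridKeyClass(int row, int column) { Row = row; Column = column; }
}
```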

Or consider strings in .NET, which are actually objects but get treated a lot like values at higher levels (especially in VB.NET). This presents a problem when you want to compare two strings for equality, because 99% of the time you really do not care whether they are different object instances; you only care whether they contain the same text. So .NET has to make sure that that is how string comparison actually works, even though strings are really objects.
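The string case can be checked directly. This sketch forces two distinct instances with the same text (via `new string(char[])`) and compares them a few ways; note that once both are typed as `object`, `==` falls back to reference comparison, which is exactly the kind of `(a == b) != a.Equals(b)` mismatch raised in the question.

```csharp
using System;

string a = "hello";
// new string(char[]) allocates a separate instance for a non-empty array.
string b = new string(new[] { 'h', 'e', 'l', 'l', 'o' });

Console.WriteLine(object.ReferenceEquals(a, b)); // False: two different objects
Console.WriteLine(a == b);                       // True: string overloads == to compare text
Console.WriteLine(a.Equals(b));                  // True: string overrides Equals the same way
Console.WriteLine((object)a == (object)b);       // False: == on object compares references
```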

RBarryYoung
  • I actually meant: _What is the great advantage of Richter's implementation over the current Object.Equals implementation?_ – Daniel A.A. Pelsmaeker Feb 06 '13 at 10:07
  • @Virtlink The advantage is that it effectively *forces* people to override `Equals` in order to get equality that means anything, instead of having a default implementation that is rarely what is desired but appears appropriate as-is to new programmers. It comes down to the fact that objects very rarely actually require reference equality and almost always want value equality, but the default implementation is reference equality. – Servy Feb 06 '13 at 14:58
  • Especially since most languages have another operator/method specifically for reference/identity equality, like `IS`. – RBarryYoung Feb 06 '13 at 15:02

If one is asked to make a list of all identifiably distinct objects of arbitrary types, and is not given any indication of what the objects are or what they will be used for, the only universally applicable means of testing whether two references should be considered as pointing to identifiably distinct objects is Object.Equals(Object). Two references X and Y should be considered identifiably distinct if changing one or more references that presently point to X so that they instead point to Y would likely alter program behavior.

For example, if two instances of string both contain the entire text of War and Peace, punctuated and formatted identically, one could likely replace some or all references to the first with references to the second, or vice versa, with little or no effect on program execution beyond the fact that a comparison between two references which point to the same instance may be found to hold identical text much more quickly than could two references which point to different strings that contain identical characters.
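The substitutability idea can be seen with `Enumerable.Distinct` over `object` references, which relies on the virtual Object.Equals/GetHashCode: identity tokens stay distinct, while two separate string instances holding the same text collapse into one. The values are illustrative only.

```csharp
using System;
using System.Linq;

object token1 = new object();
object token2 = new object();
string text1 = "War and Peace";
string text2 = new string("War and Peace".ToCharArray()); // separate instance, same text

object[] items = { token1, token2, text1, text2, token1 };

// Distinct() on object uses EqualityComparer<object>.Default, i.e. the virtual Object.Equals.
object[] distinct = items.Distinct().ToArray();
Console.WriteLine(distinct.Length); // 3: token1, token2, and one of the two equal strings
```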

In most cases, objects which exist to hold immutable data should be considered identical if the data they hold is identical. Objects which exist to hold mutable data, or which exist to serve as identity tokens, should generally be considered distinct from each other. Given that one can define a custom EqualityComparer which regards objects as equivalent even when they are not totally equivalent (e.g. a case-insensitive string comparer), and given that code which needs a definition of equivalence broader than strict equivalence should generally know what types it is working with and what definition of equivalence is suitable, it is generally better to have Object.Equals report objects as different unless they are designed to be substitutable (as, e.g., strings are).
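A minimal sketch of asking for a broader equivalence explicitly, using the built-in `StringComparer.OrdinalIgnoreCase` as the collection's comparer while string's own Equals stays strict:

```csharp
using System;
using System.Collections.Generic;

// Default string equality is exact: "Paris" and "PARIS" are different keys.
var exact = new HashSet<string> { "Paris" };
Console.WriteLine(exact.Contains("PARIS"));   // False

// Code that wants a looser notion of equivalence supplies it explicitly.
var relaxed = new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Paris" };
Console.WriteLine(relaxed.Contains("PARIS")); // True
```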

To use a real-world analogy, suppose one is given two pieces of paper, each with a Vehicle Identification Number written on it, and is asked if the car identified by the first piece of paper is the same as the car identified by the second. If the two slips of paper have the same VIN, then clearly the car identified by the first is the same as the one identified by the second. If they have different VINs, however, excluding any weird possibility of a car having more than one VIN, then they identify different cars. Even if the cars have the same make and model, options packages, paint scheme, etc. they would still be different cars. A person who bought one would not be entitled to arbitrarily start using the other instead. It may sometimes be useful to know whether two cars presently have the same options packages, etc. but if that's what one wants to know, that's what one should ask.
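The analogy can be sketched with a hypothetical `Car` type that keeps identity equality, plus a hypothetical `SameSpecComparer` (an `IEqualityComparer<Car>`) for the separate question of whether two cars currently share a specification. The VINs are placeholders.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var first = new Car("VIN-0001", "Civic", "Blue");
var second = new Car("VIN-0002", "Civic", "Blue");

// Different cars, even though model and paint match.
Console.WriteLine(first.Equals(second));                        // False
// Asking the narrower question explicitly.
Console.WriteLine(SameSpecComparer.Instance.Equals(first, second)); // True

// Hypothetical identity-bearing type: it keeps Object's reference equality.
sealed class Car
{
    public string Vin { get; }
    public string Model { get; }
    public string Paint { get; }
    public Car(string vin, string model, string paint) { Vin = vin; Model = model; Paint = paint; }
}

// Hypothetical comparer for "do these two cars currently have the same spec?"
sealed class SameSpecComparer : IEqualityComparer<Car>
{
    public static readonly SameSpecComparer Instance = new();

    public bool Equals(Car? x, Car? y) =>
        object.ReferenceEquals(x, y) ||
        (x is not null && y is not null && x.Model == y.Model && x.Paint == y.Paint);

    public int GetHashCode(Car car) => HashCode.Combine(car.Model, car.Paint);
}
```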

supercat
  • So if I understand you correctly: unless it is logical for the purpose of the object to override `Equals` to do a value equality check, you agree that the default `Object.Equals` should not return `true` but do the reference equality check it currently does. – Daniel A.A. Pelsmaeker Feb 12 '13 at 21:44
  • @Virtlink: Precisely. One weakness in Java, which largely carries through into .net, is that there's no distinction between object references which are held to encapsulate the *identity* of their target, which ones encapsulate *mutable content* thereof, which ones encapsulate *both*, and which encapsulate *neither* [encapsulating just immutable content, other than identity]. The base type `Object` doesn't have any content--all it has is identity, so that's the only sensible basis for comparison. Other types like `string` have immutable content, but no strong identity. – supercat Feb 12 '13 at 21:53
  • @Virtlink: Objects of mutable types generally have a strong identity unless either (1) there only exists one persistent reference (something other than a local variable), anywhere in the universe, or (2) no reference will ever be held by anything that will mutate the object in question. Only if one of those conditions applies would it make sense to consider value equality rather than reference equality, and there's no way that a class type can know that either of them applies. Note, however, that #1 applies to unboxed value types, and #2 usually applies to boxed value types. – supercat Feb 12 '13 at 22:04
  • While identity may work as a default, the default equality comparison is apparently not really meaningful. Some even talk about making `Equals` abstract. I would even go as far as to state that `Object` should not have an `Equals` or a `GetHashCode`. Both should be in `IEquatable`. Any class should specify the custom equality that applies to it by implementing `IEquatable` and otherwise only a custom equality comparer can be used to compare objects (e.g. a `ReferenceEqualityComparer`). Edit: Hmm.. apparently [Jon Skeet](http://stackoverflow.com/a/3096126/146622) thinks so too. – Daniel A.A. Pelsmaeker Feb 12 '13 at 22:27
  • @Virtlink: `IEquatable` is broken with regard to any inheritable types. Further, I'm not sure why you think the default equality comparison is not meaningful. If objects implement `Equals` such that `X.Equals(Y)` implies `X` is substitutable for `Y`, then it's possible to write a generic interning `WeakReference` cache to expedite things like deserialization (if a newly-deserialized instance matches a pre-existing one, performance may be improved by discarding the new one and using the old one instead). The fact that some Framework types... – supercat Feb 13 '13 at 00:11
  • ...define `Equals` to mean something other than semantic equivalence complicates things a bit, but that doesn't mean that a universally-available method to test whether two references are equivalent would not be meaningful. – supercat Feb 13 '13 at 00:12
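The interning cache mentioned in the comment above could look roughly like this. It is a hypothetical sketch, not a Framework API: it assumes T's Equals means "substitutable" and holds only weak references, so cached instances can still be garbage-collected (empty buckets are not pruned, for brevity).

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var cache = new InterningCache<string>();
string first = cache.Intern(new string('x', 3));
string second = cache.Intern(new string('x', 3));
Console.WriteLine(object.ReferenceEquals(first, second)); // True: the second copy was discarded

// Hypothetical interning cache. It relies on T's Equals meaning
// "substitutable", so an equal existing instance can replace a new one.
sealed class InterningCache<T> where T : class
{
    // Buckets keyed by T's own hash code; each bucket holds weak references only.
    private readonly Dictionary<int, List<WeakReference<T>>> _buckets = new();

    public T Intern(T candidate)
    {
        int hash = candidate.GetHashCode();
        if (!_buckets.TryGetValue(hash, out var bucket))
        {
            bucket = new List<WeakReference<T>>();
            _buckets[hash] = bucket;
        }

        // Drop entries whose targets were collected, then reuse a live equal instance.
        bucket.RemoveAll(w => !w.TryGetTarget(out _));
        foreach (var weak in bucket)
        {
            if (weak.TryGetTarget(out var existing) && existing.Equals(candidate))
                return existing;
        }

        bucket.Add(new WeakReference<T>(candidate));
        return candidate;
    }
}
```

For the opposite need (identity comparison on a type that overrides Equals), .NET 5 and later provide `ReferenceEqualityComparer.Instance` in System.Collections.Generic, which can be passed to a Dictionary or HashSet.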

Guess: the current behavior of Object.Equals is not what most people consider to be "equal".

The main (only?) reason for this method to exist is to allow searching for items in collections by acting as a stand-in for an "==" implementation. So in most practical cases this implementation behaves unexpectedly (except when you want to find out whether a particular instance is already in the collection), and you are forced to provide your own custom comparison functions...

It is likely a method of Object for technical reasons, e.g. for Array/Dictionary it may be faster to assume that all objects have Equals/GetHashCode than to check something on each object to enable "Find" functionality.

Arguably it should not be on Object at all and instead just require classes that can be stored in collections to implement some form of IComparable interface.
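In today's .NET the interface that plays this role is `IEquatable<T>` (equality does not need an ordering, so `IComparable` is not required): `EqualityComparer<T>.Default`, which `List<T>.Contains` and `Dictionary` use when no comparer is given, prefers it when present. Because Object still carries Equals and GetHashCode, this sketch with a hypothetical, sealed `Tag` type keeps them consistent with it.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

var tags = new List<Tag> { new Tag("csharp"), new Tag("equality") };
// List<T>.Contains uses EqualityComparer<T>.Default, which picks up IEquatable<T>.
Console.WriteLine(tags.Contains(new Tag("equality"))); // True

// Hypothetical type that opts into value equality explicitly.
// Sealed, because IEquatable<T> does not compose well with inheritance.
sealed class Tag : IEquatable<Tag>
{
    public string Name { get; }
    public Tag(string name) => Name = name;

    public bool Equals(Tag? other) => other is not null && Name == other.Name;

    // Keep Object.Equals and GetHashCode consistent with IEquatable<Tag>,
    // because non-generic code and hashing still go through them.
    public override bool Equals(object? obj) => Equals(obj as Tag);
    public override int GetHashCode() => Name.GetHashCode();
}
```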

Alexei Levenkov
  • You mean `IEquatable`, not `IComparable`. Many objects can be equal or not equal without it being possible to determine which one is greater or less than the other. – Servy Feb 06 '13 at 14:59