7

I have to compare a object with the raw properties of the same class. Meaning, i have to compare those:

struct Identifier
{
    string name;
    string email;
}

with the two strings name and email. I know i could just create a new Identifier instance for name and email and pass that into equals(). My application has to be very fast and resource-saving.

I know that comparison via hashcode isn't a good way, because as explained here there are collisions. But collisions are okay for me, i just need it to be fast.

So,

1) is comparison via GetHashCode (check if the hashcode of both objects are the same) faster than Equals()?

2) Should i instead creating a new instance of Identifier of the two values for the comparison, make a new method which takes the values directly? e.g.

struct Identifier {
  string name;
  string email;

  bool Equals(string name, string email) {
      // todo comparison via hashcode or equals
  }
}

I would use the Equals() and GetHashCode() method generated by resharper.

Community
  • 1
  • 1
michidk
  • 351
  • 1
  • 4
  • 18
  • GetHashCode is not for equality comparisons, it is for, well, getting hash code. Hashcode in C# is 32 bits of information, while your strings may technically contain unlimited amount of that. So quite different strings may have the same hashcode. Recommended reading - http://stackoverflow.com/questions/371328/why-is-it-important-to-override-gethashcode-when-equals-method-is-overridden – Eugene Podskal Mar 08 '16 at 11:30
  • Does C# cache the hashCode for a String? Because if you have to calculate it on the fly, that will be slower than comparing two strings. – Thilo Mar 08 '16 at 11:37
  • @Thilo [No, it doesn't](http://referencesource.microsoft.com/#mscorlib/system/string.cs,0a17bbac4851d0d4). But calculating hash codes for two strings is extremely fast, and you should definitely use them in your own implementation of `GetHashCode`. – Yuval Itzchakov Mar 08 '16 at 11:40
  • in your implementation of GetHashCode: of course. But no need to call it during Equals, right? – Thilo Mar 08 '16 at 11:42
  • It is not the way it works, code *might* use GetHashCode() to make a comparison faster. But then *still* needs to use Equals() since hash codes cannot be unique. – Hans Passant Mar 08 '16 at 11:43

2 Answers2

6

is comparison via GetHashCode (check if the hashcode of both objects are the same) faster than Equals()?

You seem to be confusing the two concepts. GetHashCode's purpose isn't to seek equality between two object instances, it is there simply so each object can easily provide a hashcode value for any external resources that may relay on it.

Equals, on the other hand, is there to determine equality. It should be that two methods which yield true for equals, provide the same hashcode, but not the other way around.

The documentation on object.GetHashCode provides a pretty good explanation:

Two objects that are equal return hash codes that are equal. However, the reverse is not true: equal hash codes do not imply object equality, because different (unequal) objects can have identical hash codes. Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value this method returns may differ between .NET Framework versions and platforms, such as 32-bit and 64-bit platforms. For these reasons, do not use the default implementation of this method as a unique object identifier for hashing purposes. Two consequences follow from this:

  • You should not assume that equal hash codes imply object equality.
  • You should never persist or use a hash code outside the application domain in which it was created, because the same object may hash across application domains, processes, and platforms.

If you want to check for equality between two instances, I definitely recommend implementing IEquatable<T> and overriding object.GetHashCode.

As a side note - I see that you're using a struct. You should note that struct has different semantics in C# than in C++ or C, and I hope you're aware of them.

Yuval Itzchakov
  • 146,575
  • 32
  • 257
  • 321
6

Comparing hash codes could be faster if you save them on the Identifier instance (see below). However, it is not the same thing as comparing for equality.

Comparing hash codes lets you check if two items are definitely not equal to each other: you know this when you get different hash codes.

When hash codes are equal, however, you cannot make a definitive statement about the equality: the items could be equal or not equal to each other. That is why hash-based containers must always follow hash code comparison, direct or indirect, with a comparison for equality.

Try implementing the comparison like this:

struct Identifier {
    string name;
    string email;
    int nameHash;
    int emailHash;
    public Identifier(string name, string email) {
        this.name = name;
        nameHash = name.GetHashCode();
        this.email = email;
        emailHash = email.GetHashCode();
    }
    bool Equals(string name, string email) {
        return name.GetHashCode() == nameHash
            && email.GetHashCode() == emailHash
            && name.equals(this.name)
            && email.equals(this.email);
    }
}

Comparing to pre-computed hash code would short-circuit the actual equality comparison, so you could save some CPU cycles when most of the comparisons end up returning false.

Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
  • 1
    Why is calculating two hash codes faster than comparing two strings? Both seem to be have to iterate over the string (and comparison can short-circuit on the first mismatched character). – Thilo Mar 08 '16 at 11:45
  • 1
    @Thilo It is not faster, indeed. To see if 2 objects are equal, you call the Equals method and that's all. GetHashCode is only there to "sort" (kinda) your object in hash collections (HashSet, Dictionary...), in order to be able to find it with a O(1) complexity. – krimog Mar 08 '16 at 12:09
  • 1
    @Thilo I was certain that C# cached string hash codes the way Java does, but quick examination of the source told me that I was wrong on that: .NET designers went for saving memory. Computing hash codes has slight advantage in terms of CPU cache because the code reads from sequential locations, so most reads would be cache hits. However, this would be relevant only for very long strings, and even then the impact would be small. Anyway, I changed the wording of the answer, and suggested an implementation that caches hash codes explicitly. – Sergey Kalinichenko Mar 08 '16 at 12:24