How do you implement GetHashCode for a structure with two strings, when both strings are interchangeable?

Question

I have a structure in C#:

public struct UserInfo
{
   public string str1
   {
     get;
     set;
   }

   public string str2
   {
     get;
     set;
   }   
}

The only rule is that UserInfo(str1="AA", str2="BB").Equals(UserInfo(str1="BB", str2="AA")) must return true.

How should I override the GetHashCode function for this structure?

possible duplicate of [Fast String Hashing Algorithm with low collision rates with 32 bit integer](http://stackoverflow.com/questions/114085/fast-string-hashing-algorithm-with-low-collision-rates-with-32-bit-integer) — nawfal, Apr 14 '13 at 21:11
@nawfal, shouldn't it be the other way round? My question was posted on Sept 16/08, but the one you proposed was posted on Sept 22/08. — Graviton, Sep 24 '14 at 11:30
In the latest .Net(core) there is HashCode.Combine method -see https://stackoverflow.com/questions/23468671/what-is-the-best-way-to-implement-gethashcode-for-class-with-lots-of-propertie — Michael Freidgeim, Jan 09 '23 at 02:46
Does this answer your question? [What is the best algorithm for overriding GetHashCode?](https://stackoverflow.com/questions/263400/what-is-the-best-algorithm-for-overriding-gethashcode) — Michael Freidgeim, Jan 09 '23 at 02:49
@Graviton "Possible duplicate" is a way to clean-up - to close similar questions and keep one with the best answers. The date is not essential. See http://meta.stackexchange.com/questions/147643/should-i-vote-to-close-a-duplicate-question-even-though-its-much-newer-and-ha If you agree that it requires clarification please vote on http://meta.stackexchange.com/questions/281980/add-clarification-link-to-possible-duplicate-automated-comment — Michael Freidgeim, Jan 09 '23 at 02:51

score 71 · Accepted Answer · edited Sep 20 '10 at 09:45

MSDN:

A hash function must have the following properties:

If two objects compare as equal, the GetHashCode method for each object must return the same value. However, if two objects do not compare as equal, the GetHashCode methods for the two object do not have to return different values.

The GetHashCode method for an object must consistently return the same hash code as long as there is no modification to the object state that determines the return value of the object's Equals method. Note that this is true only for the current execution of an application, and that a different hash code can be returned if the application is run again.

For the best performance, a hash function must generate a random distribution for all input.

Taking this into account, the correct way is:

return str1.GetHashCode() ^ str2.GetHashCode()

^ can be replaced with any other commutative operation.
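A minimal sketch of how the one-liner might sit inside the question's struct, with null handling added as the comments suggest. The Equals implementation is an assumption (the question states the rule but does not show the code), and auto-properties are used for brevity.

```csharp
using System;

// Sketch only: the question's UserInfo with an assumed order-insensitive Equals.
public struct UserInfo : IEquatable<UserInfo>
{
    public string str1 { get; set; }
    public string str2 { get; set; }

    // Equal when both values hold the same two strings, in either order.
    public bool Equals(UserInfo other)
    {
        return (str1 == other.str1 && str2 == other.str2)
            || (str1 == other.str2 && str2 == other.str1);
    }

    public override bool Equals(object obj)
    {
        return obj is UserInfo && Equals((UserInfo)obj);
    }

    // XOR is commutative, so swapping str1 and str2 cannot change the result.
    // A null string contributes 0.
    public override int GetHashCode()
    {
        int h1 = str1 != null ? str1.GetHashCode() : 0;
        int h2 = str2 != null ? str2.GetHashCode() : 0;
        return h1 ^ h2;
    }
}
```

Because Equals ignores order, GetHashCode has to ignore order as well; any commutative combination of the two string hashes satisfies that.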

edited Sep 20 '10 at 09:45 by Motti
answered Sep 16 '08 at 08:32 by aku

Shouldn't that be return str1.GetHashCode() ^ str2.GetHashCode(); – roomaroo Sep 16 '08 at 08:49
Also, doesn't consider null values. – Omer van Kloeten Sep 16 '08 at 09:12
Omer van Kloeten, it should be obvious to any .NET developer. This is a quick sample intended to show the general idea, not a complete solution. – aku Sep 16 '08 at 12:34
If you expect having both (str1, str2) and (str2, str1) in your hash table to be a very frequent occurrence, lookup speed might be a bit slower than it should be. Lookup speed can also be increased by caching the hash code. Obviously these may be premature optimizations. – Brian Jun 22 '09 at 15:19
+1 for pointing out the importance of using a commutative operation – Pandincus Oct 28 '09 at 20:05
what we can say for this string s1 = "hello"; string s2 = "hello"; (s1 == s2).Dump(); (s1.GetHashCode() == s2.GetHashCode()).Dump(); – shahjapan Dec 08 '09 at 12:32
Why does it need to be commutative? Then if you had two objects like: `o1.str1 == o2.str2 && o1.str2 == o2.str1` they would have the same hash code but may not be considered equal, or am I missing something? – user12345613 May 30 '12 at 14:06
Unfortunately, "If two string objects are equal, the GetHashCode method returns identical values. However, there is not a unique hash code value for each unique string value. Different strings can return the same hash code." per http://msdn.microsoft.com/en-us/library/system.string.gethashcode.aspx, so there's a chance that this would fail spectacularly. – Quanta Jul 31 '12 at 04:18
I stand corrected, Hashes can be the same for different objects. I'm guessing that all of the LINQ methods that use getHash, for example, would also use Equals in the case that two objects return the same hash. – Quanta Jul 31 '12 at 04:24

score 27 · Answer 2 · edited May 23 '17 at 10:30

See Jon Skeet's answer - binary operations like ^ are not good, they will often generate colliding hashes!

edited May 23 '17 at 10:30 by Community
answered Jun 22 '09 at 15:16 by Tomáš Kafka

but jon says it's bad because it'll do exactly what OP wants. `F(a,b) == F(b,a)` ... – Noctis Jun 01 '17 at 00:53

score 16 · Answer 3 · 2008-09-21T13:58:29.360

public override int GetHashCode()
{
    unchecked
    {
        return (str1 ?? String.Empty).GetHashCode() +
            (str2 ?? String.Empty).GetHashCode();
    }
}

Using the '+' operator might be better than using '^', because although you explicitly want ('AA', 'BB') and ('BB', 'AA') to be the same, you may not want ('AA', 'AA') and ('BB', 'BB') to be the same (or all pairs of equal strings, for that matter).

The 'as fast as possible' rule is not entirely adhered to in this solution because in the case of nulls this performs a 'GetHashCode()' on the empty string rather than immediately return a known constant, but even without explicitly measuring I am willing to hazard a guess that the difference wouldn't be big enough to worry about unless you expect a lot of nulls.
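The difference between the two operators can be sketched as follows. Since x ^ x is 0 for any x, XOR maps every pair of identical strings to the same hash code, while addition yields twice the string's hash (wrapped on overflow). The class and method names are hypothetical, chosen for this illustration.

```csharp
using System;

// Hypothetical helper comparing the two commutative combinations.
public static class PairHash
{
    // XOR of the two string hash codes; null is treated as the empty string.
    public static int Xor(string a, string b)
    {
        return (a ?? String.Empty).GetHashCode() ^ (b ?? String.Empty).GetHashCode();
    }

    // Sum of the two string hash codes; unchecked so overflow wraps around.
    public static int Sum(string a, string b)
    {
        unchecked
        {
            return (a ?? String.Empty).GetHashCode() + (b ?? String.Empty).GetHashCode();
        }
    }
}
```

Here PairHash.Xor("AA", "AA") and PairHash.Xor("BB", "BB") are both 0, whereas PairHash.Sum gives a value that depends on the string. Both methods remain independent of argument order.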

score 5 · Answer 4 · answered Sep 16 '08 at 17:29

As a general rule, a simple way to generate a hashcode for a class is to XOR all the data fields that can participate in generating the hash code (being careful to check for null as pointed out by others). This also meets the (artificial?) requirement that the hashcodes for UserInfo("AA", "BB") and UserInfo("BB", "AA") are the same.
If you can make assumptions about the use of your class, you can perhaps improve your hash function. For example, if it is common for str1 and str2 to be the same, XOR may not be a good choice. But if str1 and str2 represent, say, first and last name, XOR is probably a good choice.

Although this is clearly not meant to be a real-world example, it may be worth pointing out that:
- This is probably a poor example of the use of a struct: a struct should normally have value semantics, which doesn't seem to be the case here.
- Using properties with setters to generate a hash code is also asking for trouble.

Hmm, why do you think his struct doesn't have value semantics? And could you expand on your last sentence? — Stefan Monov, Jul 10 '10 at 16:15

score 4 · Answer 5 · answered Feb 06 '14 at 13:21

Going along the lines ReSharper is suggesting:

public override int GetHashCode()
{
    unchecked
    {
        // String properties; a null string contributes 0
        int hashCode = str1 != null ? str1.GetHashCode() : 0;
        hashCode = (hashCode * 397) ^ (str2 != null ? str2.GetHashCode() : 0);

        // int properties (intProperty stands for any additional int field)
        hashCode = (hashCode * 397) ^ intProperty;
        return hashCode;
    }
}

397 is a prime of sufficient size to cause the result variable to overflow and mix the bits of the hash somewhat, providing a better distribution of hash codes. Otherwise there's nothing special in 397 that distinguishes it from other primes of the same magnitude.

This hash code does not satisfy OP's requirement: the only rule is that UserInfo(str1="AA", str2="BB").Equals(UserInfo(str1="BB", str2="AA")) — Kasper van den Berg, Nov 17 '17 at 09:17

score 4 · Answer 6 · answered May 08 '14 at 14:44

A simple general way is to do this:

return string.Format("{0}/{1}", str1, str2).GetHashCode();

Unless you have strict performance requirements, this is the easiest I can think of and I frequently use this method when I need a composite key. It handles the null cases just fine and won't cause (m)any hash collisions (in general). If you expect '/' in your strings, just choose another separator that you don't expect.

answered May 08 '14 at 14:44 by Daniel Lidström

Very simple indeed. This can be simplified in C# 6.0 to just `return $"{str1}/{str2}".GetHashCode();`. See [String Interpolation](https://msdn.microsoft.com/en-us/library/dn961160.aspx) – styfle Feb 18 '16 at 17:56
Not safe, what if str1 = "a/b" and str2 = ""? This would have the same hash as str1 = "a" and str2 = "b/". – Erwin Mayer Nov 22 '16 at 07:31
@ErwinMayer use a separator character you know isn't in your strings. Besides, GetHashCode is not required to always return unique values. It is used as an optimization to avoid calling `Equals` too often (exact comparison is often more expensive). – Daniel Lidström Nov 25 '16 at 08:06
How does this ensure that it results in the same hash code for str1="a", str2="b" and for str1="b" str2="a"? Is there some magic so that "a/b" and "b/a" result in the same hash? – Kasper van den Berg Nov 17 '17 at 09:13
@KaspervandenBerg No, those two must have different hashes since they aren't the same, right? – Daniel Lidström Nov 17 '17 at 11:33
@DanielLidström they should be the same, since OP required UserInfo(str1: "AA", str2: "BB").Equals(UserInfo(str1: "BB", str2: "AA")) – Kasper van den Berg Nov 18 '17 at 10:27
@KaspervandenBerg Then the `Equals` function needs to be overridden and provide that functionality. That has not so much to do with `GetHashCode`. – Daniel Lidström Nov 20 '17 at 07:16
@DanielLidström, it does have much to do with GetHashCode, since GetHashCode is required to result in the same hashcode when two objects are Equal. – Kasper van den Berg Nov 20 '17 at 10:14
@KaspervandenBerg Nice catch, forgot about that. Guess you'll have to account for that then. – Daniel Lidström Nov 20 '17 at 13:59
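The two objections raised in these comments can be checked on the composite key itself, before any hashing takes place. This is a small sketch; SeparatorDemo and Key are hypothetical names.

```csharp
using System;

// Hypothetical helper exposing the composite key that the Format-based approach hashes.
public static class SeparatorDemo
{
    // Builds the key; string.Format renders a null argument as an empty string.
    public static string Key(string str1, string str2)
    {
        return string.Format("{0}/{1}", str1, str2);
    }
}
```

Key("a/b", "") and Key("a", "b/") both produce "a/b/", so those two pairs are guaranteed to collide. Key("a", "b") and Key("b", "a") produce different strings, so their hash codes will almost always differ, which breaks the question's rule unless the strings are first put into a fixed order.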

score 3 · Answer 7 · answered Sep 16 '08 at 09:12

public override int GetHashCode()
{
    unchecked
    {
        return (str1 != null ? str1.GetHashCode() : 0) ^ (str2 != null ? str2.GetHashCode() : 0);
    }
}

answered Sep 16 '08 at 09:12 by user11556

Why unchecked? xor can't overflow. – Konrad Rudolph Sep 16 '08 at 12:43

score 2 · Answer 8 · answered Sep 16 '08 at 08:33

Ah yes, as Gary Shutler pointed out:

return str1.GetHashCode() + str2.GetHashCode();

Can overflow. You could try casting to long as Artem suggested, or you could surround the statement in the unchecked keyword:

return unchecked(str1.GetHashCode() + str2.GetHashCode());
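To illustrate what the unchecked keyword changes, here is a small sketch contrasting the two arithmetic contexts. C# compiles non-constant integer arithmetic as unchecked by default, but a project can switch on overflow checking, so stating unchecked explicitly keeps the hash independent of that setting. OverflowDemo and its methods are hypothetical names.

```csharp
using System;

// Hypothetical helper contrasting checked and unchecked addition.
public static class OverflowDemo
{
    // Unchecked context: overflow wraps around silently.
    public static int AddUnchecked(int a, int b)
    {
        return unchecked(a + b);
    }

    // Checked context: overflow raises OverflowException, reported here as false.
    public static bool TryAddChecked(int a, int b, out int sum)
    {
        try
        {
            sum = checked(a + b);
            return true;
        }
        catch (OverflowException)
        {
            sum = 0;
            return false;
        }
    }
}
```

For a hash code the wrapped value is perfectly usable, which is why unchecked addition is enough and the cast to long is not strictly needed.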

score 1 · Answer 9 · answered May 10 '21 at 11:48

Since C# 7, we can take advantage of ValueTuple for that:

return (str1, str2).GetHashCode();

answered May 10 '21 at 11:48 by Pablo Retyk

But are you sure (str1,str2).GetHashCode() is the same as (str2,str1).GetHashCode() ? – Graviton May 10 '21 at 12:35
this isn't a requirement, also using other algorithms, sometimes you do some manipulation to one of the fields (such as str1<<2) before XORing, so if the order were a problem, it would be a problem also using the other suggested methods. I think that as long as you are consistent with the order in the internal implementation it shouldn't be a problem – Pablo Retyk Jun 09 '21 at 10:33
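A tuple hash depends on element order, so one way to reconcile this answer with the question's rule is to put the two strings into a fixed order before building the tuple. The sketch below assumes ordinal comparison is acceptable for ordering; TupleHash and OrderInsensitive are hypothetical names.

```csharp
using System;

// Hypothetical helper: order the strings first, then hash the tuple.
public static class TupleHash
{
    // string.CompareOrdinal accepts nulls (null sorts before any non-null string),
    // so (a, b) and (b, a) always end up as the same tuple.
    public static int OrderInsensitive(string a, string b)
    {
        return string.CompareOrdinal(a, b) <= 0
            ? (a, b).GetHashCode()
            : (b, a).GetHashCode();
    }
}
```

The comparison costs a little more than a plain XOR, but it avoids the weakness of XOR where every pair of identical strings hashes to 0.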

score 1 · Answer 10 · answered Sep 16 '08 at 08:23

Try out this one:

(((long)str1.GetHashCode()) + ((long)str2.GetHashCode())).GetHashCode()

answered Sep 16 '08 at 08:23 by Artem Tikhomirov

score 0 · Answer 11 · answered Sep 16 '08 at 08:22

Many possibilities. E.g.

return str1.GetHashCode() ^ str2.GetHashCode();

answered Sep 16 '08 at 08:22 by VolkerK

score 0 · Answer 12 · answered Sep 16 '08 at 08:22

Perhaps something like str1.GetHashCode() + str2.GetHashCode()? or (str1.GetHashCode() + str2.GetHashCode()) / 2? This way it would be the same regardless of whether str1 and str2 are swapped....

answered Sep 16 '08 at 08:22 by Mike Stone

score 0 · Answer 13 · answered Sep 16 '08 at 08:27

Sort them, then concatenate them:

return ((str1.CompareTo(str2) < 1) ? str1 + str2 : str2 + str1)
    .GetHashCode();

answered Sep 16 '08 at 08:27 by Steve Morgan

This will cause your GetHashCode method to do quite a lot of work. Hash codes are intended to be quick. From MSDN: "A hash function is used to quickly generate a number (hash code) that corresponds to the value of an object". Allocating a new string seems like a bad idea inside a hash function. – Wilka Nov 06 '08 at 10:30
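One way to keep the "sort first" idea without allocating a string is to order the two hash codes rather than the strings themselves. This is a sketch under that assumption, not the answer's own method; the multiplier 397 is borrowed from the ReSharper-style answer in this thread, and OrderedHash is a hypothetical name.

```csharp
using System;

// Hypothetical helper: sort the two hash codes, then mix them.
public static class OrderedHash
{
    // Each string is hashed once; ordering the two ints makes the result
    // independent of argument order, and no intermediate string is created.
    public static int Combine(string a, string b)
    {
        int h1 = a != null ? a.GetHashCode() : 0;
        int h2 = b != null ? b.GetHashCode() : 0;
        int lo = Math.Min(h1, h2);
        int hi = Math.Max(h1, h2);
        unchecked
        {
            return (lo * 397) ^ hi;
        }
    }
}
```

Because the mixing step is not commutative, a pair of identical strings no longer collapses to 0 the way it does with a plain XOR.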

Omer van Kloeten · Answer 14 · 2008-09-16T09:11:20.150

0

GetHashCode's result is supposed to be:

As fast as possible.
As unique as possible.

Bearing those in mind, I would go with something like this:

if (str1 == null)
    if (str2 == null)
        return 0;
    else
        return str2.GetHashCode();
else
    if (str2 == null)
        return str1.GetHashCode();
    else
        return ((ulong)str1.GetHashCode() | ((ulong)str2.GetHashCode() << 32)).GetHashCode();

Edit: Forgot the nulls. Code fixed.

edited Sep 16 '08 at 09:11
answered Sep 16 '08 at 08:31 by Omer van Kloeten

The only rule is that UserInfo(str1="AA", str2="BB").Equals(UserInfo(str1="BB", str2="AA")) – alfred barthand Nov 24 '09 at 14:15

score -1 · Answer 15 · edited Dec 20 '11 at 01:11

Too complicated, and forgets nulls, etc. This is used for things like bucketing, so you can get away with something like

if (null != str1) {
    return str1.GetHashCode();
}
if (null != str2) {
    return str2.GetHashCode();
}
//Not sure what you would put here, some constant value will do
return 0;

This is biased by assuming that str1 is not likely to be common in an unusually large proportion of instances.

edited Dec 20 '11 at 01:11 by LPL
answered Sep 16 '08 at 08:56 by Roger Willcocks

This does not satisfy the condition that the order of str1 and str2 does not matter. ("A", "B") and ("B", "A") produce different hashcodes. – Sebastian Negraszus Feb 05 '15 at 09:27
6.5 years later? And what condition are you referring to? This is the discussion of the generation of a hashcode for a struct containing 2 strings, not for what happens when comparing 2 strings. – Roger Willcocks Feb 06 '15 at 10:42
The structs ("A", "B") and ("B", "A") should be considered equal. Therefore, their hash codes must be equal. But ("A", "B") produces the hash code of "A", and ("B", "A") produces the hash code of "B" - which is not equal. – Sebastian Negraszus Feb 06 '15 at 11:46
Given that this question has been edited within the last 6 months at least, I'm not sure that was actually in this question originally. – Roger Willcocks Feb 08 '15 at 09:38
