Complexity of String.GetHashCode()

Question

Possible Duplicate:
Why might a System.String object not cache its hash code?

I always thought that, given that .Net strings are immutable, String.GetHashCode() didn't have to calculate the hash every time it's called -- if the characters won't change, the hash is constant for a given instance of System.String, I naively thought; String.GetHashCode() could have O(1) complexity.

Reverse engineering it shattered this assumption.

Of course, hash codes aren't meant to be constant and so on, but what could prevent a String implementation from having hash codes already calculated from, say, construction time?

Apart from a bit of complexity and 4 bytes of memory, nothing. — CodesInChaos, Jul 18 '12 at 20:44
I sensed that this question was too good to remain unasked to this day. Unfortunately, it never showed up while I wrote it... I just voted to close this. — Humberto, Jul 18 '12 at 20:47

score 3 · Accepted Answer · edited May 23 '17 at 12:27

Good question!

I asked the same thing a while back.

Basically, it's a speed/memory trade-off. The benefit of caching string hash codes is arguably outweighed by the overhead of every single string object requiring another 32 bits of memory to be allocated. This makes sense when you think about the large number of strings that may exist in a program versus the number whose hash codes you care about (presumably because you're using them as keys).

The latter number may be large in some programs, but it may also be quite small. It could even be zero in a lot of cases.

If performance were an extreme concern for you in certain scenarios, you might consider writing your own wrapper that does cache its hash code:

public class StringKey
{
    string value;
    int hashCode;

    public StringKey(string value)
    {
        this.value = value;
        this.hashCode = value.GetHashCode();
    }

    public override int GetHashCode()
    {
        return this.hashCode;
    }

    public override string ToString()
    {
        return this.value;
    }

    // Plus all the other stuff you'd want to include here,
    // e.g., Equals, CompareTo, etc.
}

To get any benefit out of this, of course, you'd still need to be very careful to reuse these StringKey objects basically throughout your program everywhere. In the vast majority of cases, this would not be worth the effort; I've included the idea only as something to consider if you happen to be an exceptional case.

score 0 · Answer 2 · answered Jul 18 '12 at 20:44

0

That would only make sense if you assume that the hash code is used (almost) every time you create a string. If you don't use the hashcode you'd still pay the penalty of calculatinging. I grant you for interned strings this might actually be something that was worthwhiole as long as it could be done as part of the interning.

answered Jul 18 '12 at 20:44

Rune FS

21,497
7
62
96

Making the calculation lazy would be easy. – CodesInChaos Jul 18 '12 at 20:45
@CodesInChaos yes but that's not what OP suggests and my answer is within the context of the question – Rune FS Jul 19 '12 at 06:59

score 0 · Answer 3 · answered Jul 18 '12 at 20:45

0

I think the problem is where to store the hash code. There's so much optimization being performed on string storage as it is that adding more storage requirements would be overly complex.

answered Jul 18 '12 at 20:45

Chris Gerken

16,221
6
44
59

Complexity of String.GetHashCode()

3 Answers3