9

If one has an enumeration stored inside an aggregate type, one might want to include that inside the type's hash code (assuming a typical "multiply by primes" hash function). If one just calls SomeEnum.GetHashCode(), it appears that the JIT boxes the instance, even in release builds.

Profiling this shows some 10% of the time of my application spent boxing enumerations inside various GetHashCode functions.

Several value types implement IEquatable or similar interfaces, which allows calling GetHashCode as a static method; which avoids the boxing. But System.Enum doesn't provide the static overload of GetHashCode. Is there some means of computing the code that should be used but that avoids the boxing?

Community
  • 1
  • 1
Billy ONeal
  • 104,103
  • 58
  • 317
  • 552
  • 2
    Why bother at all? The enum is its own hash code. Just cast to int and call it a day. – Raymond Chen Dec 28 '12 at 01:55
  • @Raymond: I thought that might result in a bad distribution; but thinking over it again I'll see if it works. – Billy ONeal Dec 28 '12 at 01:59
  • @Raymond: That does indeed work for this particular test case. I'm leaving this open though for a while... – Billy ONeal Dec 28 '12 at 02:15
  • Assuming the enumeration has fewer than 4 billion members, casting to int is a perfect hash (no collisions). – Raymond Chen Dec 28 '12 at 03:49
  • @RaymondChen not necessarily. Consider, for example `enum E : long { V1 = 1, V2 = 0x100000001L }` – phoog Dec 28 '12 at 04:53
  • @Raymond: I thought the results were supposed to be uniformly distributed across the entire range of `int`. Apparently not, at least according to the perf numbers I'm getting now. – Billy ONeal Dec 28 '12 at 05:06
  • @phoog, good point and a good reason to call `GetHashCode()` as it will ensure that an `int` is always returned (even if the underlying enum type was not `int`). – jam40jeff Dec 28 '12 at 05:11
  • Have you tried Extension Methods? – Trisped Dec 28 '12 at 05:13
  • @jam40jeff casting the enum to an int will also ensure that an int is always returned. I was rebutting Raymond's assertion that there could be no collisions in an enum with < 4 billion members. This is very unlikely, though, so I'd probably just do what Raymond advises. If you're in the unlikely situation where it *does* cause problems, you could cast the enum to a long or ulong and call GetHashCode on that. – phoog Dec 28 '12 at 05:14
  • @BillyONeal, I believe hash codes are supposed to be uniformly spread for random data. Since hash codes must be consistent (the same enum value must return the same hash code every time), the spread of hash codes can only be as great as the spread of enum values. For the case of an enum with an underlying type of `int`, the hash codes will actually be as perfectly spread as the enum values as there is a 1-to-1 correspondence between the enum values and the hash codes. For an enum with 3 values, it wouldn't matter whether the set of possible hash codes was {0,1,2} or {0,1238,327254}. – jam40jeff Dec 28 '12 at 05:15
  • What static version of GetHashCode are you talking about? I'm unable to find it. – phoog Dec 28 '12 at 05:15
  • @phoog, I see what you mean, although hash code collisions are often unavoidable and acceptable as long as they are minimized. Also, I am used to putting "unsafe" casts such as long to int in a checked block, so I forgot that the simple cast to int will always work. – jam40jeff Dec 28 '12 at 05:19
  • @phoog I was assuming an enumeration with no gaps. There is no requirement that hash values be uniformly distributed, only that they be consistent, with a strong recommendation that they try to avoid collisions. Indeed, [the very first sample hash given in the documentation](http://msdn.microsoft.com/en-us/library/zdee4b3y) is one that hashes an integer to itself. – Raymond Chen Dec 28 '12 at 05:22

1 Answers1

6

You could cast to the underlying type of the enum (usually int unless the enum definition specifies otherwise) and use that type's overridden GetHashCode() method.

enum TestEnum
{
    Test1,
    Test2
}

TestEnum t = TestEnum.Test1;
((int)t).GetHashCode(); // no boxing
t.GetHashCode(); // boxing

Here is the IL for this code:

IL_0000:  nop
IL_0001:  ldc.i4.0
IL_0002:  stloc.0
IL_0003:  ldloc.0
IL_0004:  stloc.1
IL_0005:  ldloca.s   V_1
IL_0007:  call       instance int32 [mscorlib]System.Int32::GetHashCode()
IL_000c:  pop
IL_000d:  ldloc.0
IL_000e:  box        ConsoleApplication1.Program/TestEnum
IL_0013:  callvirt   instance int32 [mscorlib]System.Object::GetHashCode()
IL_0018:  pop
IL_0019:  ret

Edit: For completeness, I should point out that the body of int.GetHashCode() is simply return this;, so as Raymond Chen pointed out in a comment above, simply casting the enum to an int is good enough to obtain a hash code.

jam40jeff
  • 2,576
  • 16
  • 14
  • Does this in fact avoid the boxing? (e.g. doesn't `int.GetHashCode()` also result in boxing?) – Billy ONeal Dec 28 '12 at 01:48
  • Yes, this avoids boxing. Casting from an enum to an int avoids boxing (reference Jon Skeet's first comment: http://bytes.com/topic/c-sharp/answers/276556-enum-vs-constants-performance), and calling GetHashCode on an int does not cause boxing either since GetHashCode is overridden for this struct (and I even decompiled its implementation, and it does not do anything which would cause a boxing operation). Calling the Enum version of GetHashCode does in fact cause boxing (as you have already noted) because it calls an internal method which returns an `object`, then calls `GetHashCode()` on it. – jam40jeff Dec 28 '12 at 03:15
  • By the way, I had mistakenly checked the implementation of `GetHashCode()` for `short` before I posted. The implementation for `int` is simply `return this;`, so (as Raymond Chen already commented) you can just cast the Enum value to an `int` and use that as your hash code. – jam40jeff Dec 28 '12 at 03:24
  • `Enum` overrides `GetHashCode` too -- but boxing still happens. You'd see the boxing at the call site, not the target site. – Billy ONeal Dec 28 '12 at 03:24
  • @BillyONeal, you are correct that the boxing for the Enum call to `GetHashCode()` is happening at the call site. However, this seems to be because System.Enum is implemented as an abstract class inheriting from ValueType, whereas System.Int32 is a struct. I will post the IL from my compiled code to show you what is happening. – jam40jeff Dec 28 '12 at 04:58