46

The following code produces the output of 46104728:

using System;

namespace TestApplication
{
    internal static class Program
    {
        private static void Main()
        {
            Type type = typeof(string);
            Console.WriteLine(type.GetHashCode());
            Console.ReadLine();
        }
    }
}

But so does this:

using System;

namespace TestApplication
{
    internal static class Program
    {
        private static void Main()
        {
            Type type = typeof(Program);
            Console.WriteLine(type.GetHashCode());
            Console.ReadLine();
        }
    }
}

Yet on http://ideone.com the two types produce different results. The behavior has been reproduced on more than one system. I'm using .NET 4.0.

Jehof
Michael J. Gray
  • +1, Interesting. While they don't match for me, the results sometimes seem inconsistent. – leppie Nov 18 '11 at 05:29
  • I do not share the same behavior on .Net 4.0, nor 3.5, nor 2.0, when you look at their hash codes at the same time. It appears that Type hashcodes start at one value and are based on the order of their use or appearance (although of that I'm not sure). – user7116 Nov 18 '11 at 05:29
  • @sixlettervariables are you in debug or release mode and are you testing it with a debugger (VS) attached? – Michael J. Gray Nov 18 '11 at 05:33
  • 3
    Consider reading [GetHashCode guidelines](http://blogs.msdn.com/b/ericlippert/archive/2011/02/28/guidelines-and-rules-for-gethashcode.aspx) – V4Vendetta Nov 18 '11 at 05:33
  • @V4Vendetta: Irrelevant, we are dealing with internally-implemented hashcodes. – leppie Nov 18 '11 at 05:36
  • 3
    @leppie, not irrelevant, Eric says very clearly what the use case of GetHashCode is and that you shouldn't rely on an implementation across time and systems, see rule 3. – jk. Nov 18 '11 at 09:58

4 Answers

46

You've run into what you believe to be a problem. However, if you look at the two hash codes within the same execution, you'll find that they are not identical; instead, they depend on the order in which they are requested:

// X8 = hexadecimal, zero-padded to 8 digits
Console.WriteLine("{0} {1:X8}", typeof(string), typeof(string).GetHashCode());
Console.WriteLine("{0} {1:X8}", typeof(Program), typeof(Program).GetHashCode());
// System.String 02BF8098
// Program 00BB8560

If I run that same program again, swapping their order:

Console.WriteLine("{0} {1:X8}", typeof(Program), typeof(Program).GetHashCode());
Console.WriteLine("{0} {1:X8}", typeof(string), typeof(string).GetHashCode());
// Program 02BF8098
// System.String 00BB8560

This is a non-issue at runtime as the returned values do not violate the rules for implementing Object.GetHashCode.

But, as you noted this behavior seems curious!

I delved into the source and found the implementation of Type.GetHashCode is foisted off onto MemberInfo.GetHashCode, which is again foisted off onto Object.GetHashCode which calls RuntimeHelpers.GetHashCode(this).

It is at this point that the trail goes cold. However, my assumption is that the inner workings of that method create a new value, mapped per instance, based on the order of calls.

I tested this hypothesis by running the same code above with two instances of Program (after adding a property to identify them):

var b = new Program() { Name = "B" };
var a = new Program() { Name = "A" };
Console.WriteLine("{0} {1:X8}", a.Name, a.GetHashCode());
Console.WriteLine("{0} {1:X8}", b.Name, b.GetHashCode());
// A 02BF8098
// B 00BB8560

Thus, for classes which do not explicitly override Object.GetHashCode, instances will be assigned a seemingly predictable hash value based on the order in which they call GetHashCode.
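As a minimal sketch of that conclusion, the following checks that an instance of a class without its own GetHashCode override keeps the same hash code for its whole lifetime, and that this value is the one RuntimeHelpers.GetHashCode returns. The Widget class and the UsesDefaultHash helper are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Runtime.CompilerServices;

namespace TestApplication
{
    // Hypothetical class for illustration; it deliberately does not override GetHashCode.
    internal sealed class Widget
    {
        public string Name { get; set; }
    }

    internal static class DefaultHashDemo
    {
        // True when obj.GetHashCode() matches the runtime's default per-instance hash.
        internal static bool UsesDefaultHash(object obj)
        {
            return obj.GetHashCode() == RuntimeHelpers.GetHashCode(obj);
        }

        private static void Main()
        {
            var a = new Widget { Name = "A" };
            int first = a.GetHashCode();

            // Changing the object's state does not change the default hash code.
            a.Name = "renamed";
            Console.WriteLine(first == a.GetHashCode());

            // Widget inherits Object.GetHashCode, so the two values agree.
            Console.WriteLine(UsesDefaultHash(a));

            // The same check applied to a Type instance.
            Console.WriteLine(UsesDefaultHash(typeof(string)));
        }
    }
}
```

The concrete numbers are not asserted anywhere because they can change between runs, runtimes and machines; only the relationships between the values are checked.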


Update: I went and looked at how Rotor/Shared Source CLI handles this situation, and I learned that the default implementation calculates and stores a hash code in the sync block for the object instance, thus ensuring the hash code is generated only once. The default computation for this hash code is trivial, and uses a per-thread seed (wrapping is mine):

// ./sscli20/clr/src/vm/threads.h(938)
// Every thread has its own generator for hash codes so that we
// won't get into a situation where two threads consistently give
// out the same hash codes.
// Choice of multiplier guarantees period of 2**32
// - see Knuth Vol 2 p16 (3.2.1.2 Theorem A).

So if the actual CLR follows this implementation, it would seem that any differences seen in hash code values for objects are based on the AppDomain and managed thread which created the instance.
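The quoted comment does not include the generator itself, so the following is only a sketch of a generator that would be consistent with it, not the CLR's actual code. It assumes a linear congruential generator with modulus 2^32, an increment of 1, and a multiplier of the form 4 * id + 5. That form is an assumption chosen because it is always congruent to 1 modulo 4, which together with an odd increment satisfies the full-period condition of the Knuth theorem the comment cites. ThreadHashGenerator, its parameters and the seed of 0 are all hypothetical.

```csharp
using System;

namespace TestApplication
{
    // Hypothetical per-thread hash code generator; a sketch, not the CLR implementation.
    internal sealed class ThreadHashGenerator
    {
        private readonly uint _multiplier;
        private uint _state;

        internal ThreadHashGenerator(uint threadId, uint seed)
        {
            // Assumed multiplier form: 4 * id + 5 is congruent to 1 mod 4 for every id.
            _multiplier = unchecked(threadId * 4 + 5);
            _state = seed;
        }

        internal uint Next()
        {
            // uint arithmetic wraps around, which gives the modulus of 2^32 for free.
            _state = unchecked(_state * _multiplier + 1);
            return _state;
        }
    }

    internal static class GeneratorDemo
    {
        private static void Main()
        {
            // Two "threads" with different ids produce different sequences.
            var t1 = new ThreadHashGenerator(1, 0);
            var t2 = new ThreadHashGenerator(2, 0);
            for (int i = 0; i < 4; i++)
            {
                Console.WriteLine("{0:X8} {1:X8}", t1.Next(), t2.Next());
            }
        }
    }
}
```

Under these assumptions the sequence depends only on the seed and the thread id, so a fixed seed would yield the same values on every run, in call order. That would be consistent with the order-dependent, repeatable values observed above, but it does not prove that the real runtime works this way.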

user7116
  • 63,008
  • 17
  • 141
  • 172
  • What makes me curious mostly is why it had the value I originally placed for my same assembly run on two different machines. They're in two distinct states so there must be something very deterministic about it as you mentioned in the other comments. – Michael J. Gray Nov 18 '11 at 06:09
  • @JNZ: eh, I'll bet the Framework has some seed value hardcoded, so it is always the same across your machines. – user7116 Nov 18 '11 at 06:10
  • That wouldn't make any sense because then it would be the same on two different systems with .NET 4.0. It's different on two distinct machines which are configured differently but running .NET 4.0 currently, they're not even both mine. – Michael J. Gray Nov 18 '11 at 06:12
  • Why not, what if the algorithm was simply `hash = hash + counter` and `hash = seed` as the initialization? – user7116 Nov 18 '11 at 06:14
  • Most hash algorithms start with a well known, hard coded, non-random seed. Read up on [Bob Jenkins' hash algorithms](http://burtleburtle.net/bob/hash/doobs.html), he has lots of nice details inside. I'm certain it starts from a fixed seed because `46104728` is `0x02BF8098`, coincidentally the same hash code I get. – user7116 Nov 18 '11 at 06:20
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/5121/discussion-between-jnz-and-sixlettervariables) – Michael J. Gray Nov 18 '11 at 06:24
9

Program (.NET 4, AnyCPU):

var st = typeof(string);
var pt = typeof(Program);
Console.WriteLine(st.GetHashCode());
Console.WriteLine(pt.GetHashCode());
Console.WriteLine(typeof(string).GetHashCode());
Console.WriteLine(typeof(Program).GetHashCode());
Console.ReadLine();

Run 1:

33156464
15645912
33156464
15645912

Run 2-6:

45653674
41149443
45653674
41149443

Run 7:

46104728
12289376
46104728
12289376

Run 8:

37121646
45592480
37121646
45592480

While I can understand the randomness as long as the hashcode is consistent during the program lifetime, it bothers me that it is not always random.

leppie
  • 115,091
  • 17
  • 196
  • 297
  • Interesting work really. On my system, it appears the first call to `System.Type.GetHashCode` is always the same between runs. However, subsequent calls make it different for different `Type` instances (on different types of course). I believe it's seeded somehow and produces a biased result. – Michael J. Gray Nov 18 '11 at 05:43
  • 3
    @JNZ: the answer is `Type` doesn't override `GetHashCode` and thus uses the default implementation, which assigns them in a pseudo-deterministic fashion based on the order in which `GetHashCode` is called. – user7116 Nov 18 '11 at 05:52
  • 2
    @JNZ: You should give the other answerer the 'tick'. Mine was just my own observations without trying to answer the question :) – leppie Nov 18 '11 at 05:53
  • That was a misplaced click honestly. I didn't mean for anyone to get it yet heh. Simply +1'd you for the information that led me on an investigative hunt. – Michael J. Gray Nov 18 '11 at 06:06
2

This is a surprising result, with a relatively simple explanation.

The class Type uses the default implementations of Equals and GetHashCode inherited from object. Specifically, Type instances are equal when they are the same instance (i.e. at the same memory address). Similarly, when the objects are the same instance, their hash codes will be equal.

typeof uses caching, so for a given type it always returns the same instance. This mimics the behavior of member-wise equality, although it is really reference equality:

object.ReferenceEquals(typeof(string), typeof(string)) == true

As for the original question, this result can hold for any reference type that does not override GetHashCode. There is no reason why the output of GetHashCode should be random; it only needs to differ, as far as practical, between objects at different memory addresses (and be well distributed over the output range). If memory addresses are assigned sequentially from the same starting point, the sequence of hash codes generated from those objects will also be the same.

I should add that I do not know the actual base implementation of GetHashCode, I am merely theorizing that it would be sensible for it to derive from the memory address of the reference type.
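The identity semantics described above can be checked without knowing how the hash value is computed. This sketch compares the Type obtained from typeof with the one obtained from an instance, and then uses Type as a dictionary key, which relies only on Equals and GetHashCode agreeing within one run. TypeIdentityDemo and Describe are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Generic;

namespace TestApplication
{
    internal static class TypeIdentityDemo
    {
        // Looks up a label by runtime type; depends on Type's Equals and GetHashCode.
        internal static string Describe(Dictionary<Type, string> labels, object value)
        {
            string label;
            return labels.TryGetValue(value.GetType(), out label) ? label : "unknown";
        }

        private static void Main()
        {
            Type fromTypeof = typeof(string);
            Type fromInstance = "hello".GetType();

            // Both expressions are expected to yield the same cached Type instance.
            Console.WriteLine(object.ReferenceEquals(fromTypeof, fromInstance));
            Console.WriteLine(fromTypeof.GetHashCode() == fromInstance.GetHashCode());

            var labels = new Dictionary<Type, string>
            {
                { typeof(string), "text" },
                { typeof(int), "number" }
            };
            Console.WriteLine(Describe(labels, "abc")); // text
            Console.WriteLine(Describe(labels, 42));    // number
            Console.WriteLine(Describe(labels, 1.5));   // unknown
        }
    }
}
```

The lookup works regardless of which numeric hash codes the runtime hands out, which is why the varying values in the question do not matter as long as they are only used within a single process.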

Codure
  • 754
  • 1
  • 7
  • 12
-1

In response to Eric Ouellet's answer: I'll not even comment on the incorrect syntax (oops, guess I just did), but that information is factually inaccurate.

Results from the C# interactive console in Visual Studio prove GetHashCode() works as expected on generic types.

Witness:

> typeof(List<int>).GetHashCode()
42194754
> typeof(List<string>).GetHashCode()
39774547
> typeof(Stack<string>).GetHashCode()
59652943
> typeof(Stack<int>).GetHashCode()
5669220