
I need to split data evenly across n nodes in a distributed cache.

The following code will take a cache key and determine which Node to use:

public static int GetNodeIDByCacheKey(string key)
{
    return Math.Abs(key.GetHashCode()) % TotalNumberOfNodes();
}

Unfortunately, the code isn't reliable across different machine instances. In testing, it sometimes returns a different node for the same key.

Any thoughts or ideas on getting something to work better?

Andrew Harry
  • So you claim that for bitwise identical values of `key`, you get different function results? –  Jul 14 '11 at 03:55
  • it is what i'm currently suspecting is the issue i'm working through – Andrew Harry Jul 14 '11 at 03:56
  • possible duplicate of [GetHashCode() gives different results on different servers?](http://stackoverflow.com/questions/6114772/gethashcode-gives-different-results-on-different-servers) – Rick Sladkey Jul 14 '11 at 04:00
  • More info: This is happening in Azure Development with two instances of the web role running on the same computer – Andrew Harry Jul 14 '11 at 04:08

2 Answers


You should not rely on the implementation of string's GetHashCode() beyond the guarantee that strings of equal value produce the same hash code. Per the documentation, the particular value of the hash code is only required to be consistent for the current execution of an application; a different hash code can be returned if the application is run again.

Also, the implementation of GetHashCode() might differ if the machines in question run different .NET CLR versions. From the documentation:

The behavior of GetHashCode is dependent on its implementation, which might change from one version of the common language runtime to another. A reason why this might happen is to improve the performance of GetHashCode.

Instead, you could define your own consistent mapping from the string key to a numeric value, which would let you bin keys to nodes consistently across restarts and machine boundaries. For example, you could convert the string into a byte array (e.g. using Encoding.UTF8.GetBytes()) and then convert the byte array to a number (either with a lossy conversion that keeps just 64 bits, or using BigInteger).
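A minimal sketch of the BigInteger variant described above, assuming the node count is passed in as a parameter. The class name `StableNodeMapper` and the method `GetNodeId` are hypothetical names chosen for illustration, not part of the question's code.

```csharp
using System;
using System.Numerics;
using System.Text;

public static class StableNodeMapper
{
    // Hypothetical helper: maps a key to a node index in [0, nodeCount)
    // using only the key's UTF-8 bytes, so the result does not depend on
    // the process, machine, or CLR version.
    public static int GetNodeId(string key, int nodeCount)
    {
        if (key == null) throw new ArgumentNullException(nameof(key));
        if (nodeCount <= 0) throw new ArgumentOutOfRangeException(nameof(nodeCount));

        byte[] utf8 = Encoding.UTF8.GetBytes(key);

        // BigInteger(byte[]) reads the bytes as little-endian two's complement.
        // Appending a zero byte keeps the value non-negative.
        byte[] unsigned = new byte[utf8.Length + 1];
        Array.Copy(utf8, unsigned, utf8.Length);

        BigInteger value = new BigInteger(unsigned);
        return (int)(value % nodeCount);
    }
}
```

One caveat with this direct conversion: because the first byte of the key ends up in the lowest bits, a node count that divides 256 (2, 4, 8, ...) makes the result depend on the first byte only. If keys share prefixes, it is probably worth mixing the bytes with a real hash function before taking the modulus.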

BrokenGlass
  • I read this, but i'm running into this problem in the Azure Development Environment (two instances of the same web.role) – Andrew Harry Jul 14 '11 at 04:07
  • @Andrew: It shouldn't be too hard to define a mapping to a numeric value for your string keys yourself, there will always be collisions but you have to be able to guarantee that the same string value will always map to the same bin – BrokenGlass Jul 14 '11 at 04:12

Within a single running process, a given string value will always generate the same hash, but the same string value ("Hello", for instance) may very well produce different hash codes in two different processes, such as on Machine A and on Machine B. I think you will need to implement your own hash function that uses only the contents of the string if you want identical behavior between machines and instances.
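As one possible content-only hash function, here is a sketch using 32-bit FNV-1a over the key's UTF-8 bytes. FNV-1a is a choice made for this example, not something the question or answers prescribe, and `Fnv1aNodeMapper`, `Hash`, and `GetNodeId` are hypothetical names.

```csharp
using System;
using System.Text;

public static class Fnv1aNodeMapper
{
    // Standard 32-bit FNV-1a parameters.
    private const uint OffsetBasis = 2166136261;
    private const uint Prime = 16777619;

    // Hashes only the UTF-8 bytes of the key, so equal strings give the
    // same value in every process and on every machine.
    public static uint Hash(string key)
    {
        if (key == null) throw new ArgumentNullException(nameof(key));

        uint hash = OffsetBasis;
        foreach (byte b in Encoding.UTF8.GetBytes(key))
        {
            unchecked
            {
                hash ^= b;      // fold in the next byte
                hash *= Prime;  // multiply, wrapping modulo 2^32
            }
        }
        return hash;
    }

    // Hypothetical helper: maps a key to a node index in [0, nodeCount).
    public static int GetNodeId(string key, int nodeCount)
    {
        if (nodeCount <= 0) throw new ArgumentOutOfRangeException(nameof(nodeCount));
        return (int)(Hash(key) % (uint)nodeCount);
    }
}
```

Working in `uint` also sidesteps the `Math.Abs` call from the question, which throws an `OverflowException` when the hash code happens to be `int.MinValue`.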