1

I am looking for advice for either finding or creating a hash algorithm to be used in .Net C#.

I have a collection of columns from a DB. The combination of columns across the table are guaranteed to produce unique strings.

Consider:

String Column1 = "StringA";
String Column2 = "StringB";
String Column3 = "StringC";

I concatenate the columns into a single string:

String ColumnKey = Column1 + Column2 + Column3;

Currently I'm using the built in .Net C# hash function from the string class.

int hashKey = ColumnKey.GetHashCode();

After doing some reading, it's my understanding that (although the probability is quite low) this algorithm does not guarantee uniqueness. It is also my understanding that this function could produce different results for the same string across different versions of the .Net Framework.

I am looking for another hash algorithm to use that would guarantee uniqueness and produce consistent results across different versions of .Net.

Can someone help get me started in the right direction?

JohnB
  • 3,921
  • 8
  • 49
  • 99
  • 3
    You can't. You just simply can't stuff any possible string into a 32-bit integer and guarantee uniqueness. – itsme86 Sep 19 '17 at 17:14
  • 1
    Use one of the standard SHA algorithms. But 32 bits is too low for a hash if you're worried about uniqueness. No hash is going to guarantee uniqueness but SHA256 for example makes the chances of a collision so low you shouldn't even think about it. This is a good read if you do want to think about it: https://stackoverflow.com/questions/4014090/is-it-safe-to-ignore-the-possibility-of-sha-collisions-in-practice – James Gaunt Sep 19 '17 at 17:15
  • 1
    The probability of collisions for a strings hash code isn't even low. By the time you've got just a few tens of thousands of records you've got a several percent chance of having a collision, and that's assuming the strings are well distributed, based on whatever its hash algorithm is designed to account for. – Servy Sep 19 '17 at 17:15
  • What are you using the hashes for? If this is for passwords or something, use SHA as another comment suggests. If this is for buckets in a data structure, `GetHashCode` is probably sufficient. One thing you'd need to look out for is something that degrades the hashes making it more likely to generate collisions. This can happen I think on your own class if the first few data members tend to be the same. In that case you'd want to override `GetHashCode`, but for strings, I don't think this would be a problem. – Dave Cousineau Sep 19 '17 at 17:20
  • @Servy - yep. 77,000 records gives a 50% chance of collision (hashing to a 32-bit int). – hatchet - done with SOverflow Sep 19 '17 at 17:26
  • @JohnB. What you desire is a "[Perfect Hash Function](https://en.wikipedia.org/wiki/Perfect_hash_function)". This is technically possible if you know all the values you will be hashing ahead of time. If that's not the case, then no, although if you hash to a large number of bits, the chances of collision become extremely unlikely. – hatchet - done with SOverflow Sep 19 '17 at 17:29
  • 2
    Why would you need a unique hash? – bashis Sep 19 '17 at 17:30

2 Answers2

4

It's impossible. There are 2^32 different values for an int, and a string just a few characters long has more possible values than this. As a result no hashing algorithm can guarantee a unique value for each string.

See the PigeonHole Principle. https://en.wikipedia.org/wiki/Pigeonhole_principle.

If you want a guaranteed hash for every version of .Net, implement the hash yourself. A fast hash function for string in C# gives a few examples. I would put it in an extension method for string.

Yair Halberstadt
  • 5,733
  • 28
  • 60
1

There is no such thing as a "guaranteed unique hash". Hash's have a size (in .NET 32-bit) so there are only 4-billionish possible hashes. Have more strings than that and you have to have a collision.

So what you are asking for is not possible.

BradleyDotNET
  • 60,462
  • 10
  • 96
  • 117