
Let's assume we have `string name = "stackoverflow.com";`

How can I convert this string into a unique ID or some sort of hash? (Not MD5, because it's too big.) It should also not be random.

I would like to have something like this.

Please note that the string itself is too big. I would like to know whether it can be written in a shorter form, as a combination of letters, numbers and symbols, for example:

f¤k^§~7d?Æ

User6996
  • Is there anything preventing you from using the hash produced by `String.GetHashCode`? – driis Jan 08 '11 at 12:58
  • **Any** hash is inherently **not unique**, simply because it is a hash. And since you want something shorter than MD5 (which is 16 bytes, while your example is 10 characters long), be ready to get collisions. – zerkms Jan 08 '11 at 13:01
  • You'll get better answers if you explain _why_ you need this: for what purpose, how it will be used, and by whom. – Allon Guralnek Jan 08 '11 at 13:43
  • Take a look at this; it may be helpful: http://stackoverflow.com/questions/12933724/octcrypting-for-creating-unique-hash-codes – Gev Oct 17 '12 at 11:59

4 Answers


This is impossible without restricting your domain. There are infinitely many strings, so they can't be mapped injectively into any finite set. Therefore, uniqueness is impossible.
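
A small sketch of the pigeonhole argument, assuming a hypothetical 16-bit ID (a 31-based polynomial hash truncated to 16 bits; `PigeonholeDemo`, `ShortId` and `FindCollision` are illustrative names, not a library API). With only 65,536 possible IDs, hashing enough distinct strings must eventually give two of them the same ID:

```csharp
using System.Collections.Generic;

// Pigeonhole illustration: squeeze strings into a hypothetical 16-bit "ID"
// (only 65,536 possible values) and find two different strings sharing an ID.
public static class PigeonholeDemo
{
    // Hypothetical 16-bit ID: multiplier-31 polynomial hash, truncated to 16 bits.
    public static ushort ShortId(string s)
    {
        int h = 0;
        unchecked
        {
            foreach (char c in s)
                h = 31 * h + c; // overflow wraps around
        }
        return unchecked((ushort)h); // keep only the low 16 bits
    }

    // Hash "s0", "s1", "s2", ... until two distinct strings land on the same ID.
    // The loop must stop after at most 65,537 strings, because there are only 65,536 IDs.
    public static (string First, string Second) FindCollision()
    {
        var seen = new Dictionary<ushort, string>();
        for (int i = 0; ; i++)
        {
            string s = "s" + i;
            ushort id = ShortId(s);
            if (seen.TryGetValue(id, out var previous))
                return (previous, s);
            seen[id] = s;
        }
    }
}
```

The same argument applies to any finite output size: a longer ID only raises the number of strings needed before a collision is forced.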

If you really want a unique identifier for a string, use the string itself.

jason
  • Well, the string itself is too big. I would like to know whether the string could be written in a shorter form, as a combination of letters, numbers and symbols. – User6996 Jan 08 '11 at 13:18
  • @Power-Mosfet: Again, without restricting your domain, this is impossible. There are more instances of `string` of length `n` than there are of length less than `n`. – jason Jan 08 '11 at 13:31
  • input = "stackoverflow.com"; output = "81c74c7a"; input = "stackoverflow.Com"; output = "b98a0a9a"; – User6996 Jan 08 '11 at 13:51
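
The counting argument in jason's second comment can be written out, assuming an alphabet of k ≥ 2 characters:

```latex
\#\{\text{strings of length } n\} = k^{n},
\qquad
\#\{\text{strings of length} < n\} = \sum_{i=0}^{n-1} k^{i} = \frac{k^{n}-1}{k-1} \le k^{n}-1 < k^{n}.
```

So any scheme that maps every length-n string to a strictly shorter string must send at least two inputs to the same output.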

name.GetHashCode()

This is probably your best bet. Any form of hash has the problem that it can't be guaranteed to be unique, but you can make uniqueness significantly more likely by allowing the hash to be longer.

You could also use different hash algorithms in conjunction with each other to increase the supported range.
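
One way to read this suggestion, as a sketch: compute two unrelated 32-bit hashes and concatenate them into one 64-bit value, printed as 16 hex digits. The choice of functions (a Java-style multiplier-31 polynomial hash and a char-wise variant of 32-bit FNV-1a) and the names `CombinedHash`, `Poly31`, `Fnv1a` and `Combined` are illustrative assumptions. Both are deterministic across runs, which `string.GetHashCode` is not guaranteed to be (on .NET Core it is randomized per process):

```csharp
// Sketch: combine two different 32-bit hashes into one 64-bit value,
// enlarging the output range from 2^32 to 2^64 possible values.
public static class CombinedHash
{
    // Polynomial hash with multiplier 31 (the scheme Java uses for String.hashCode).
    public static uint Poly31(string s)
    {
        uint h = 0;
        unchecked
        {
            foreach (char c in s)
                h = 31 * h + c;
        }
        return h;
    }

    // 32-bit FNV-1a applied to whole UTF-16 chars. Assumption: reference FNV-1a
    // processes one byte at a time; this char-wise variant is for illustration.
    public static uint Fnv1a(string s)
    {
        uint h = 2166136261; // FNV-1a 32-bit offset basis
        unchecked
        {
            foreach (char c in s)
            {
                h ^= c;
                h *= 16777619; // FNV 32-bit prime
            }
        }
        return h;
    }

    // High 32 bits from one hash, low 32 bits from the other, as 16 lowercase hex digits.
    public static string Combined(string s) =>
        (((ulong)Poly31(s) << 32) | Fnv1a(s)).ToString("x16");
}
```

A 64-bit result can still collide; if the two hashes behave independently, it only moves the point where collisions become likely from roughly 2^16 strings to roughly 2^32.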

EDIT

Then you could create a custom hash code function such as:

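// Polynomial string hash: s[0]*31^(n-1) + ... + s[n-1], computed with Horner's rule.
// (In C#, ^ is XOR, not exponentiation, so the power must not be written with ^.)
public static int GetHashCode(string value)
{
    int h = 0;
    unchecked // let the multiplication overflow wrap around instead of throwing
    {
        for (int i = 0; i < value.Length; i++)
            h = 31 * h + value[i];
    }
    return h;
}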

(Stolen from elsewhere)

Kurru
  • `GetHashCode` is a bad idea because the algorithm can change, breaking any mapping between `string`s and `int`s. – jason Jan 08 '11 at 13:15

If you use a hash, it needs to be long enough to be practically unique, and that's probably longer than you want. You need 2^(BitLength/2) >> n, with BitLength being the length of the hash in bits and n the number of strings.
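
The rule of thumb comes from the birthday bound. A minimal sketch, assuming the hash behaves like a uniformly random function (an idealization), estimates the chance of at least one collision among n strings with a b-bit hash as p ≈ 1 - e^(-n(n-1)/2^(b+1)); the names `BirthdayBound` and `CollisionProbability` are hypothetical:

```csharp
using System;

// Birthday-bound estimate of the probability of at least one collision among
// n distinct strings hashed to b-bit values, assuming an ideal random hash.
public static class BirthdayBound
{
    public static double CollisionProbability(double n, int bits)
    {
        // Expected number of colliding pairs: n(n-1)/2 pairs, each colliding with probability 2^-bits.
        double x = n * (n - 1) / 2 / Math.Pow(2, bits);

        // p ≈ 1 - e^(-x). For tiny x, 1 - Math.Exp(-x) rounds to 0 in double precision,
        // so fall back to the series expansion x - x^2/2.
        return x < 1e-6 ? x - x * x / 2 : 1 - Math.Exp(-x);
    }
}
```

For example, about 77,000 strings already give roughly a 50% chance of some collision with a 32-bit hash, which is why 2^(BitLength/2) must be much larger than n.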

How about just using a `Dictionary<string,int>` and a counter instead?
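
A minimal sketch of that idea, using a hypothetical `StringIdTable` class: IDs are handed out sequentially, so they are short and never collide, but they are only meaningful within the table that issued them and would have to be persisted (not shown) to stay stable across runs:

```csharp
using System;
using System.Collections.Generic;

// Dictionary<string,int> plus a counter: each new string gets the next integer,
// and a string seen before gets its existing ID back.
public sealed class StringIdTable
{
    private readonly Dictionary<string, int> _ids = new Dictionary<string, int>(StringComparer.Ordinal);
    private readonly List<string> _strings = new List<string>(); // index == ID, for reverse lookup

    public int GetOrAdd(string s)
    {
        if (_ids.TryGetValue(s, out int id))
            return id;
        id = _strings.Count; // the counter: next unused ID
        _ids.Add(s, id);
        _strings.Add(s);
        return id;
    }

    // Reverse lookup: ID back to the original string.
    public string Lookup(int id) => _strings[id];
}
```

Ordinal comparison keeps `stackoverflow.com` and `stackoverflow.Com` distinct, matching the example outputs in the question's comments.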

CodesInChaos
  • There is no length that is "sufficiently long" to guarantee uniqueness. – jason Jan 08 '11 at 13:13
  • You can make the probability arbitrarily small by choosing longer hashes, and once it's smaller than the probability of a hardware error, that's enough. A cryptographically secure 128-bit hash is enough, but it is about 22 characters long in Base64. – CodesInChaos Jan 08 '11 at 13:16
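
To make the length concrete, a sketch that keeps the first 128 bits of a SHA-256 digest and Base64-encodes them (SHA-256 and truncation are assumptions; the comment does not name a specific hash, and `ShortHash` is a hypothetical name). Sixteen bytes encode to 24 Base64 characters ending in `==` padding, so dropping the padding leaves 22:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Sketch: 128 bits of a cryptographic hash, written as unpadded Base64 (22 characters).
public static class ShortHash
{
    public static string Hash128Base64(string s)
    {
        byte[] digest;
        using (SHA256 sha = SHA256.Create())
            digest = sha.ComputeHash(Encoding.UTF8.GetBytes(s));

        // Keep the first 16 bytes (128 bits) of the 32-byte digest.
        byte[] first128Bits = new byte[16];
        Array.Copy(digest, first128Bits, 16);

        // 16 bytes -> 24 Base64 characters, the last two always "=" padding.
        return Convert.ToBase64String(first128Bits).TrimEnd('=');
    }
}
```

That is the same number of bits as an MD5 digest, just written more compactly than MD5's usual 32 hex digits.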

Jason is quite right: you can't create a finite-sized unique hash of a string that can be arbitrarily long. I submit to you that what you are looking for is not a hash, but rather a compression algorithm for short strings.

Dave Markle
  • Note that the same logic applies to lossless compression algorithms. There must be some strings for which their "compressed" version is longer than the string itself. – jason Jan 08 '11 at 15:02
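
A quick illustration of that point (not a proof), using gzip from `System.IO.Compression` as the compressor: the gzip format alone adds a 10-byte header and an 8-byte trailer, so the 17-byte input `stackoverflow.com` necessarily comes out longer, even though the round trip is lossless. The class name `CompressionDemo` is hypothetical:

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Lossless compression of a short string: the output is larger than the input.
public static class CompressionDemo
{
    public static byte[] Gzip(string s)
    {
        byte[] input = Encoding.UTF8.GetBytes(s);
        using (var output = new MemoryStream())
        {
            // Dispose the GZipStream before reading the buffer so it writes its trailer.
            using (var gzip = new GZipStream(output, CompressionLevel.Optimal, leaveOpen: true))
                gzip.Write(input, 0, input.Length);
            return output.ToArray();
        }
    }

    public static string Gunzip(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
            return reader.ReadToEnd();
    }
}
```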