0

I am trying to generate hash key using SHA-256 which is producing 64 length String but i need key of size<=32, what is the best algorithm recommended maintaining uniqueness? Please advice.

Anil Konduru
  • 878
  • 10
  • 18
  • Can't you use substring for length 32? – Shivam Nov 23 '15 at 13:24
  • 1
    SHA-256 produces hashes of 32 bytes (256 bit). I suspect you shouldn't encode the hash to Hex. – Artjom B. Nov 23 '15 at 13:24
  • 3
    @Shivam substring a hash key it's a good way to create collisions. – Averroes Nov 23 '15 at 13:34
  • @Averroes sorry I didn't thought about that. Substring is bad idea. – Shivam Nov 23 '15 at 13:51
  • 3
    less key space = more collisions. Assuming near perfect distribution that should mean that half a sha256 collides exactly as often as a full sha128 (or MD5 with 128 bits) would (if that was a thing). Obviously makes more sense to try to cram more bits into the encoded representation than to remove bits but substrings aren't bad beyond what they obviously do. related: http://stackoverflow.com/questions/18134627/how-much-of-a-git-sha-is-generally-considered-necessary-to-uniquely-identify-a (sha1 hex encoded substring) – zapl Nov 23 '15 at 13:51
  • As a hash function SHA256 says that would be very hard to find a collission if you take into account the full hash output. The same cannot be assured if you only take half the bits of the output. – Averroes Nov 23 '15 at 13:59
  • If you XOR the first 128 bits with the last 128 bits then you will get a 128 bit output which involves all 256 bits of the input. Your shorter key will have the inevitable problems due to its shorter length, but all the 256 input bits are involved in the output. – rossum Nov 23 '15 at 15:23
  • 1
    @Averroes: what gives you the impression that the first 32 bits of a 64-bit hash function are any worse than a purpose built 32-bit hash function? – Louis Wasserman Nov 23 '15 at 19:15
  • @LouisWasserman I think I maybe was nitpicking. The contract of the hash function for no collisions includes the full 64 bits in this case. For a given hash function you might be encountering more collisions in the first 32 bit of a 64 bits function than in one purposedly built 32 bit hash function IIANM. – Averroes Nov 24 '15 at 07:53

1 Answers1

1

As already indicated you loose collision resistance for each bit you drop. Hashes however are considered to be indistinguishable from random. Because of the avalanche effect, each bit of the input is "represented" by each of the bits in the hash. So you can indeed simply use the first 128 bits / 16 bytes of the output of the hash. That would still leave you with some 64 bit of collision resistance. The more or less standard way to do this is to take the leftmost bytes.

Additional hints:

  • To have some additional security, use the result of a HMAC with a static, randomly generated 128 bit key instead of the output of a hash;
  • Of course you could also encode the hash in base 64 and use that, if you can store any string instead of only hexadecimals. In that case you can fit 32 * 6 = 192 bits into the value, which would result in higher security than a SHA-1 hash (which is considered insecure nowadays, keep to SHA-2).
Maarten Bodewes
  • 90,524
  • 13
  • 150
  • 263