4

Let's say you are planning to design a hash function which will generate keys between 0-256. Will using first 2 digits of MD5-digest be a great idea for a uniform distribution? What do you think on this? Is it expensive to md5() some word (2-10 letters)?

I know it is a rough definition of requirements but it would be great to discuss this.

ahmet alp balkan
  • 42,679
  • 38
  • 138
  • 214
  • I imagine there is no guarantee that a subset of an MD5 hash has a uniform distribution (similar to how GUIDs work). – David Nov 05 '10 at 18:46
  • Considering a multi-megabyte file can be MD5'ed like, well, "instantly" on modern hardware... however, with such a short input, uhh.. hmm. –  Nov 05 '10 at 18:47
  • Out of curiosity, why are you hashing two character strings to two character hashes? – user229044 Nov 05 '10 at 18:49
  • I agree with David. You're probably best off just writing up a quick test app that runs your design several thousand times so you can get an idea of the cost and statistical distribution. – Spencer Hakim Nov 05 '10 at 18:50
  • 2
    If you are looking for a 1 byte hash. Perhaps CRC8 will work better. – tidwall Nov 05 '10 at 18:52
  • What would you use if you need to generate a lets say [0, 255] valued hash function from strings, uniformly distributed as much as possible? – ahmet alp balkan Nov 05 '10 at 18:55

4 Answers4

4

There's no reason to use a cryptographic strength hash for something as simple as generating 3 digit hashes. You're better off using a more simple hash there.

I'm not certain specifically how expensive MD5 is relative to others, but there are plenty of better ways to create a small hash (see this article for some algorithm ideas).

Rafe Kettler
  • 75,757
  • 21
  • 156
  • 151
3

You could try calculating an 8-bit CRC.

Robert Harvey
  • 178,213
  • 47
  • 333
  • 501
1

MD5 is designed to uniformaly spread the input over all the output bytes so it's as good as any other general hash function - sounds like a bit of overkill if you only want 256 values.

Note the output of MD5 is 128bytes (16bytes), it's only the text representation that is hex digits - so there is really no first two digits of MD5 - just use the bottom 8bits.

Martin Beckett
  • 94,801
  • 28
  • 188
  • 263
0

You haven't explained how you're going to use the hash, and what you're going to do with the collisions that are inevitable given that you have only 256 output values.

I think even MD5 (which is not cryptographically secure any more) is overkill for the likely applications.

I'd probably go with a CRC (cyclic redundancy check) algorithm that would generate a 16-bit or 32-bit number for you, and would likely give you good enough distribution.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278