2

What would be the best string hashing function for say filename like strings? The strings would be similar to:

pics/test.pic
maps/test.map
materials/metal.mtl
Tom Tetlaw
  • 83
  • 1
  • 10
  • 21

3 Answers3

12

If the nature of data to be hashed doesn't require any fancy hashing algorithms, like the nature of textual strings, you may want to try the FNV hashing function. The FNV hash, short for Fowler/Noll/Vo in honor of the creators, is a very fast algorithm that has been used in many applications with wonderful results, and for its simplicity, the FNV hash should be one of the first hashes tried in an application.

unsigned int fnv_hash (void* key, int len)
{
    unsigned char* p = key;
    unsigned int h = 2166136261;
    int i;

    for (i = 0; i < len; i++)
        h = (h*16777619) ^ p[i];

    return h;
}

Or roll with MD5 algorithm instead, which is general-purpose and thus covers your needs quite well.

Desmond Hume
  • 8,037
  • 14
  • 65
  • 112
  • 3
    @DeadMG Why should a hash function be specialized just for text when, by its very essence, it can handle any kind of data and be reused for anything else without adapting the implementation for each purpose? – Desmond Hume Jul 10 '12 at 13:53
  • @DeadMG Ironically, just before you gave this answer of mine a -1, I upvoted your answer here http://stackoverflow.com/questions/3694899/c-template-and-inline-question which used to have -1 votes before that. – Desmond Hume Jul 10 '12 at 13:59
  • 5
    +1 to offset the -1 as the void * would allows it to work with any data not just a string. Although a const void * const key would be better. – rxantos May 28 '16 at 13:59
0

There is no universally "best" hashing function independently of how the hash are used.

Let's suppose you want to have a 32 bits int in order to use a small hash table in memory.

Then you can use the FNV-1a algorithm:

hash = offset_basis
for each octet_of_data to be hashed
 hash = hash xor octet_of_data
 hash = hash * FNV_prime
return hash

If your purpose is to be confident about the fact that two paths give different hash, then you can use the SHA1 algorithm.

If you want to be sure it's very hard to maliciously create collisions, then you can use SHA256.

Note that those 2 last algorithm generate long hash (longer than your typical path).

Denys Séguret
  • 372,613
  • 87
  • 782
  • 758
  • 1
    A cryptographic hash would most likely be overkill. OP could use boost::hash or std::hash in c++11 instead. – pg1989 Jul 10 '12 at 13:17
  • 1
    Overkill ? Do you know how and why this hash would be used ? – Denys Séguret Jul 10 '12 at 13:17
  • Yeah I do crypto research, so I'm pretty familiar with cryptographic hashes. SHA1 is an order of magnitude slower than even the slowest non-crypto hash, so he would be sacrificing speed for a collision resistance assumption that he most likely won't need. – pg1989 Jul 10 '12 at 13:19
  • 1
    I mentioned the use case. If OP just wants to build a hash table, FNV is good enough and is what you have in many small libraries (with variants). I commented that we have to know the purpose. Without the purpose, it's a little vain to try to choose an algorithm. – Denys Séguret Jul 10 '12 at 13:21
0

Just use std::hash<std::string>. That's your library implementer's idea of the 'best' general purpose, non-cryptographic hash function.

JoeG
  • 12,994
  • 1
  • 38
  • 63