
I am using boost::hash to get a hash value for a string, but it gives different hash values for the same string on Windows 32-bit and Debian 64-bit systems.

So how can I get the same hash value (32-bit or 64-bit) from boost::hash regardless of platform?

Cyral
user2099350
  • Hypothetically, what happens if you depend on always getting the same hash and boost slightly changes their algorithm? – Mark B Jul 02 '13 at 13:23
  • @Mark B, it may cause portability issues. In the simplest case, you may want to collect hashed strings coming from different platforms into one data structure, and the bucket distribution would then be randomized. – fatihk Jul 02 '13 at 13:26
  • 1
    Is it possible that in one of the instances you use unicode and in the other one you don't? – Bee Jul 02 '13 at 13:32
  • 2
    boost:hash(hash_value) returns std:size_t, so it return 64-bit long in 64-bit system, 32-bit long in 32-bit system. – onemouth Jul 02 '13 at 13:33
  • @onemouth, can size_t size cause such difference? – fatihk Jul 02 '13 at 13:36
  • @thomas Typical implementations will generate the hash in a `size_t`, counting on the modulo properties of unsigned arithmetic. The modulo used will thus depend on the size of `size_t`, and will definitely be different. – James Kanze Jul 02 '13 at 13:39
  • @onemouth Which is, of course, totally irrelevant here. More to the point, `size_t` is guaranteed to be unsigned, so the implementation of hash can count on modulo arithmetic. – James Kanze Jul 02 '13 at 13:40
  • 1
    I've just looked at the implementation of `boost::hash`. In practice, except for an empty string (which will hash to 0), you're almost guaranteed to get different results depending on the size of `size_t`. – James Kanze Jul 02 '13 at 13:52
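To illustrate the point about `size_t` width, here is a minimal sketch of a multiply-and-add string hash templated on its accumulator type. It is hypothetical and is not boost::hash's actual algorithm; it only shows that when a hash relies on unsigned wrap-around, the 32-bit result is just the low 32 bits of the 64-bit one, so the full values differ once the computation wraps.

```cpp
#include <cstdint>
#include <string>

// Hypothetical multiply-and-add hash (NOT boost::hash's real algorithm).
// UInt plays the role of std::size_t: 32 bits on 32-bit Windows,
// 64 bits on 64-bit Debian.
template <typename UInt>
UInt mul_add_hash(const std::string& key) {
    UInt h = 0;
    for (unsigned char c : key) {
        h = h * 31 + c;  // wraps modulo 2^(number of bits in UInt)
    }
    return h;
}
```

For a key such as "hello, world", `mul_add_hash<std::uint32_t>` and `mul_add_hash<std::uint64_t>` agree in their low 32 bits but not overall, which is the kind of mismatch the question describes.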

3 Answers


What is the guarantee concerning boost::hash? I don't see any guarantees that a generated hash code is usable outside of the process which generates it. (This is frequently the case with hash functions.) If you need a hash value for external data, valid over different programs and different platforms (e.g. for a hashed access to data on disk), then you'll have to write your own. Something like:

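#include <cstdint>
#include <string>

std::uint32_t
hash( std::string const& key )
{
    std::uint32_t results = 12345;
    for ( auto current = key.begin(); current != key.end(); ++ current ) {
        //  Cast to unsigned char so bytes >= 0x80 are not sign-extended.
        results = 127 * results + static_cast<unsigned char>( *current );
    }
    return results;
}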

should do the trick, as long as you don't have to worry about porting to some exotic mainframes (which might not support uint32_t).
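If you would rather use a published algorithm than pick your own constants, FNV-1a is one widely used option. It is a swapped-in technique, not part of boost or of the answer above. Because the width is fixed explicitly by `std::uint32_t` or `std::uint64_t`, this sketch should give the same value on any platform that provides those types, which covers both the 32-bit and 64-bit cases in the question.

```cpp
#include <cstdint>
#include <string>

// FNV-1a, 32-bit: published offset basis and prime.
std::uint32_t fnv1a_32(const std::string& key) {
    std::uint32_t h = 0x811C9DC5u;        // offset basis
    for (unsigned char c : key) {
        h ^= c;                           // xor in the byte first (the "1a" variant)
        h *= 0x01000193u;                 // FNV prime, wraps modulo 2^32
    }
    return h;
}

// FNV-1a, 64-bit: same loop with the 64-bit parameters.
std::uint64_t fnv1a_64(const std::string& key) {
    std::uint64_t h = 0xCBF29CE484222325ull;   // offset basis
    for (unsigned char c : key) {
        h ^= c;
        h *= 0x100000001B3ull;                 // FNV prime, wraps modulo 2^64
    }
    return h;
}
```

The 32-bit and 64-bit variants are separate functions rather than one function returning `std::size_t`, precisely so the result does not change with the platform's pointer size.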

James Kanze

Use a well-known, standardized hash function such as SHA instead: these are specified exactly, so the same string produces the same hash everywhere. Note that if you are doing something security-related, SHA might be too fast. It sounds strange, but fast is not always good, because it makes a brute-force attack easier. In that case there are other, slower hash functions, some of which essentially re-apply SHA many times in a row. Also, if you are hashing passwords, remember to salt them (I won't go into details, but the information is readily available online).

user2520968
  • 2
    Since he asked about `boost::hash`, I doubt that he was worried about cryptographic security. For hashing for data access, SHA is far to slow, and the hash it generates has enough bits that you'd need a big number package to do the modulo on it, to bring it down into range. – James Kanze Jul 02 '13 at 17:33

The hash function above is simple, but weak and vulnerable.

For example, pass that function strings such as "bb", "bbbb", "bbddbb" or "ddffbb" (any sequence of doubled characters with even ASCII codes) and look at the low byte of the result: it is always 57.
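The arithmetic behind this: 127 × 127 = 16129 = 63 × 256 + 1, so two steps multiply the state by 1 modulo 256, and a doubled character c adds 127c + c = 128c, which is a multiple of 256 whenever c is even. The low byte of the seed, 12345 = 48 × 256 + 57, is therefore never disturbed. The sketch below copies the first answer's function under the hypothetical name `kanze_hash` (renamed only so the snippet stands alone) so the claim can be checked.

```cpp
#include <cstdint>
#include <string>

// Same algorithm as the first answer's hash(), renamed for this snippet.
std::uint32_t kanze_hash(const std::string& key) {
    std::uint32_t results = 12345;
    for (unsigned char c : key) {
        results = 127 * results + c;
    }
    return results;
}

// Low 8 bits of the hash; stays at 57 for any run of doubled even-coded chars.
unsigned low_byte(const std::string& key) {
    return kanze_hash(key) & 0xFFu;
}
```

A pair of different characters such as "bd" does disturb the low byte (it adds 2, giving 59), so the weakness is specific to doubled even-coded characters.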

Instead, I recommend my own hash function, which is relatively lightweight and has no easily exploited weaknesses:

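#include <stdint.h>

// NLF: non-linear step, a table lookup indexed by the low byte of (c ^ h).
#define NLF(h, c) (rand_table[(uint8_t)((c) ^ (h))])

// Must be filled with 256 random, pairwise distinct 32-bit values.
// (Not named rand, which would clash with rand() from <stdlib.h>.)
uint32_t rand_table[0x100] = { /* 256 random non-equal values */ };

uint32_t oleg_h(const char *key) {
  uint32_t h = 0x1F351F35;
  char c;
  while ((c = *key++) != '\0')
    h = ((h >> 11) | (h << (32 - 11))) + NLF(h, c);  // rotate right by 11, add table value
  h ^= h >> 16;          // final mixing of high bits into low bits
  return h ^ (h >> 8);
}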
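The answer leaves the 256 table values unspecified, and for a cross-platform hash the table must be identical everywhere. One way to get that, sketched here as an assumption rather than the author's method, is to generate it from `std::mt19937` with a fixed seed: the C++ standard fully specifies that engine's output sequence (unlike `std::rand`). The helper `make_nlf_table` and the seed are hypothetical, and the table is passed to `oleg_h` explicitly instead of living in a global.

```cpp
#include <array>
#include <cstdint>
#include <random>
#include <unordered_set>

using NlfTable = std::array<std::uint32_t, 256>;

// Hypothetical: build 256 distinct values from a fixed-seed Mersenne Twister.
NlfTable make_nlf_table(std::uint32_t seed) {
    std::mt19937 gen(seed);
    NlfTable table{};
    std::unordered_set<std::uint32_t> seen;
    std::size_t filled = 0;
    while (filled < table.size()) {
        std::uint32_t v = static_cast<std::uint32_t>(gen());
        if (seen.insert(v).second) {   // skip duplicates so entries are distinct
            table[filled++] = v;
        }
    }
    return table;
}

// Same mixing steps as oleg_h above, with the table passed in.
std::uint32_t oleg_h(const char* key, const NlfTable& table) {
    std::uint32_t h = 0x1F351F35u;
    char c;
    while ((c = *key++) != '\0') {
        std::uint32_t rot = (h >> 11) | (h << (32 - 11));     // rotate right by 11
        h = rot + table[static_cast<std::uint8_t>(c ^ h)];     // NLF(h, c), uses old h
    }
    h ^= h >> 16;
    return h ^ (h >> 8);
}
```

In practice you would generate the table once and paste the literal values into the source, so the hash no longer depends on the generator at all.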
olegarch