9

I was looking for md5 for C++ and realized that md5 is not built in (although there are plenty of very good libraries that provide it). Then I realized I don't actually need md5; any hashing method will do. So I was wondering: does C++ have built-in hashing functions?

While researching this for C++, I saw that Java, PHP, and some other programming languages support md5. For example, in PHP you just need to call md5("your string");

A simple hash function will do. (If possible, please include some simple code on how to use it.)

Null
generator
  • How about crc32? http://stackoverflow.com/questions/302914/crc32-c-or-c-implementation – Rookie Jun 01 '11 at 11:51
  • Asking for a "hash method" is pretty vague. Do you need a cryptographic hash? Or are you just trying to index a collection? Do you need resistance to algorithmic complexity attacks? Etcetera. – David Schwartz Jul 14 '15 at 18:12

3 Answers

11

This is simple. With C++11 you get a

std::hash<std::string>

functor (declared in <functional>) which you can use like this (untested, but gives you the idea):

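#include <functional>  // std::hash
#include <string>
std::hash<std::string> h;
const std::size_t value = h("mystring");  // the literal converts to std::string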
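
A slightly fuller sketch of the same idea, assuming a C++11 compiler. The helper name hash_string is made up for illustration and is not part of the standard library. The value std::hash returns is implementation-defined, so it may differ between compilers and is only guaranteed to be stable within one run of the program; it is not meant for storage or for security.

```cpp
#include <cstddef>
#include <functional>  // std::hash
#include <string>

// hash_string is a hypothetical helper name, not a standard function.
std::size_t hash_string(const std::string& s) {
    std::hash<std::string> hasher;  // C++11 function object
    return hasher(s);               // implementation-defined value
}
```

Calling hash_string("mystring") twice in the same run gives the same number; which number that is depends on your standard library.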

If you don't have C++11, take a look at Boost. Boost.Functional/Hash provides boost::hash, which has a string-hashing specialization, too.

For very simple cases you can start with something along these lines:

size_t h = 0;
for (size_t i = 0; i < s.size(); ++i)  // size_t index avoids a signed/unsigned comparison
    h = h*31 + s[i];
return h;

To take up the comment below: to prevent short strings from clustering, you may want to initialize h differently. Maybe you can use the length for that (but that is just my first idea, unproven):

size_t h = std::numeric_limits<size_t>::max() / (s.length()+1); // needs <limits>; +1: no div-by-0
...

This should not be worse than before, but still far from perfect.
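
Wrapped into a complete function, the simple loop could look like the sketch below. The name simple_hash and the multiplier parameter are made up for illustration; the default of 31 matches the loop above, and a larger value such as the 127 suggested in the comments can be passed instead. One deliberate difference: each character is cast to unsigned char so that bytes above 127 are not sign-extended on platforms where char is signed.

```cpp
#include <cstddef>
#include <string>

// simple_hash is a hypothetical name. Not suitable for security purposes.
std::size_t simple_hash(const std::string& s, std::size_t multiplier = 31) {
    std::size_t h = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Unsigned arithmetic wraps around on overflow, which is fine for a hash.
        h = h * multiplier + static_cast<unsigned char>(s[i]);
    }
    return h;
}
```

For example, simple_hash("ab") computes 97*31 + 98, and simple_hash("ab", 127) computes 97*127 + 98 (assuming an ASCII-compatible character set).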

towi
  • Note that this hash function tends to cluster when there are a large number of very short strings. (Not a frequent occurrence, and it's fairly good in most cases.) – James Kanze Jun 01 '11 at 11:54
  • @JamesKanze what do you think of my idea about the initialization of h? – towi Jan 19 '14 at 11:17
  • It's not the initialization that causes the clustering, but the small multiplier. I usually use `127`, but an even larger prime might be in order today. (Obviously, you shouldn't use `0` for initialization if you can have `'\0'` bytes in the string; strings of all `'\0'` bytes will hash to the same value, regardless of length. But again, this isn't usually an issue.) – James Kanze Jan 20 '14 at 09:09
8

It depends which version of C++ you have... and what kind of hashing function you are looking for.

C++03 does not have any hash containers, and therefore no standard hash function. A number of compilers ship non-standard headers for this, though. Otherwise, Boost.Functional/Hash may help.

C++0x (since published as C++11) has the unordered_ family of containers, and thus a std::hash function object, which already works for standard types (built-in types and std::string, at least).

However, this is a simple hash, good enough for hash maps, not for security.
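
As a rough illustration of the hash-map use case, the sketch below counts words with std::unordered_map, which uses std::hash<std::string> by default for string keys. The function name count_words is made up for illustration, and a C++11 compiler is assumed.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// count_words is a hypothetical helper. The map hashes its keys with
// std::hash<std::string> unless another hasher is supplied.
std::unordered_map<std::string, int>
count_words(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> counts;
    for (std::size_t i = 0; i < words.size(); ++i) {
        ++counts[words[i]];  // operator[] inserts 0 for a new key
    }
    return counts;
}
```

Here the hash is never seen directly; the container calls it internally to pick a bucket for each key.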

If you are looking for a cryptographic hash, then the issue is completely different (and md5 is lousy), and you'll need a library that provides, for example, a SHA-2 hash.

If you are looking for speed, check out CityHash and MurmurHash. Both have restrictions, but they are heavily optimized.

Matthieu M.
4

How about using Boost, specifically Boost.Functional/Hash?

Alok Save