1

I have a C string (wchar_t const*) whose lifetime is owned by some other data structure; references to the string are passed around by pointer. I want to put such instances into an unordered_map. Is there a standard tool I can use to get the hash of this without constructing a temporary std::wstring and calling std::hash<std::wstring>?

Note that std::hash<T*> returns the hash of the pointer, not the hash of the contents of a byte stream pointed to by that pointer.

Billy ONeal
  • 104,103
  • 58
  • 317
  • 552
  • 2
    If your library supports it you can use `std::experimental::string_view` from the library fundamentals TS which also adds a hash for `string_view`s. – user657267 Jul 24 '14 at 00:38
  • "Is there...?" No. You can use `std::hash` on each character combined with `boost::hash_combine` or similar (e.g. [here](http://stackoverflow.com/questions/6899392/generic-hash-function-for-all-stl-containers)), as I'm sure you know... ;-). There are a few hashing related proposals for C++14 that would help, meanwhile it's crazy not having a `void*`,`size_t` variant, but that's the size of it. – Tony Delroy Jul 24 '14 at 06:51

1 Answers1

0

As you noted, and as is explained here, there is no std::hash specialization for C-style strings. Quoted from the linked page:

There is no specialization for C strings. std::hash<const char *> produces a hash of the value of the pointer (the memory address), it does not examine the contents of any character array.

Thus, the hash value produced by std::hash when applied on any such string is not related to its actual contents, and is thus not suitable for the purposes you need.

What can you do? Constructing a temporary is out of the game, because it could involve an allocation, which would introduce exception unsafety, and would always be an useless copy. As user657267 pointed out in the comment above, if your standard library supports basic_string_view, it should also provide the corresponding std::hash specializations, listed in this page.

Finally, you could roll your own hashing algorithm. If the hash values will be used in unordered containers, the quality of the algorithm will impact on performance, but not on the uniqueness of the keys (i.e. there will not be any collisions; you can test it), as I discovered earlier. This example implements the X65599 algorithm, which has worked for me:

#include <cstring>

struct
    hasher final
{
    constexpr std::size_t
        operator()
        ( const char * const s )
        const noexcept
        {
            std::size_t h = 0;

            for ( std::size_t i = 0 , l = std::strlen(s) ; i < l ; ++i )
            {
                h += h * 65599 + s[i];
            }

            return h ^ (h >> 16);
        }
};

If your compiler does not support C++14, you may remove the constexpr specifier. It wouldn't be useful anyway, if the data is stored somewhere else.

EDIT: I have just realized that the example algorithm I presented works on narrow strings. I guess you can still search for one that operates on wide characters.

Community
  • 1
  • 1
djsp
  • 2,174
  • 2
  • 19
  • 40