115

I have read in few different places that using C++11's new string literals it might be possible to compute a string's hash at compile time. However, no one seems to be ready to come out and say that it will be possible or how it would be done.

  • Is this possible?
  • What would the operator look like?

I'm particularly interested use cases like this.

void foo( const std::string& value )
{
   switch( std::hash(value) )
   {
      case "one"_hash: one(); break;
      case "two"_hash: two(); break;
      /*many more cases*/
      default: other(); break;
   }
}

Note: the compile time hash function doesn't have to look exactly as I've written it. I did my best to guess what the final solution would look like, but meta_hash<"string"_meta>::value could also be a viable solution.

Xeo
  • 129,499
  • 52
  • 291
  • 397
deft_code
  • 57,255
  • 29
  • 141
  • 224
  • I can't seem to find anything either, but I could see having to force your hashing function into a constexpr. – luke Jan 21 '10 at 18:23
  • 4
    Is there a compiler that already supports user-defined literals? Gcc doesn't (http://gcc.gnu.org/projects/cxx0x.html) and i haven't found them being mentioned for VC10 either. Without compiler support it can only be guess work, but the templated user-defined literals *look* like it should be possible. – Georg Fritzsche Jan 21 '10 at 18:35
  • 1
    It's cute but not useful? If the switch value is a runtime string, you also need to check for collisions. Maybe packing is better (my answer has a pack function for stuffing 9 chars into 64 bits). – u0b34a0f6ae Nov 06 '11 at 10:57
  • @u0b34a0f6ae Why to check for collisions? – cubuspl42 Mar 24 '14 at 10:39
  • The compiler should issue an error if two case values are equal. – Mark Storer Apr 01 '19 at 18:07
  • @u0b34a0f6ae The compiler finds collisions for you. If you put two identical case expressions, the compiler throws an error. The process becomes: 1) switch on compiler-time hash 2) compare against string – Joshua Hyatt Nov 08 '20 at 20:06

12 Answers12

103

This is a little bit late, but I succeeded in implementing a compile-time CRC32 function with the use of constexpr. The problem with it is that at the time of writing, it only works with GCC and not MSVC nor Intel compiler.

Here is the code snippet:

// CRC32 Table (zlib polynomial)
static constexpr uint32_t crc_table[256] = {
    0x00000000L, 0x77073096L, 0xee0e612cL, 0x990951baL, 0x076dc419L,
    0x706af48fL, 0xe963a535L, 0x9e6495a3L, 0x0edb8832L, 0x79dcb8a4L,
    0xe0d5e91eL, 0x97d2d988L, 0x09b64c2bL, 0x7eb17cbdL, 0xe7b82d07L,
...
};
template<size_t idx>
constexpr uint32_t crc32(const char * str)
{
    return (crc32<idx-1>(str) >> 8) ^ crc_table[(crc32<idx-1>(str) ^ str[idx]) & 0x000000FF];
}

// This is the stop-recursion function
template<>
constexpr uint32_t crc32<size_t(-1)>(const char * str)
{
    return 0xFFFFFFFF;
}

// This doesn't take into account the nul char
#define COMPILE_TIME_CRC32_STR(x) (crc32<sizeof(x) - 2>(x) ^ 0xFFFFFFFF)

enum TestEnum
{
    CrcVal01 = COMPILE_TIME_CRC32_STR("stack-overflow"),
};

CrcVal01 is equal to 0x335CC04A

Hope this will help you!

Clement JACOB
  • 1,087
  • 1
  • 9
  • 9
  • i guess it doesn't work with MSVC and the Intel compiler as [they are yet to implement](http://wiki.apache.org/stdcxx/C++0xCompilerSupport) support for `constexpr` – user1055604 Mar 23 '12 at 19:23
  • Would the runtime function counterpart generate the same hash? I want to compare the hardwired hash values with a table of strings loaded at runtime. – rraallvv Aug 18 '13 at 19:27
  • 5
    Yes absolutely. I tested it against the Python CRC32 runtime algorithm (coming from zlib) and the results are the same. In fact, what you're trying to achieve is exactly why I use this technique for! – Clement JACOB Aug 23 '13 at 07:07
  • 2
    Thanks for posting this, it's very useful! – Tamás Szelei Oct 19 '13 at 14:44
  • it's amazing. I think it's not a problem to MSC++ nowadays? – The Mask Mar 29 '14 at 21:48
  • Alas, its not work with clang, either http://goo.gl/kBG8HA Do you think clang does not support c++11 standart fully? – tower120 May 14 '14 at 14:38
  • 2
    You were missing a compile flag. Moreover I had to cast to size_t the value -1 in stop recursion template function. The updated version is available here (working from Clang 3.3) : http://goo.gl/vPMkfB – Clement JACOB May 23 '14 at 12:51
  • For MSVC, I don't really know what the status is for now, and don't have a Windows machine at hand. – Clement JACOB May 23 '14 at 12:55
  • 1
    `constexpr` is not available in VS2013, except in November 2013 CTP http://blogs.msdn.com/b/vcblog/archive/2013/11/18/announcing-the-visual-c-compiler-november-2013-ctp.aspx –  Jul 23 '14 at 18:12
  • Based on compile time string manipulation, I started to dev a small C++ data framework which allows one to manipulate data based on member name while still beeing type-safe: https://github.com/clems71/cppdataframework – Clement JACOB Jul 30 '15 at 09:36
  • 3
    You are recursing twice. This solution has an exponential complexity with respect to the string length, and the compiler is not clever enough to elide the second call. Check other answers for possible fix for this problem. – CygnusX1 Feb 27 '16 at 16:34
  • How do I get the rest of the table? – Michael Mar 03 '18 at 20:57
31

At least by my reading of §7.1.5/3 and §5.19, the following might be legitimate:

unsigned constexpr const_hash(char const *input) {
    return *input ?
        static_cast<unsigned int>(*input) + 33 * const_hash(input + 1) :
        5381;
}

This does seem to follow the basic rules in §7.1.5/3:

  1. The form is: "return expression;"
  2. Its only parameter is a pointer, which is a scalar type, and therefore a literal type.
  3. Its return is unsigned int, which is also scalar (and therefore literal).
  4. There is no implicit conversion to the return type.

There is some question whether the *inputs involve an illegal lvalue to rvalue conversion, and I'm not sure I understand the rules in §5.19/2/6/21 and §4.1 well enough to be sure about that.

From a practical viewpoint, this code is accepted by (for one example) g++, at least as far back as g++ 4.7.1.

Usage would be something like:

switch(std::hash(value)) {
    case const_hash("one"): one(); break;
    case const_hash("two"): two(); break;
    // ...
    default: other(); break;
}

To comply with the requirements of §5.19/2/6/2 you might have to do something like this though:

// one of the `constexpr`s is probably redundant, but I haven't figure out which.
char constexpr * constexpr v_one = "one"; 

// ....

case const_hash(v_one): one(); break;
  1. I'm using the extra 'slash' numbers to refer to unnumbered bullet points, so this is the second bullet point inside if the sixth bullet point under §5.19/2. I think I might have to talk to Pete Becker about whether it's possible to add some sort of numbers/letters/roman numerals down the hierarchy to identify pieces like this...
Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
  • 3
    Two things wrong with this. 1: Recursive calls are not allowed in `constexpr`, 2: You have no halting condition (where `*input == nullptr`) and as I understand `constexpr` you can't have one. – Motti Jan 21 '10 at 22:11
  • 10
    Where does it say recursion isn't allowed in a constexpr? As far as I can see, it only says that any functions you call must themselves be marked constexprt (§5.19/2/2). I did make a mistake in the termination condition, which I've now fixed (I accidentally used || where it should have been &&). – Jerry Coffin Jan 21 '10 at 22:17
  • I don't remember where it says that but I remember reading it somewhere (downvote cancelled) – Motti Jan 21 '10 at 22:19
  • 3
    recursive constexpr. I read some meeting minutes from 2008. There was discussion about allowing or disallowing recursive constexpr functions. The gist was the current wording didn't forbid it, and slightly implied it. The committee felt that such a powerful feature should be explicitly spelt out. That was back in 2008, I don't know what's happened since then. – deft_code Jan 22 '10 at 01:49
  • I don't see it in the draft, but N2235 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2235.pdf) says in 4.1: "possibly involving calls of *previously defined* constant expression functions". Either that restriction was dropped or i am overlooking it in the draft. – Georg Fritzsche Jan 22 '10 at 01:49
  • 3
    Right -- I probably should have pointed out that I was going from N3000, which (I believe) is currently the most recent draft. I'm pretty sure it was forbidden at one time, but at least for now it seems to be allowed. – Jerry Coffin Jan 22 '10 at 05:17
  • I wish the other answer was correct, but it looks like this is the closest to correct. The only remaining question is if a pointer can be dereference at in a constexpr. – deft_code Jan 28 '10 at 14:46
  • 2
    Um, the && operator is returning a bool, so this function probably isn't doing what you want. In particular it returns 0 for any empty string and possibly certain other strings beginning with a char which converts to `(unsigned)-1` if there is any; and returns 1 for all other strings. Rewrite with ternary conditional operator? – ndkrempel May 20 '12 at 17:12
  • 1
    Just a comment. You can easily add a `unsigned constexpr operator "" _hash(const char*,size_t)` that calls your `const_hash`. This lets you do `"one"_hash` for your case values. I'd also use `const_hash(value)` rather than `std::hash(value)` as you can't guarantee the implementation of `std::hash` won't be different. https://ideone.com/dPpmg2 for example – KitsuneYMG Feb 20 '14 at 16:00
  • @KitsuneYMG I think that you can't call a `constexpr` function with `value` as an argument, because `value` is known only at runtime. – cubuspl42 Mar 24 '14 at 10:36
  • 1
    @cubuspl42 the value being switched on is *always* a runtime value. The case values *must* be compile-time values. If you check out the ideone link I posted, you'll see it works. – KitsuneYMG Mar 24 '14 at 13:51
  • 1
    Is this really computed at run-time or am I at risk of it be computed at run-time? – The Mask Mar 31 '14 at 22:32
  • 1
    Unfortunately, it has chances not to be compile time computed, unless you force the expression into a constexpr value like a integer templates parameter for instance. Its up to compiler to choose the constexpr or not-constexpr versions when both are available in a given context. Using the suggested meta_hash<"yeah"_has>::value should work. – Jean-Bernard Jansen May 13 '14 at 01:47
17

This snippet based on Clement JACOB's one. But works with clang too. And it should be faster on compilation (it have only one recursive call, not two like in original post).

#include <iostream>
#include <string>
#include <vector>

static constexpr unsigned int crc_table[256] = {
    0x00000000, 0x77073096, 0xee0e612c, 0x990951ba, 0x076dc419, 0x706af48f,
    0xe963a535, 0x9e6495a3,    0x0edb8832, 0x79dcb8a4, 0xe0d5e91e, 0x97d2d988,
    0x09b64c2b, 0x7eb17cbd, 0xe7b82d07, 0x90bf1d91, 0x1db71064, 0x6ab020f2,
    0xf3b97148, 0x84be41de, 0x1adad47d, 0x6ddde4eb, 0xf4d4b551, 0x83d385c7,
    0x136c9856, 0x646ba8c0, 0xfd62f97a, 0x8a65c9ec, 0x14015c4f, 0x63066cd9,
    0xfa0f3d63, 0x8d080df5, 0x3b6e20c8, 0x4c69105e, 0xd56041e4, 0xa2677172,
    0x3c03e4d1, 0x4b04d447, 0xd20d85fd, 0xa50ab56b, 0x35b5a8fa, 0x42b2986c,
    0xdbbbc9d6, 0xacbcf940, 0x32d86ce3, 0x45df5c75, 0xdcd60dcf, 0xabd13d59,
    0x26d930ac, 0x51de003a, 0xc8d75180, 0xbfd06116, 0x21b4f4b5, 0x56b3c423,
    0xcfba9599, 0xb8bda50f, 0x2802b89e, 0x5f058808, 0xc60cd9b2, 0xb10be924,
    0x2f6f7c87, 0x58684c11, 0xc1611dab, 0xb6662d3d, 0x76dc4190, 0x01db7106,
    0x98d220bc, 0xefd5102a, 0x71b18589, 0x06b6b51f, 0x9fbfe4a5, 0xe8b8d433,
    0x7807c9a2, 0x0f00f934, 0x9609a88e, 0xe10e9818, 0x7f6a0dbb, 0x086d3d2d,
    0x91646c97, 0xe6635c01, 0x6b6b51f4, 0x1c6c6162, 0x856530d8, 0xf262004e,
    0x6c0695ed, 0x1b01a57b, 0x8208f4c1, 0xf50fc457, 0x65b0d9c6, 0x12b7e950,
    0x8bbeb8ea, 0xfcb9887c, 0x62dd1ddf, 0x15da2d49, 0x8cd37cf3, 0xfbd44c65,
    0x4db26158, 0x3ab551ce, 0xa3bc0074, 0xd4bb30e2, 0x4adfa541, 0x3dd895d7,
    0xa4d1c46d, 0xd3d6f4fb, 0x4369e96a, 0x346ed9fc, 0xad678846, 0xda60b8d0,
    0x44042d73, 0x33031de5, 0xaa0a4c5f, 0xdd0d7cc9, 0x5005713c, 0x270241aa,
    0xbe0b1010, 0xc90c2086, 0x5768b525, 0x206f85b3, 0xb966d409, 0xce61e49f,
    0x5edef90e, 0x29d9c998, 0xb0d09822, 0xc7d7a8b4, 0x59b33d17, 0x2eb40d81,
    0xb7bd5c3b, 0xc0ba6cad, 0xedb88320, 0x9abfb3b6, 0x03b6e20c, 0x74b1d29a,
    0xead54739, 0x9dd277af, 0x04db2615, 0x73dc1683, 0xe3630b12, 0x94643b84,
    0x0d6d6a3e, 0x7a6a5aa8, 0xe40ecf0b, 0x9309ff9d, 0x0a00ae27, 0x7d079eb1,
    0xf00f9344, 0x8708a3d2, 0x1e01f268, 0x6906c2fe, 0xf762575d, 0x806567cb,
    0x196c3671, 0x6e6b06e7, 0xfed41b76, 0x89d32be0, 0x10da7a5a, 0x67dd4acc,
    0xf9b9df6f, 0x8ebeeff9, 0x17b7be43, 0x60b08ed5, 0xd6d6a3e8, 0xa1d1937e,
    0x38d8c2c4, 0x4fdff252, 0xd1bb67f1, 0xa6bc5767, 0x3fb506dd, 0x48b2364b,
    0xd80d2bda, 0xaf0a1b4c, 0x36034af6, 0x41047a60, 0xdf60efc3, 0xa867df55,
    0x316e8eef, 0x4669be79, 0xcb61b38c, 0xbc66831a, 0x256fd2a0, 0x5268e236,
    0xcc0c7795, 0xbb0b4703, 0x220216b9, 0x5505262f, 0xc5ba3bbe, 0xb2bd0b28,
    0x2bb45a92, 0x5cb36a04, 0xc2d7ffa7, 0xb5d0cf31, 0x2cd99e8b, 0x5bdeae1d,
    0x9b64c2b0, 0xec63f226, 0x756aa39c, 0x026d930a, 0x9c0906a9, 0xeb0e363f,
    0x72076785, 0x05005713, 0x95bf4a82, 0xe2b87a14, 0x7bb12bae, 0x0cb61b38,
    0x92d28e9b, 0xe5d5be0d, 0x7cdcefb7, 0x0bdbdf21, 0x86d3d2d4, 0xf1d4e242,
    0x68ddb3f8, 0x1fda836e, 0x81be16cd, 0xf6b9265b, 0x6fb077e1, 0x18b74777,
    0x88085ae6, 0xff0f6a70, 0x66063bca, 0x11010b5c, 0x8f659eff, 0xf862ae69,
    0x616bffd3, 0x166ccf45, 0xa00ae278, 0xd70dd2ee, 0x4e048354, 0x3903b3c2,
    0xa7672661, 0xd06016f7, 0x4969474d, 0x3e6e77db, 0xaed16a4a, 0xd9d65adc,
    0x40df0b66, 0x37d83bf0, 0xa9bcae53, 0xdebb9ec5, 0x47b2cf7f, 0x30b5ffe9,
    0xbdbdf21c, 0xcabac28a, 0x53b39330, 0x24b4a3a6, 0xbad03605, 0xcdd70693,
    0x54de5729, 0x23d967bf, 0xb3667a2e, 0xc4614ab8, 0x5d681b02, 0x2a6f2b94,
    0xb40bbe37, 0xc30c8ea1, 0x5a05df1b, 0x2d02ef8d
};


template<int size, int idx = 0, class dummy = void>
struct MM{
  static constexpr unsigned int crc32(const char * str, unsigned int prev_crc = 0xFFFFFFFF)
  {
      return MM<size, idx+1>::crc32(str, (prev_crc >> 8) ^ crc_table[(prev_crc ^ str[idx]) & 0xFF] );
  }
};

// This is the stop-recursion function
template<int size, class dummy>
struct MM<size, size, dummy>{
  static constexpr unsigned int crc32(const char * str, unsigned int prev_crc = 0xFFFFFFFF)
  {
      return prev_crc^ 0xFFFFFFFF;
  }
};

// This don't take into account the nul char
#define COMPILE_TIME_CRC32_STR(x) (MM<sizeof(x)-1>::crc32(x))


template<unsigned int crc>
void PrintCrc()
{
    std::cout << crc << std::endl;
}


int main()
{

    PrintCrc<COMPILE_TIME_CRC32_STR("HAH")>();
}

See proof of concept here

wawiesel
  • 170
  • 9
tower120
  • 5,007
  • 6
  • 40
  • 88
  • 1
    Thank you, Jacob's answer works fine for what I wanted on GCC but msvc was throwing errors with larger strings. Your answer works on msvc with the larger strings that I need to hash. – Daniel Moodie Jan 05 '17 at 23:40
14

This is an attempt to solve the OP's problem as exactly as possible.

namespace my_hash {
  template<class>struct hasher;
  template<>
  struct hasher<std::string> {
    std::size_t constexpr operator()(char const *input)const {
      return *input ?
        static_cast<unsigned int>(*input) + 33 * (*this)(input + 1) :
        5381;
    }
    std::size_t operator()( const std::string& str ) const {
      return (*this)(str.c_str());
    }
  };
  template<typename T>
  std::size_t constexpr hash(T&& t) {
    return hasher< typename std::decay<T>::type >()(std::forward<T>(t));
  }
  inline namespace literals {
    std::size_t constexpr operator "" _hash(const char* s,size_t) {
      return hasher<std::string>()(s);
    }
  }
}
using namespace my_hash::literals;
void one() {} void two() {} void other() {}

void foo( const std::string& value )
{
  switch( my_hash::hash(value) )
  {
    case "one"_hash: one(); break;
    case "two"_hash: two(); break;
    /*many more cases*/
    default: other(); break;
  }
}

live example.

Note the main difference -- std::hash cannot be used, as we do not have control over std::hash's algorithm, and we must reimplement it as a constexpr in order to evaluate it at compile time. In addition, there are no "transparent" hashes in std, so you cannot (without creating a std::string) hash a raw character buffer as a std::string.

I stuck the std::string custom hasher (with transparent const char* support) into a my_hash namespace, so you can store it in a std::unordered_map if you need consistency.

Based off of @JerryCoffin's excellent answer and the comment thread below it, but with an attempt to write it with current C++11 best practices (as opposed to anticipating them!).

Note that using a "raw hash" for a switch statement case is dangerous. You'll want to do an == comparison afterwards to confirm it worked.

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
12

Another solution based on Clement JACOB's one, using C++11 constexpr (not the extended C++14) but having only one recursion.

namespace detail {
// CRC32 Table (zlib polynomial)
static constexpr uint32_t crc_table[256] = { 0x00000000L, 0x77073096L, ... }

template<size_t idx>
constexpr uint32_t combine_crc32(const char * str, uint32_t part) {
  return (part >> 8) ^ crc_table[(part ^ str[idx]) & 0x000000FF];
}

template<size_t idx>
constexpr uint32_t crc32(const char * str) {
  return combine_crc32<idx>(str, crc32<idx - 1>(str));
}

// This is the stop-recursion function
template<>
constexpr uint32_t crc32<size_t(-1)>(const char * str) {
  return 0xFFFFFFFF;
}

} //namespace detail

template <size_t len>
constexpr uint32_t ctcrc32(const char (&str)[len]) {
  return detail::crc32<len - 2>(str) ^ 0xFFFFFFFF;
}

Some explanation

  • We are using a single recursion, so that the function works well even for longer strings.
  • The extra function combine_crc32 allows us to store the result of a recursion under a variable part and using it twice. This function is a walkaround for C++11 limitaton disallowing local variable declarations.
  • The ctcrc32 function expects a string literal, which is passed as a const char (&)[len]. This way we can get the string length as a template parameter and do not have to rely on macros.
CygnusX1
  • 20,968
  • 5
  • 65
  • 109
12

If you have a c++17 compiler and string_view, this becomes trivial, just write the normal version:

constexpr uint32_t crc32(std::string_view str)
{
    uint32_t crc = 0xffffffff;
    for (auto c : str)
        crc = (crc >> 8) ^ crc_table[(crc ^ c) & 0xff];
    return crc ^ 0xffffffff;
}
Guillaume Gris
  • 2,135
  • 17
  • 34
  • Note that the compiler may decide not to process this at compile time if you simply write `crc32("mystring")` (typically VS tends to do that). The trick to circumvent that issue is to create a constexpr variable that depends on the compile time evaluation of your crc32. Typically `constexpr uint32_t val = crc32("mystring");` – Guillaume Gris May 05 '20 at 07:19
8

Note that the form shown here wasn't accepted into the standard, as noted below.

Compile time string processing is guessed to become possible through user-defined literals proposed in N2765.
As i already mentioned, i don't know of any compiler that currently implements it and without compiler support there can be only guess work.

In §2.13.7.3 and 4 of the draft we have the following:

Otherwise (S contains a literal operator template), L is treated as a call of the form
operator "" X<'c1', 'c2', ... , 'ck'>() where n is the source character sequence c1c2...ck. [Note: The sequence c1c2...ck can only contain characters from the basic source character set. —end note]

Combine that with constexpr and we should have compile time string processing.

update: i overlooked that i was reading the wrong paragraph, this form is allowed for user-defined-integer-literals and -floating-literals, but apparently not for -string-literals (§2.13.7.5).
This part of the proposal seems to have not been accepted.

That being said, with my limited glimpse at C++0x, it might look something like this (i most likely got something wrong):

template<char c, char... str>
struct hash {
    static const unsigned result = c + hash<str...>::result;
};

template<char c>
struct hash {
    static const unsigned result = c;
};

template<char... str> 
constexpr unsigned
operator "" _hash() {
    return hash<str>::result;
}

// update: probably wrong, because the above 
// form is not allowed for string-literals:    
const unsigned h = "abcd"_hash;

If Jerrys approach works, then the following should work however:

constexpr unsigned operator "" _hash(const char* s, size_t) {
    return const_hash(s);
}
Community
  • 1
  • 1
Georg Fritzsche
  • 97,545
  • 26
  • 194
  • 236
  • Nice (and necessary) combination of var length templates and `constexpr` user defined literal. I'm not sure you can use a string literal as a template parameter, don't they have static linkage? (they do in C++98 at least and are therefore verboten as template parameters). – Motti Jan 21 '10 at 22:16
  • I have mixed up the paragraphs and thought that this case was an exception - sadly it doesn't appear to be so. – Georg Fritzsche Jan 22 '10 at 01:38
  • 1
    @Motti: where is gf using the string literal as a template parameter? Are you confusing passing the variadic template of chars as a string literal? – deft_code Jan 22 '10 at 02:00
  • It seems from the original proposal, the template version for string literals wasn't accepted (due to concatenation issues? http://stackoverflow.com/questions/1108008/any-ideas-for-c1y/1109243#1109243 - comments) - too bad. – Georg Fritzsche Jan 22 '10 at 03:09
  • 1
    The last version of `operator ""_hash` works for me in Xcode 5.0.2. – cubuspl42 Mar 24 '14 at 10:27
6

Here is another C++11 implementation (based on CygnusX1's answer), which works with both constexpr char arrays and runtime strings:

namespace detail {

    // CRC32 Table (zlib polynomial)
    static constexpr uint32_t crc_table[256] = { 0x00000000L, 0x77073096L, ... };

    constexpr uint32_t combine_crc32(size_t idx, const char * str, uint32_t part) {
        return (part >> 8) ^ crc_table[(part ^ str[idx]) & 0x000000FF];
    }

    constexpr uint32_t crc32(size_t idx, const char * str) {
        return idx == size_t(-1) ? 
            0xFFFFFFFF : combine_crc32(idx, str, crc32(idx - 1, str));
    }
}

uint32_t ctcrc32(std::string const& str) {
    size_t len = str.size() + 1;
    return detail::crc32(len - 2, str.c_str()) ^ 0xFFFFFFFF;
}

template <size_t N>
constexpr uint32_t ctcrc32(const char (&str)[N]) {
    return detail::crc32(N - 2, str) ^ 0xFFFFFFFF;
}

You need str.size() + 1 in the first overload because N in the second overload is the size of the array, including the null-terminating character (which std::string::size does not account for).

I did not add an overload for const char * because it messes up with the second overload — You can easily add overloads for const char *, size_t or std::string_view.

Holt
  • 36,600
  • 7
  • 92
  • 139
6

The following works in GCC 4.6.1, and you can use either hash or pack in switch blocks.

/* Fast simple string hash (Bernstein?) */                                       
constexpr unsigned int hash(const char *s, int off = 0) {                        
    return !s[off] ? 5381 : (hash(s, off+1)*33) ^ s[off];                           
}                                                                                

/* Pack the string into an unsigned int                                          
 * Using 7 bits (ascii) it packs 9 chars into a uint64_t                         
 */                                                                              
template <class T = uint64_t, unsigned int Bits = 7>                             
constexpr T pack(const char *s, unsigned int off = 0) {                          
    return (Bits*off >= CHAR_BIT*sizeof(T) || !s[off]) ? 0 :                     
        (((T)s[off] << (Bits*off)) | pack(s,off+1));                             
}  

GCC seemingly(?) does not allow recursive calls where we pass on s+1 with s a pointer, which is why I use the off variable.

u0b34a0f6ae
  • 48,117
  • 14
  • 92
  • 101
5

This is a nice question.

Based on Jerry Coffin's answer, I've created another one that's compatible with Visual Studio 2017's std::hash.

#include <functional>
#include <cassert>
using namespace std;


constexpr size_t cx_hash(const char* input) {
    size_t hash = sizeof(size_t) == 8 ? 0xcbf29ce484222325 : 0x811c9dc5;
    const size_t prime = sizeof(size_t) == 8 ? 0x00000100000001b3 : 0x01000193;

    while (*input) {
        hash ^= static_cast<size_t>(*input);
        hash *= prime;
        ++input;
    }

    return hash;
}


int main() {
    /* Enter your code here. Read input from STDIN. Print output to STDOUT */

    auto a = cx_hash("test");
    hash<string> func;
    auto b = func("test");
    assert(a == b);

    return 0;
}

https://github.com/manuelgustavo/cx_hash

manuel saraiva
  • 601
  • 6
  • 5
2

I was still missing a crc32-literal variant (which is not possible with templates), so here is my suggestion based on CygnusX1. Did some testing, feel free to give feedback.

Testet on MSVC.

PS: I hate searching for additional stuff somewhere else, so i copied the CRC table on the bottom of my answer.

#include <inttypes.h>

namespace detail
{
    // CRC32 Table (zlib polynomial)
    static constexpr uint32_t crc_table[256] =
    {
        0x00000000L, 0x77073096L, 0xee0e612cL, 0x990951baL, 0x076dc419L,
        ...
    };

    constexpr uint32_t combine_crc32( const char c, uint32_t part )
    {
        return (part >> 8) ^ crc_table[(part ^ c) & 0x000000FF];
    }

    constexpr uint32_t crc32( const char * str, size_t idx )
    {
        return combine_crc32( str[idx], idx ? crc32( str, idx - 1 ) : 0xFFFFFFFF );
    }
} //namespace detail

constexpr uint32_t ctcrc32( const char* str, size_t len )
{
    return detail::crc32( str, len ) ^ 0xFFFFFFFF;
}

size_t constexpr operator "" _hash( const char* str, size_t len )
{
    return ctcrc32( str, len );
}

Alternative with algorithmn from Dan Bernstein (djb2) (combined answers from Jerry Coffin + Georg Fritzsche)

unsigned constexpr const_hash( char const *input )
{
    return *input ?
        static_cast<unsigned int>(*input) + 33 * const_hash( input + 1 ) :
        5381;
}
size_t constexpr operator "" _hash( const char* str, size_t len )
{
    return const_hash( str );
}

Crc32 table:

static constexpr uint32_t crc_table[256] =
    {
        0x00000000L, 0x77073096L, 0xee0e612cL, 0x990951baL, 0x076dc419L,
        0x706af48fL, 0xe963a535L, 0x9e6495a3L, 0x0edb8832L, 0x79dcb8a4L,
        0xe0d5e91eL, 0x97d2d988L, 0x09b64c2bL, 0x7eb17cbdL, 0xe7b82d07L,
        0x90bf1d91L, 0x1db71064L, 0x6ab020f2L, 0xf3b97148L, 0x84be41deL,
        0x1adad47dL, 0x6ddde4ebL, 0xf4d4b551L, 0x83d385c7L, 0x136c9856L,
        0x646ba8c0L, 0xfd62f97aL, 0x8a65c9ecL, 0x14015c4fL, 0x63066cd9L,
        0xfa0f3d63L, 0x8d080df5L, 0x3b6e20c8L, 0x4c69105eL, 0xd56041e4L,
        0xa2677172L, 0x3c03e4d1L, 0x4b04d447L, 0xd20d85fdL, 0xa50ab56bL,
        0x35b5a8faL, 0x42b2986cL, 0xdbbbc9d6L, 0xacbcf940L, 0x32d86ce3L,
        0x45df5c75L, 0xdcd60dcfL, 0xabd13d59L, 0x26d930acL, 0x51de003aL,
        0xc8d75180L, 0xbfd06116L, 0x21b4f4b5L, 0x56b3c423L, 0xcfba9599L,
        0xb8bda50fL, 0x2802b89eL, 0x5f058808L, 0xc60cd9b2L, 0xb10be924L,
        0x2f6f7c87L, 0x58684c11L, 0xc1611dabL, 0xb6662d3dL, 0x76dc4190L,
        0x01db7106L, 0x98d220bcL, 0xefd5102aL, 0x71b18589L, 0x06b6b51fL,
        0x9fbfe4a5L, 0xe8b8d433L, 0x7807c9a2L, 0x0f00f934L, 0x9609a88eL,
        0xe10e9818L, 0x7f6a0dbbL, 0x086d3d2dL, 0x91646c97L, 0xe6635c01L,
        0x6b6b51f4L, 0x1c6c6162L, 0x856530d8L, 0xf262004eL, 0x6c0695edL,
        0x1b01a57bL, 0x8208f4c1L, 0xf50fc457L, 0x65b0d9c6L, 0x12b7e950L,
        0x8bbeb8eaL, 0xfcb9887cL, 0x62dd1ddfL, 0x15da2d49L, 0x8cd37cf3L,
        0xfbd44c65L, 0x4db26158L, 0x3ab551ceL, 0xa3bc0074L, 0xd4bb30e2L,
        0x4adfa541L, 0x3dd895d7L, 0xa4d1c46dL, 0xd3d6f4fbL, 0x4369e96aL,
        0x346ed9fcL, 0xad678846L, 0xda60b8d0L, 0x44042d73L, 0x33031de5L,
        0xaa0a4c5fL, 0xdd0d7cc9L, 0x5005713cL, 0x270241aaL, 0xbe0b1010L,
        0xc90c2086L, 0x5768b525L, 0x206f85b3L, 0xb966d409L, 0xce61e49fL,
        0x5edef90eL, 0x29d9c998L, 0xb0d09822L, 0xc7d7a8b4L, 0x59b33d17L,
        0x2eb40d81L, 0xb7bd5c3bL, 0xc0ba6cadL, 0xedb88320L, 0x9abfb3b6L,
        0x03b6e20cL, 0x74b1d29aL, 0xead54739L, 0x9dd277afL, 0x04db2615L,
        0x73dc1683L, 0xe3630b12L, 0x94643b84L, 0x0d6d6a3eL, 0x7a6a5aa8L,
        0xe40ecf0bL, 0x9309ff9dL, 0x0a00ae27L, 0x7d079eb1L, 0xf00f9344L,
        0x8708a3d2L, 0x1e01f268L, 0x6906c2feL, 0xf762575dL, 0x806567cbL,
        0x196c3671L, 0x6e6b06e7L, 0xfed41b76L, 0x89d32be0L, 0x10da7a5aL,
        0x67dd4accL, 0xf9b9df6fL, 0x8ebeeff9L, 0x17b7be43L, 0x60b08ed5L,
        0xd6d6a3e8L, 0xa1d1937eL, 0x38d8c2c4L, 0x4fdff252L, 0xd1bb67f1L,
        0xa6bc5767L, 0x3fb506ddL, 0x48b2364bL, 0xd80d2bdaL, 0xaf0a1b4cL,
        0x36034af6L, 0x41047a60L, 0xdf60efc3L, 0xa867df55L, 0x316e8eefL,
        0x4669be79L, 0xcb61b38cL, 0xbc66831aL, 0x256fd2a0L, 0x5268e236L,
        0xcc0c7795L, 0xbb0b4703L, 0x220216b9L, 0x5505262fL, 0xc5ba3bbeL,
        0xb2bd0b28L, 0x2bb45a92L, 0x5cb36a04L, 0xc2d7ffa7L, 0xb5d0cf31L,
        0x2cd99e8bL, 0x5bdeae1dL, 0x9b64c2b0L, 0xec63f226L, 0x756aa39cL,
        0x026d930aL, 0x9c0906a9L, 0xeb0e363fL, 0x72076785L, 0x05005713L,
        0x95bf4a82L, 0xe2b87a14L, 0x7bb12baeL, 0x0cb61b38L, 0x92d28e9bL,
        0xe5d5be0dL, 0x7cdcefb7L, 0x0bdbdf21L, 0x86d3d2d4L, 0xf1d4e242L,
        0x68ddb3f8L, 0x1fda836eL, 0x81be16cdL, 0xf6b9265bL, 0x6fb077e1L,
        0x18b74777L, 0x88085ae6L, 0xff0f6a70L, 0x66063bcaL, 0x11010b5cL,
        0x8f659effL, 0xf862ae69L, 0x616bffd3L, 0x166ccf45L, 0xa00ae278L,
        0xd70dd2eeL, 0x4e048354L, 0x3903b3c2L, 0xa7672661L, 0xd06016f7L,
        0x4969474dL, 0x3e6e77dbL, 0xaed16a4aL, 0xd9d65adcL, 0x40df0b66L,
        0x37d83bf0L, 0xa9bcae53L, 0xdebb9ec5L, 0x47b2cf7fL, 0x30b5ffe9L,
        0xbdbdf21cL, 0xcabac28aL, 0x53b39330L, 0x24b4a3a6L, 0xbad03605L,
        0xcdd70693L, 0x54de5729L, 0x23d967bfL, 0xb3667a2eL, 0xc4614ab8L,
        0x5d681b02L, 0x2a6f2b94L, 0xb40bbe37L, 0xc30c8ea1L, 0x5a05df1bL,
        0x2d02ef8dL
    };
Zacharias
  • 576
  • 3
  • 6
0

Based on Jerry Coffin's approach and Georg Fritzsche's approach

I used the following code without constexpr auto const_tmp = NGX_EM_HASH("authorization");:

template <size_t N> constexpr size_t string_literal_length(const char(&str)[N]) { return N - 1; }

// https://stackoverflow.com/questions/2111667/compile-time-string-hashing/66690839#66690839
// "cookie"_hash = ngx_hash(ngx_hash(ngx_hash(ngx_hash(ngx_hash('c', 'o'), 'o'), 'k'), 'i'), 'e');
// See also `ngx_uint_t ngx_hash_key(u_char *data, size_t len)` in nginx\src\core\ngx_hash.c
#if 0
template<ngx_uint_t sum, char ch, char... str> struct ngx_em_hasher {
    static const ngx_uint_t result = ngx_em_hasher<ngx_hash(sum, u_char(ch)), str...>::result;
};
template<ngx_uint_t sum, char ch> struct ngx_em_hasher {
    static const ngx_uint_t result = ngx_hash(sum, u_char(ch));
};
template<char... str> constexpr
ngx_uint_t operator "" _hash() { return ngx_em_hasher<0, str>::result; }

// update: probably wrong, because the above form is not allowed for string-literals:    
// const unsigned h = "abcd"_hash;
#elif defined(_MSC_VER2)
// reducer function: the previous calculation result must be passed to the next iteration
static constexpr ngx_uint_t ngx_em_hash(const char* const psz, ngx_uint_t sum = 0) {
    return *psz ? ngx_em_hash(psz + 1, ngx_hash(sum, u_char(*psz))) : sum;
}
constexpr ngx_uint_t operator "" _hash(const char* s, size_t) { return ngx_em_hash(s); }
// #define NGX_EM_HASH(str) ngx_em_hash(str)

#define NGX_EM_X(x) x
// constexpr auto const_hash = NGX_EM_HASH("authorization");
// hdr->hash = const_hash;
#define NGX_EM_HASH(string_literals) ngx_em_const<NGX_EM_X(string_literals)_hash>::value

#else
template<size_t idx> constexpr ngx_uint_t ngx_em_hash(const char* const psz, ngx_uint_t sum = 0) {
    return ngx_em_hash<idx - 1>(psz + 1, ngx_hash(sum, u_char(*psz)));
}
// This is the stop-recursion function
template<> constexpr ngx_uint_t ngx_em_hash<0>(const char* const psz, ngx_uint_t sum) {
    return sum;
}
// This doesn't take into account the nul char.
#define COMPILE_TIME_NGX_HASH(x) ngx_em_hash<sizeof(x) - 1>(x)
// Regardless of what Optimize Options of the compiler?
// https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
// https://learn.microsoft.com/en-us/cpp/build/reference/o-options-optimize-code?view=msvc-160
#define NGX_EM_HASH(x) ngx_em_const<ngx_em_hash<sizeof(x) - 1>(x)>::value
#endif

void snippet(ngx_table_elt_t *hdr) {
    ngx_str_set(&hdr->key, "Authorization");
    hdr->lowcase_key = (u_char *) "authorization";
    //constexpr auto const_tmp = NGX_EM_HASH("authorization");
    //hdr->hash = const_tmp;
    hdr->hash = NGX_EM_HASH("authorization");
    sr->headers_in.authorization = hdr;
}

And then its disassembly result looked like this (using VS2017 v15.9.27):

                ;hdr->hash = NGX_EM_HASH("authorization");
00007FFD36B8B7DE  mov         rax,qword ptr [rbp+4D8h]  
00007FFD36B8B7E5  mov         rcx,4EEC63AFAD69E079h     ; Decimal=5687030035641917561 __int64
00007FFD36B8B7EF  mov         qword ptr [rax],rcx 

But, if using #define NGX_EM_HASH(string_literals) NGX_EM_X(string_literals)_hash, its disassembly result looked like this:

                ;hdr->hash = NGX_EM_HASH("authorization");
00007FFD337FFE93  lea         rcx,[string "authorization" (07FFD33885ED0h)]  
00007FFD337FFE9A  call        operator "" _hash (07FFD336B78ECh)  
00007FFD337FFE9F  mov         rcx,qword ptr [rbp+4D8h]  
00007FFD337FFEA6  mov         qword ptr [rcx],rax  

Visual Studio 2017 - Disassembly of ngx_hash in Debug mode

Online

  • Compiler Explorer with disassembly outputs of VC++ and GCC (Run compilers interactively from your web browser and interact with the assembly)
  • ngx_hash@OnlineGDB beta (online compiler and debugger for c/c++)
samm
  • 620
  • 10
  • 22