22

I am trying to get a string-based switch expression to work in C using a hash function. I've been able to get it to work with clean syntax using 'constexpr' with Clang/LLVM turned to C++, even though the code is C.

However, there are of course odd side effects of having it compile as C++, like lack of void* implicit casting which becomes really awkward.

So the question is how to solve this dilemma (without slapping the C11 committee upside their head for why this wasn't added to the C spec)

  1. Is there a way to get constexpr option turned on with C?
  2. Is there a way to get implicit void* casting turned on with C++?
  3. Is there another clean way to code this in C11/C99 that doesn't require recalculating hashes?

Here is my current example code:

constexpr uint64 cHash(char const* text, uint64 last_value = basis)
{
    return *str ? cHash(text+1, (*text ^ last_value) * prime) : last_value;
}

void SwitchFunction(char const* text)
{
    switch(Hash(text))
    {
        case cHash("first"):
            break;
        case cHash("second"):
            break;
        case cHash("third"):
            break;
        default:
            break;
    }
}
Troy Harvey
  • 1,365
  • 2
  • 13
  • 24
  • 1
    While clever, I warn that you'll probably run into a case later on where the hashes collide. Then, I don't think there will be an easy way out. – nneonneo Apr 09 '14 at 08:10
  • 1
    I don't really see why the lack of type unsafety in C++ is a big deal. – PlasmaHH Apr 09 '14 at 08:12
  • 5
    Adding 'constexpr' support to C has no apparent downsides, it is just a preprocessor directive. However compiling C with C++, has a number of quirks that could break existing code. In addition, void* are used all the time in C (to get OOP-like behavior). Code becomes less readable with casting thrown about. There is also an argument that explicit casting increases unsaftey by overriding potential compiler warnings. – Troy Harvey Apr 11 '14 at 05:50
  • @TroyHarvey hi, I was working extensively on this topic recently. I did a thread about it, didn't do it good at start so the thread is closed, but I updated it with the working code, I really would like you to check it, and comment your opinion about it. https://stackoverflow.com/questions/66833211/what-is-the-proper-way-to-reference-and-dereference-a-pointer-to-void-with-multi – R1S8K May 19 '21 at 12:00
  • The closest thing to automatically (but not implicitly) casting `void *` in C++, which I use in some wrappers (macros) to `malloc()`, is `static_cast`, where `ptr` is the left-hand-side of the `=`. I found some code very similar to mine in this answer: . – alx - recommends codidact Jun 25 '21 at 10:46

5 Answers5

13

I'm a bit late to the party, but was recently facing the same problem.

For such a simple hash function, you can just implement it using the C preprocessor. The downside is that the preprocessor can not split strings into characters, so instead of hash("first") you will have to write HASH('f','i','r','s','t'). The HASH macro is implemented using __VA_ARGS__ and works for strings with up to eight characters.

I have also turned you hash function from a recursive one into an iterative one, which is a bit easier to read and doesn't require the optional argument. The generated assembly is pretty much the same (https://godbolt.org/z/1g8LPI).

#include <stdio.h>

typedef unsigned long uint64;

#define HASH_BASIS 17UL
#define HASH_PRIME 11UL

#define HASH_1(ARG1) ((ARG1 ^ HASH_BASIS) * HASH_PRIME)
#define HASH_2(ARG1, ARG2) ((ARG2 ^ HASH_1(ARG1)) * HASH_PRIME)
#define HASH_3(ARG1, ARG2, ARG3) ((ARG3 ^ HASH_2(ARG1, ARG2)) * HASH_PRIME)
#define HASH_4(ARG1, ARG2, ARG3, ARG4)                                         \
    ((ARG4 ^ HASH_3(ARG1, ARG2, ARG3)) * HASH_PRIME)
#define HASH_5(ARG1, ARG2, ARG3, ARG4, ARG5)                                   \
    ((ARG5 ^ HASH_4(ARG1, ARG2, ARG3, ARG4)) * HASH_PRIME)
#define HASH_6(ARG1, ARG2, ARG3, ARG4, ARG5, ARG6)                             \
    ((ARG6 ^ HASH_5(ARG1, ARG2, ARG3, ARG4, ARG5)) * HASH_PRIME)
#define HASH_7(ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7)                       \
    ((ARG7 ^ HASH_6(ARG1, ARG2, ARG3, ARG4, ARG5, ARG6)) * HASH_PRIME)
#define HASH_8(ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7, ARG8)                 \
    ((ARG8 ^ HASH_7(ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7)) * HASH_PRIME)

#define HASH_COUNT(ARG1, ARG2, ARG3, ARG4, ARG5, ARG6, ARG7, ARG8, func, ...)  \
    func

#define HASH(...)                                                              \
    HASH_COUNT(__VA_ARGS__, HASH_8(__VA_ARGS__), HASH_7(__VA_ARGS__),          \
               HASH_6(__VA_ARGS__), HASH_5(__VA_ARGS__), HASH_4(__VA_ARGS__),  \
               HASH_3(__VA_ARGS__), HASH_2(__VA_ARGS__), HASH_1(__VA_ARGS__))

uint64 hash(const char *text) {
    uint64 h = HASH_BASIS;
    char c;
    while ((c = *text++) != '\0') {
        h = (c ^ h) * HASH_PRIME;
    }
    return h;
}

int main(int argc, char *argv[]) {
    const char *text = argc > 1 ? argv[1] : "";
    switch (hash(text)) {
    case HASH('f', 'i', 'r', 's', 't'):
        puts(text);
        break;
    case HASH('s', 'e', 'c', 'o', 'n', 'd'):
        puts(text);
        break;
    case HASH('t', 'h', 'i', 'r', 'd'):
        puts(text);
        break;
    default:
        puts("oops");
        break;
    }
}
Henri Menke
  • 10,705
  • 1
  • 24
  • 42
8

If you use an inline function and compile your code with optimizations, a decent compiler should be able to apply constant propagation to your code. Here's a small example:

const int basis = 17;
inline const int hash(const char* text, int last_value) {
  return *text ? hash(text + 1, (*text ^ last_value) * 11) : last_value;
}

int main(int argc, const char** argv) {
  if (hash(argv[0], basis) == hash("hello", basis)) {
    return 0;
  } else {
    return 1;
  }
}

If invoked with the -O3 flag, clang will optimize away the call to hash("hello", basis) and replace it with a constant statically. You can see that optimization if you generate the LLVM byte code (clang -S -emit-llvm example.c):

; (...)
  %18 = icmp ne i32 %14, 20068367
  %19 = zext i1 %18 to i32
  br label %20
; (...)

Unfortunately, this doesn't mean you can use a call to hash as an actual constant expression within your code, as there's no way to tell the compiler hash is necessarily statically optimizable. So for instance, you can't use it as the value of a switch-case. For these particular use cases (no pun intended), you've got no choice but using pre-computed constants (i.e. Lundin's suggestion).

This might not be as hard as you think, depending on how complex are your constexprs. There are countless C parsers written in various scripting languages (e.g. pycparser for Python). Then all you need to do is to walk your C AST and apply whatever custom preprocessing pass(es) you see fit.

Alvae
  • 1,254
  • 12
  • 22
6

If you know the values to be hashed ahead of time then you could use gperf and generate a perfect hash? C won't play well with constexpr.

Joe
  • 7,378
  • 4
  • 37
  • 54
  • I found that tool a long time ago, and forgot what it was thanks a lot. That's awesome. The idea of creating a perfect hash function and the code to go with it is amazing. – HSchmale Dec 11 '18 at 21:08
5

Is there a way to get constexpr option turned on with C?

No, no such thing exists in C.

Is there a way to get implicit void* casting turned on with C++?

No, C++ has mandatory type safety of pointers.

Is there another clean way to code this in C11/C99 that doesn't require recalculating hashes?

The only way you can do it, is the traditional way with macros. In case you create a function-like macro with those parameters, and only use it on compile-time constants, then all computations will be done at compile-time. Unfortunately, the code will turn rather ugly, but there is no way to avoid that in C.

The best way might be to prepare all such compile-time parameters with an external script/program, then just store them as raw data tables in the C program.

Lundin
  • 195,001
  • 40
  • 254
  • 396
  • Thanks. I was hoping that one of the compliers had a switch extended the kindness of permitting 'constexpr' in C beyond C99/11 support. I see no downside of accessing 'constexpr' from C since it is essentially a pre-processor compiler directive. It is silly that such features are not part of the C spec. I would love to see GCC or Clang offer such a switch – Troy Harvey Apr 09 '14 at 16:06
4

That isnt going to work in C. The values of the case labels must be constant.

What you could do is pre-calculate the output for cHash("first") etc, and then use the value in the case, eg:

#define CHASH_FIRST 0x831928 /* precalculated output for cHash ("first") */

switch (Hash(text))
{
   case CHASH_FIRST:
     break;

}

To extend this you could then build another binary that just calculates the values for your hashes, run this as part of your build process, and use the values generated as pre-processor defines on your compile line using.

Angew is no longer proud of SO
  • 167,307
  • 17
  • 350
  • 455
mjs
  • 2,837
  • 4
  • 28
  • 48
  • 1
    I was hoping for something I missed... This certainly works. But it requires meta-programming to insure consistency in the code. It is unfortunate that 'constexpr' isn't extended to C, since it gives some level of built-in meta programming, and a replacement for much of the non-error checked macro-maddness that we all have to use. – Troy Harvey Apr 09 '14 at 16:01
  • The thing is though, although they look similar, C and C++ are not the same language. If you want to use a C++ feature, use C++. There is no reason you can't use both either. – mjs Apr 09 '14 at 16:04
  • 2
    Agreed. But because of the snail-pace of C improvements, it has been historic that compilers (notably GCC) allowed access to features that didn't have any other significant impacts on the language. For example, many of us were using //-style comments a decade before it made it into C99. 'constexpr' seems to have do downside as an addition to C, and would greatly benefit the quality of code since C more than any language relies on macros... many of which could be replaced by 'constexpr' and get the benefits of compiler error-checking. – Troy Harvey Apr 09 '14 at 16:13