
I have these little macros that take a small char array and just recast it as an int or longlong:

#define HASH4(x) (*((int*)x))
#define HASH8(x) (*((longlong*)x))

int aValue=HASH4("FOOD");
longlong aBigValue=HASH8("SEXYCOOL");

I want to know if there's any way in the C/C++ syntax for me to do this WITHOUT needing the #define? The reason is, I want to use these in a case structure, like so:

switch(something)
{
case HASH4("FOOD"): printf("Food!");break;
case HASH8("SEXYCOOL"): printf("Sexycool!");break;
}

...which, it will not let me do.

So is there some way in c syntax to tell it to interpret this four-byte char* as an int? Or, alternatively, some way to write the define that converts it?

Clarifying: I want to know if there's a way in C/C++ syntax to take the four bytes so that these two statements would be equivalent:

int something=Magic("\0\0\0A");
int something=65;

switch(stuff)
{
    case Magic("FOOD"): <-- becomes valid
}

...cuz we all know that const char "FOOD" is just four bytes containing 70,79,79,68... is there some way to wrap up a nice MAKEINT(4 bytes) that gets completely handled by the preprocessor or otherwise is indistinguishable from an int or long long?

KiraHoneybee
  • Don't use #define for this; look into constexpr functions in C++ (macros are a last-resort kind of thing): https://en.cppreference.com/w/cpp/language/constexpr. And why are you still using printf? (Use std::cout with std::format.) – Pepijn Kramer Mar 12 '23 at 14:27
  • I just want to take the four (or 8) bytes as an int at the compile-time level. Endian-ness won't matter because it's not being stored-- I'm just trying to get a unique-int that I can use for case structures that can have some readability. I'm coding in C++. – KiraHoneybee Mar 12 '23 at 14:32
  • Are you looking for something like this? `#define HASH4(s) ((((s)[0]*256+(s)[1])*256+(s)[2])*256+(s)[3])` (macros are dangerous if the argument has side effects, of course ...) – chtz Mar 12 '23 at 14:33
  • Hey @chtz that works! If you want to post it as an answer I'll accept it. – KiraHoneybee Mar 12 '23 at 15:09
  • @KiraHoneybee That wouldn't answer the question. Your question says "any way in the C/C++ syntax for me to do this WITHOUT needing the #define?" – Nicol Bolas Mar 12 '23 at 15:13
  • KiraHoneybee, [I posted @chtz's macro as an answer](https://stackoverflow.com/a/75714049/4561887). @NicolBolas, I did it without a `#define` too. – Gabriel Staples Mar 12 '23 at 15:21
  • @NicolBolas In the question I had said "alternatively, if there's some way to write a define that converts it." chtz's trick worked for me, I'm willing to accept his answer. – KiraHoneybee Mar 12 '23 at 15:26
  • @GabrielStaples I'm going to give chtz some time to come back and answer it if he wants to. If he doesn't in a day I'll accept your answer. – KiraHoneybee Mar 12 '23 at 15:27
  • Re: "`"FOOD"` is four bytes containing ..." -- no, it's **five** chars; there's a terminating nul character. And we don't **know** the values. We might assume that the character encoding is ASCII, which it almost always is these days. – Pete Becker Mar 12 '23 at 15:53
  • Let's not split hairs. I just want to quickly and easily hash a 4 (or 8) char string into an int; the trailing zero doesn't matter for my purposes, which is only to have human-readable, superfast values that I can branch on. – KiraHoneybee Mar 12 '23 at 16:57
  • @chtz, great job on that. Note that for little-endian systems, including x86-64, you need to reverse the order of the hash, and do this instead: `#define HASH4(s) ((((s)[3]*256+(s)[2])*256+(s)[1])*256+(s)[0])`. See full example runs and conversions from string to an unsigned int and back again [in my full answer here](https://stackoverflow.com/a/75714049/4561887). – Gabriel Staples Mar 13 '23 at 03:36

3 Answers

2

Use constexpr functions instead of macros. e.g.

#include <algorithm>   // std::min
#include <array>
#include <cstddef>     // std::size_t
#include <iostream>
#include <string_view>

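// Create a constexpr hash function which may
// be used both at compile time and at runtime.
// I used std::string_view since I am used to that,
// but you can also use a char* and use strlen.
// Note: only the first four characters are hashed, and a
// weighted sum like this can collide for different strings.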

static constexpr auto hash(const std::string_view& str)
{
    std::array<std::size_t, 4> primes{ 1,3,5,7 };
    std::size_t count = std::min(primes.size(), str.size());

    std::size_t hash{ 0ul };
    for (std::size_t n = 0; n < count; n++)
    {
        hash += (primes[n] * static_cast<std::size_t>(str[n]));
    }

    return hash;
}

int main()
{
    std::size_t something = hash("FOOD");

    switch (something)
    {
        case hash("FOOD"): std::cout << "Food!"; break;
        case hash("SEXYCOOL"): std::cout << "Sexycool!"; break;
    }

    return 0;
}
Pepijn Kramer
  • If you (essentially) only want to accept string literals, you could use `char const str[n]` with a template parameter `int n`. – chtz Mar 12 '23 at 14:40
  • This looks promising! But I am limited to C++11... I changed it to char const str[4] as suggested, but my switch structure still tells me "case expression not constant". Any suggestions? – KiraHoneybee Mar 12 '23 at 14:44
  • Consider switching to C++20. There's a lot you can do without macros and explicit type punning. – Red.Wave Mar 12 '23 at 17:33
2

Update: @chtz's hack totally works. Indexing a string literal with a constant index is itself a constant expression, so the compiler can fold the whole arithmetic expression into an int at compile time.

Solution 1/4: use a macro to hack together a uint32_t by manually calculating it from 4 bytes

Update 2: consider endianness. x86-64 systems are little-endian. I originally mistakenly used the big-endian hash:

// For big-endian byte ordering
uint32_t num = ((chars[0]*256 + chars[1])*256 + chars[2])*256 + chars[3];

// Update 2: reverse the order for correct endianness:
// For little-endian byte ordering
uint32_t num = ((chars[3]*256 + chars[2])*256 + chars[1])*256 + chars[0];
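
To see which ordering matches a given machine, one option is to compare both formulas against the bytes as they sit in memory. The following is only a sketch: `byte_of`, `pack_be`, `pack_le`, `load_native`, and `host_is_little_endian` are hypothetical helper names, not part of the original macro, and the single-return style is used so that it also compiles as C++11.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: character i of s widened to an unsigned 32-bit value.
constexpr std::uint32_t byte_of(const char* s, int i)
{
    return static_cast<std::uint32_t>(static_cast<unsigned char>(s[i]));
}

// First character ends up in the most significant byte (big-endian layout).
constexpr std::uint32_t pack_be(const char* s)
{
    return (byte_of(s, 0) << 24) | (byte_of(s, 1) << 16)
         | (byte_of(s, 2) << 8)  |  byte_of(s, 3);
}

// First character ends up in the least significant byte (little-endian layout).
constexpr std::uint32_t pack_le(const char* s)
{
    return (byte_of(s, 3) << 24) | (byte_of(s, 2) << 16)
         | (byte_of(s, 1) << 8)  |  byte_of(s, 0);
}

// Runtime only: copy the four bytes, exactly as stored, into an integer.
inline std::uint32_t load_native(const char* s)
{
    std::uint32_t value = 0;
    std::memcpy(&value, s, sizeof value);
    return value;
}

// True if this machine stores integers least significant byte first.
inline bool host_is_little_endian()
{
    return load_native("FOOD") == pack_le("FOOD");
}
```

If the packed value is only ever compared against other packed values, as in a switch, either ordering works; the choice matters only when the integer is also read from or written to raw memory.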

test.cpp:

///usr/bin/env ccache g++ -Wall -Wextra -Werror -O3 -std=gnu++17 "$0" -o /tmp/a && /tmp/a "$@"; exit
// For the line just above, see my answer here: https://stackoverflow.com/a/75491834/4561887

#include <cstdio>   // printf
#include <iostream>

#define HASH4(s) ((((s)[0]*256+(s)[1])*256+(s)[2])*256+(s)[3])

void check_int(int i)
{
    switch(i)
    {
    case HASH4("FOOD"):
        printf("FOOD\n");
        break;
    case HASH4("TREE"):
        printf("TREE\n");
        break;
    }
}

int main()
{
    std::cout << "Test\n";

    int something = HASH4("FOOD");
    printf("something = %i\n", something); // something = 1179602756
    check_int(something);

    something = 1179602756;
    check_int(something);


    // ----------------------------
    // withOUT using a #define now
    // ----------------------------

    something = ((('F'*256+'O')*256+'O')*256+'D');

    switch(something)
    {
        case ((('F'*256+'O')*256+'O')*256+'D'):
            printf("FOOD\n");
            break;
    }


    return 0;
}

Run cmd:

chmod +x test.cpp   # make executable
./test.cpp          # run it

Output:

Test
something = 1179602756
FOOD
FOOD
FOOD

Without using a #define: this works fine because ((('F'*256+'O')*256+'O')*256+'D') is a constant expression, so it is evaluated entirely at compile time.
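
The question also asks for an 8-byte version, which the macro above does not cover. One possible extension, as a sketch: `HASH_BYTE` and `describe8` are hypothetical names, and each character is widened to an unsigned 64-bit value first, because the plain `*256` arithmetic above is done in `int` and would overflow a 32-bit int past four characters.

```cpp
#include <cstdint>

// Hypothetical helper macro: character i of s as an unsigned 64-bit value.
#define HASH_BYTE(s, i) \
    (static_cast<std::uint64_t>(static_cast<unsigned char>((s)[(i)])))

// First character in the most significant byte (big-endian ordering,
// the same ordering as the HASH4 macro in test.cpp above).
#define HASH8(s) \
    ((HASH_BYTE(s, 0) << 56) | (HASH_BYTE(s, 1) << 48) | \
     (HASH_BYTE(s, 2) << 40) | (HASH_BYTE(s, 3) << 32) | \
     (HASH_BYTE(s, 4) << 24) | (HASH_BYTE(s, 5) << 16) | \
     (HASH_BYTE(s, 6) << 8)  |  HASH_BYTE(s, 7))

// Example use in a switch; the case labels are computed at compile time.
const char* describe8(std::uint64_t value)
{
    switch (value)
    {
    case HASH8("SEXYCOOL"): return "Sexycool!";
    case HASH8("FOODTREE"): return "Foodtree!";
    default:                return "unknown";
    }
}
```

As with HASH4, the argument is evaluated several times, so this is only safe with string literals or other side-effect-free expressions.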

Solution 2/4 (better): use a constant expression function hack instead of the macro hack above

@Pepijn Kramer is right that constexpr functions can replace macros that just do compile-time calculations. constexpr functions may be preferred because they are type-checked and avoid the double-evaluation problem that macros have when you pass an expression with side effects into them.

constexpr functions will evaluate into constexpr results during compile-time if able, and as regular results during runtime otherwise. So, they are like a mix of the functionality of some macros + regular functions.
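
For readers limited to C++11, as mentioned in the comments, here is a minimal sketch of the same idea. It assumes the C++11 rule that a constexpr function body is essentially a single return statement, and it takes the string literal by array reference (five chars, counting the terminating nul) so that a literal of the wrong length fails to compile. `byte_at`, `hash4_cxx11`, and `describe` are hypothetical names, and the little-endian ordering is just one possible choice.

```cpp
#include <cstdint>

// Hypothetical helper: character i widened to an unsigned 32-bit value.
constexpr std::uint32_t byte_at(const char (&s)[5], int i)
{
    return static_cast<std::uint32_t>(static_cast<unsigned char>(s[i]));
}

// Packs exactly four characters; the first character goes into the
// least significant byte. The [5] accounts for the terminating nul.
constexpr std::uint32_t hash4_cxx11(const char (&s)[5])
{
    return (byte_at(s, 3) << 24) | (byte_at(s, 2) << 16)
         | (byte_at(s, 1) << 8)  |  byte_at(s, 0);
}

// The same function serves as a case label (compile time) and can
// also be called with runtime data of the right array type.
const char* describe(std::uint32_t value)
{
    switch (value)
    {
    case hash4_cxx11("FOOD"): return "FOOD";
    case hash4_cxx11("TREE"): return "TREE";
    default:                  return "unknown";
    }
}
```

Calling `hash4_cxx11("FOODS")` would be rejected by the compiler, since the literal has six chars and does not bind to `const char (&)[5]`.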

Here's one solution, passing a std::array of 4 chars into a constexpr function:

///usr/bin/env ccache g++ -Wall -Wextra -Werror -O3 -std=gnu++17 "$0" -o /tmp/a && /tmp/a "$@"; exit
// For the line just above, see my answer here: https://stackoverflow.com/a/75491834/4561887

#include <array>
#include <cstdint>   // uint32_t
#include <cstdio>    // printf
#include <iostream>

constexpr uint32_t hash4chars(const std::array<char, 4>& chars)
{
    // For big-endian byte ordering
    // uint32_t num = ((chars[0]*256 + chars[1])*256 + chars[2])*256 + chars[3];

    // Update: reverse the order for correct endianness:
    // For little-endian byte ordering
    uint32_t num = ((chars[3]*256 + chars[2])*256 + chars[1])*256 + chars[0];
    return num;
}

void check_int(int i)
{
    switch(i)
    {
    case hash4chars({'F', 'O', 'O', 'D'}):
        printf("FOOD\n");
        break;
    case hash4chars({'T', 'R', 'E', 'E'}):
        printf("TREE\n");
        break;
    }
}

int main()
{
    std::cout << "Test\n";

    uint32_t num = hash4chars({'F', 'O', 'O', 'D'});
    printf("num = %u\n", num);
    check_int(num);

    // convert the num back to a char array to check that it was converted
    // correctly
    const char* str = (const char*)(&num);
    printf("%c%c%c%c\n", str[0], str[1], str[2], str[3]);

    return 0;
}

Run and output, showing that the 4 bytes in FOOD turn into the uint32_t number of 1146048326, and that number turns back into the 4 chars FOOD on my x86-64 Linux system (which is little endian):

$ ./test.cpp 
Test
num = 1146048326
FOOD
FOOD

Solution 3/4: (best so far) constexpr function hack using a std::string_view as input, instead of the std::array just above

Even better still, use a std::string_view as the input parameter so you can still pass a raw C-string in to it. Here is a full example:

///usr/bin/env ccache g++ -Wall -Wextra -Werror -O3 -std=gnu++17 "$0" -o /tmp/a && /tmp/a "$@"; exit
// For the line just above, see my answer here: https://stackoverflow.com/a/75491834/4561887

#include <cstdint>
#include <cstdio>      // printf
#include <iostream>
#include <string_view>

constexpr uint32_t hash4chars(const std::string_view& sv)
{
    // Error checking: ensure all inputs have only 4 chars.
    // Note: as really crude error checking, we'll just return the sentinel
    // value of `UINT32_MAX` if this error occurs. Better techniques exist
    if (sv.size() != 4)
    {
        printf("Error: the string view should be 4 chars long!\n");
        return UINT32_MAX;
    }

    // static_assert(sv.size() == 4); // doesn't work: a function parameter is not a constant expression

    // For big-endian byte ordering
    // uint32_t num = ((sv[0]*256 + sv[1])*256 + sv[2])*256 + sv[3];

    // Update: reverse the order for correct endianness:
    // For little-endian byte ordering
    uint32_t num = ((sv[3]*256 + sv[2])*256 + sv[1])*256 + sv[0];
    return num;
}

void check_int(int i)
{
    switch(i)
    {
    case hash4chars("FOOD"):
        printf("FOOD\n");
        break;
    case hash4chars("TREE"):
        printf("TREE\n");
        break;
    }
}

int main()
{
    std::cout << "Test\n";

    uint32_t num = hash4chars("FOOD");
    printf("num = %u\n", num);
    check_int(num);

    // convert the num back to a char array to check that it was converted
    // correctly
    const char* str = (const char*)(&num);
    printf("%c%c%c%c\n", str[0], str[1], str[2], str[3]);

    return 0;
}

Run and output (exact same as previously):

$ ./test.cpp 
Test
num = 1146048326
FOOD
FOOD

Solution 4/4: don't convert 4 bytes to integers; just hash the string directly, as a string view, using built-in C++ hash functions

Based on the fact that you are calling your macros HASH4() and HASH8() in the question, it seems you really just want a unique or near-unique hash of the input string. That is, you don't actually need the integer that shares the string's bytes; you just need a hash of it.

In that case, you can also just use C++'s built-in std::hash<>{}() functor. See here:

  1. https://en.cppreference.com/w/cpp/utility/hash - general documentation
    1. https://en.cppreference.com/w/cpp/string/basic_string_view/hash - documentation on the std::string_view specialization of it
      1. https://en.cppreference.com/w/cpp/string/basic_string_view/operator%22%22sv - meaning of operator""sv() function, used as "my_string"sv to produce a std::string_view from C-string "my_string" in the examples just above

But std::hash<>{}() is not a constexpr function, so you cannot use it in switch cases either! Rather, you must use the if else style of checking.

How to read std::hash<>{}():

  1. std is the namespace
  2. <> specifies the template type
  3. {} constructs a default object of this class type
  4. () calls the operator() (the function-call operator, which is what makes the object a "functor") on this object, which in this case is the function that performs the hash on the arguments inside those parentheses.
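
The four steps above can be spelled out in code. This is only an illustration (C++17, like the other examples here), and `hash_one_liner`, `hash_step_by_step`, and `check_same_hash` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string_view>

// One-liner form, as used in this answer.
inline std::size_t hash_one_liner(std::string_view sv)
{
    return std::hash<std::string_view>{}(sv);
}

// The same call, split into the steps listed above.
inline std::size_t hash_step_by_step(std::string_view sv)
{
    // Steps 1-3: namespace std, template type <std::string_view>,
    // and {} to default-construct an object of that class type.
    std::hash<std::string_view> hasher{};

    // Step 4: call operator() on that object to hash the argument.
    return hasher(sv);
}

// Both spellings give the same value within one run of the program.
inline void check_same_hash()
{
    assert(hash_one_liner("FOOD") == hash_step_by_step("FOOD"));
}
```

The actual number returned is implementation-defined, so it should not be stored or compared across different compilers or standard libraries.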

Note: the following code works great, and may be the most beloved by many C++ people, but I find it pretty complicated and perhaps too "C++"-y. Your call. It also isn't a constexpr expression. I'm happy I have finally reached the point after 3 years of daily C++ usage that I can even read and write this myself, however, and having access to a quick hash of C-strings (interpreted as std::string_views) is in fact nice to have as part of the C++ language.

///usr/bin/env ccache g++ -Wall -Wextra -Werror -O3 -std=gnu++17 "$0" -o /tmp/a && /tmp/a "$@"; exit
// For the line just above, see my answer here: https://stackoverflow.com/a/75491834/4561887

#include <cstdio>      // printf
#include <functional>  // std::hash
#include <iostream>
#include <string_view>

void check_hash(std::size_t hash)
{
    if (hash == std::hash<std::string_view>{}(std::string_view{"FOOD", 4}))
    {
        printf("FOOD\n");
    }
    else if (hash == std::hash<std::string_view>{}(std::string_view{"TREE", 4}))
    {
        printf("TREE\n");
    }
}

int main()
{
    std::cout << "Test\n";

    std::size_t num
        = std::hash<std::string_view>{}(std::string_view{"FOOD", 4});
    printf("num = %zu\n", num); // %zu is the portable format for std::size_t
    check_hash(num);

    return 0;
}

Run and output:

$ ./test.cpp 
Test
num = 16736621008042147638
FOOD

That std::hash functor is pretty ugly, so if you like, you can beautify it a bit by wrapping it with a macro:

#define HASH(string, num_chars) \
    std::hash<std::string_view>{}(std::string_view{(string), (num_chars)})

Example:

#include <cstdio>      // printf
#include <functional>  // std::hash
#include <iostream>
#include <string_view>

#define HASH(string, num_chars) \
    std::hash<std::string_view>{}(std::string_view{(string), (num_chars)})

void check_hash(std::size_t hash)
{
    if (hash == HASH("FOOD", 4))
    {
        printf("FOOD\n");
    }
    else if (hash == HASH("TREE", 4))
    {
        printf("TREE\n");
    }
}

int main()
{
    std::cout << "Test\n";

    std::size_t num = HASH("FOOD", 4);
    printf("num = %zu\n", num); // %zu is the portable format for std::size_t
    check_hash(num);

    return 0;
}

The output is the same as just above.
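
If a switch is still wanted with a real hash rather than a byte-packing trick, one option is to write the hash yourself as a constexpr function. The sketch below uses the 64-bit FNV-1a algorithm, which is a swapped-in technique and not what std::hash does internally (that is implementation-defined). `fnv1a64` and `describe_hashed` are hypothetical names, and the recursive single-return form is chosen so that it also compiles as C++11.

```cpp
#include <cstdint>

// 64-bit FNV-1a: start from the offset basis, then for each byte
// XOR it in and multiply by the FNV prime (wrapping modulo 2^64).
constexpr std::uint64_t fnv1a64(const char* s,
                                std::uint64_t h = 14695981039346656037ULL)
{
    return *s == '\0'
        ? h
        : fnv1a64(s + 1,
                  (h ^ static_cast<unsigned char>(*s)) * 1099511628211ULL);
}

// The input is hashed at runtime; the case labels at compile time.
const char* describe_hashed(const char* input)
{
    switch (fnv1a64(input))
    {
    case fnv1a64("FOOD"):     return "Food!";
    case fnv1a64("SEXYCOOL"): return "Sexycool!";
    default:                  return "unknown";
    }
}
```

Unlike the packing approaches, this accepts strings of any length. As with any hash, two different strings could in principle collide: between two case labels that would show up as a duplicate-case compile error, but a runtime input that happens to collide with a label would be a silent false match.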

Going further

If you want to look more into conversions of memory blobs to and from byte arrays, see also my other answers here:

  1. How to convert a struct variable to uint8_t array in C:
    1. Answer 1/3: use a union and a packed struct
    2. Answer 2/3: convert a struct to an array of bytes via manual bit-shifting
    3. Answer 3/3: use a packed struct and a raw uint8_t pointer to it

Other info to consider and understand

To make 4 bytes get interpreted as a constant 4-byte int (const int32_t), simply use

// this
#define CONST_INT32(bytes) (*((const int32_t*)(bytes)))

// instead of this
#define CONST_INT32(bytes) (*((int32_t*)(bytes)))

ie: add const before your pointer cast.

But that gets you a const int32_t, which is not the same as a constexpr int32_t. A constant expression is one whose value the compiler can compute at compile time, and reinterpreting memory through a pointer cast is not allowed inside one. Casting 4 bytes to an int via a macro, as above, is exactly that kind of reinterpretation.

So, no, in C++ there is no preprocessor macro way I am aware of to forcefully reinterpret 4 bytes, via a pointer cast, as a constexpr int.

You can reinterpret 4 bytes as a const int instead, but that's not the same thing. Only constant expressions can be used as case labels in a switch statement, so @dbush's answer is right. Use an if else to check the const int values instead.

Note: a const int initialized with a constant expression is itself usable in constant expressions, just like a constexpr int. So, this runs:

#include <cstdio>   // printf
#include <iostream>

int main()
{
    std::cout << "Test\n";

    const int CASE1 = 7; // const with a constant initializer: usable as a case label
    const int CASE2 = 8; // const with a constant initializer: usable as a case label

    int something = CASE1;

    switch(something)
    {
    case CASE1:
        printf("CASE1\n");
        break;
    case CASE2:
        printf("CASE2\n");
        break;
    }

    return 0;
}

...as well as this:

#include <cstdio>   // printf
#include <iostream>

int main()
{
    std::cout << "Test\n";

    constexpr int CASE1 = 7; // you are explicitly making these constexpr
    constexpr int CASE2 = 8; // you are explicitly making these constexpr

    int something = CASE1;

    switch(something)
    {
    case CASE1:
        printf("CASE1\n");
        break;
    case CASE2:
        printf("CASE2\n");
        break;
    }

    return 0;
}
Gabriel Staples
1

The values for a case label are required to be constant expressions, and expressions such as (*((int*)x)) don't qualify. The fact that you're using a #define doesn't matter.

You'll need to use an if...else chain for this.

if (something == HASH4("FOOD")) {
    printf("Food!");
} else if (something == HASH8("SEXYCOOL")) {
    printf("Sexycool!");
}
dbush
  • I get that. The question is whether there is some way, in C++ syntax (or with a clever #define) to have it interpret the four bytes in a const char* as a const integer instead of a string. – KiraHoneybee Mar 12 '23 at 14:23