
I have read this related question, but it does not quite help me.

The goal of the enum is to hold the raw UTF-8 byte sequence (not the Unicode code point) of single characters whose UTF-8 encoding is at most 4 bytes long.

The following example works because the Xcode source file is encoded in UTF-8 (the recommended encoding for Xcode). It compiles and runs with the expected values, but I also get the warning "character constant too long for this type". Can I safely suppress it, or is that a bad idea?

typedef enum {
    TEST_VAL_1BYTE = ',', // 0x2C
    TEST_VAL_2BYTE = '§', // 0xC2A7     (the warning)
    TEST_VAL_3BYTE = '✓', // 0xE29C93   (the warning)
    TEST_VAL_4BYTE = '𝍥', // 0xF09D8DA5 (the warning)
} TEST_VALUES_UTF8;

The safest way, which produces no warnings but is more tedious to write, is to use the numeric values directly:

typedef enum {
    NUM_VAL_1BYTE = 0x2C,       // ,
    NUM_VAL_2BYTE = 0xC2A7,     // §
    NUM_VAL_3BYTE = 0xE29C93,   // ✓
    NUM_VAL_4BYTE = 0xF09D8DA5, // 𝍥
} TEST_VALUES_UTF8;

Finally, please note that an enumerator initialized with 1 or 4 ASCII characters is valid and produces no warning:

enum {
    ENUM_TEST_1     = '1',     // 0x31        (no warning)
    ENUM_TEST_12    = '12',    // 0x3132      (w: multi-character character constant)
    ENUM_TEST_123   = '123',   // 0x313233    (w: multi-character character constant)
    ENUM_TEST_1234  = '1234',  // 0x31323334  (no warning)
};

Is there perhaps a preprocessor macro, independent of the source file's encoding, that returns the UTF-8 code? For example:

enum {
    TEST_VAL_2BYTE = AWESOME_UTF8CODE_MACRO('§'), // 0xC2A7
};

Thanks.

Ivan Dossev

1 Answer


Use C++11 constexpr functions and the u8 string-literal prefix, as in http://liveworkspace.org/code/3EtxVE:

#include <iostream>
#include <cstdint>

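// Each overload packs the UTF-8 bytes of a one-character string literal into
// a uint32_t, first byte highest. The array size includes the trailing '\0'.
// Bytes are widened to uint32_t before shifting so that <<24 never reaches
// the sign bit of a promoted int.
constexpr uint32_t utf8(const char (&c)[2]) {
   return uint8_t(c[0]);
}
constexpr uint32_t utf8(const char (&c)[3]) {
   return uint32_t(uint8_t(c[1])) | (uint32_t(uint8_t(c[0]))<<8);
}
constexpr uint32_t utf8(const char (&c)[4]) {
   return uint32_t(uint8_t(c[2])) | (uint32_t(uint8_t(c[1]))<<8) | (uint32_t(uint8_t(c[0]))<<16);
}
constexpr uint32_t utf8(const char (&c)[5]) {
   return uint32_t(uint8_t(c[3])) | (uint32_t(uint8_t(c[2]))<<8) | (uint32_t(uint8_t(c[1]))<<16) | (uint32_t(uint8_t(c[0]))<<24);
}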

typedef enum {
    TEST_VAL_1BYTE = utf8(u8","),
    TEST_VAL_2BYTE = utf8(u8"§"),
    TEST_VAL_3BYTE = utf8(u8"✓"),
    TEST_VAL_4BYTE = utf8(u8"𝍥"),
} TEST_VALUES_UTF8;

int main() {
   std::cout << std::hex << TEST_VAL_1BYTE << std::endl;
   std::cout << std::hex << TEST_VAL_2BYTE << std::endl;
   std::cout << std::hex << TEST_VAL_3BYTE << std::endl;
   std::cout << std::hex << TEST_VAL_4BYTE << std::endl;
}

which outputs

2c
c2a7
e29c93
f09d8da5
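
The four overloads can also be folded into a single function template. What follows is only a sketch: the name utf8_packed is made up for illustration, and the code keeps to C++11 rules (one return statement, recursion instead of a loop). Templating on the character type as well means it should also accept u8 literals under C++20, where they have type const char8_t[N] rather than const char[N], so the original overloads would no longer match there.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical generalisation of the utf8() overloads: one template for any
// one-byte character type (char, or char8_t for u8 literals in C++20).
// N counts the terminating '\0', so N - 1 bytes are packed, first byte highest.
template <typename CharT, std::size_t N>
constexpr std::uint32_t utf8_packed(const CharT (&s)[N],
                                    std::size_t i = 0,
                                    std::uint32_t acc = 0) {
    static_assert(sizeof(CharT) == 1, "expects a narrow or u8 string literal");
    static_assert(N >= 2 && N <= 5, "expects 1 to 4 bytes plus the terminator");
    return i == N - 1
        ? acc
        : utf8_packed(s, i + 1, (acc << 8) | static_cast<unsigned char>(s[i]));
}

// Escape sequences keep these checks independent of the file's encoding.
static_assert(utf8_packed(",") == 0x2C, "1-byte sequence");
static_assert(utf8_packed("\xC2\xA7") == 0xC2A7, "2-byte sequence");
static_assert(utf8_packed(u8"\u2713") == 0xE29C93, "3-byte sequence");
static_assert(utf8_packed(u8"\U0001D365") == 0xF09D8DA5, "4-byte sequence");
```

The size check rejects empty literals and literals longer than four bytes at compile time, which the separate overloads did implicitly by not matching.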

If you don't have access to the u8 prefix, you can simply make sure the source file is encoded in UTF-8 and use plain string literals. You could probably turn the constexpr functions into macros if needed, but the version shown is a clean way to do it.
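
If depending on the source file's encoding is a concern, another option is to start from the Unicode code point and compute the packed UTF-8 bytes at compile time. This is a minimal sketch rather than a vetted implementation: utf8_from_code_point and the enumerator names are hypothetical, and the function does not reject surrogates or values above 0x10FFFF.

```cpp
#include <cstdint>

// Hypothetical helper: returns the UTF-8 encoding of code point cp packed
// into a uint32_t, first byte highest. Written as a single return statement
// so that it stays valid under C++11 constexpr rules.
constexpr std::uint32_t utf8_from_code_point(std::uint32_t cp) {
    return cp < 0x80u    ? cp
         : cp < 0x800u   ? ((0xC0u | (cp >> 6)) << 8)
                           | (0x80u | (cp & 0x3Fu))
         : cp < 0x10000u ? ((0xE0u | (cp >> 12)) << 16)
                           | ((0x80u | ((cp >> 6) & 0x3Fu)) << 8)
                           | (0x80u | (cp & 0x3Fu))
                         : ((0xF0u | (cp >> 18)) << 24)
                           | ((0x80u | ((cp >> 12) & 0x3Fu)) << 16)
                           | ((0x80u | ((cp >> 6) & 0x3Fu)) << 8)
                           | (0x80u | (cp & 0x3Fu));
}

// Only ASCII appears in the initializers, so the file encoding is irrelevant.
enum CodePointValues : std::uint32_t {
    CP_VAL_1BYTE = utf8_from_code_point(0x2C),     // U+002C
    CP_VAL_2BYTE = utf8_from_code_point(0xA7),     // U+00A7
    CP_VAL_3BYTE = utf8_from_code_point(0x2713),   // U+2713
    CP_VAL_4BYTE = utf8_from_code_point(0x1D365),  // U+1D365
};

static_assert(CP_VAL_2BYTE == 0xC2A7, "matches the hand-written value");
static_assert(CP_VAL_4BYTE == 0xF09D8DA5, "matches the hand-written value");
```

The trade-off is readability: the character itself no longer appears in the initializer, only in a comment, much like the hand-written numeric enum in the question.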

Tino Didriksen