57

Why is this a warning? I think there are many cases where it is clearer to use multi-char int constants instead of "no meaning" numbers or instead of defining const variables with the same value. When parsing wave/tiff/other file types, it is clearer to compare the values read with 'EVAW', 'data', etc. instead of their corresponding numeric values.

Sample code:

int waveHeader = 'EVAW';

Why does this give a warning?

sqr163
Mircea Ispas
  • I am having trouble trying to understand your question. Can you make it clearer? –  Oct 13 '11 at 13:56
  • Related: http://stackoverflow.com/questions/7497192/enum-constants-questions – Steve Jessop Oct 13 '11 at 14:32
  • 1
    What seems to work and is nicely readable, but perhaps not exactly safe, is to C-style-cast string literals to `int*`: `int waveHeader = *((int*)"wave");`. I have a more trustworthy feeling about the solution I have gone with so far: to `memcpy` the string literal into a union of int(s) and char. This introduces some overhead, but that's usually negligible, at least if it only occurs in the file header. – leftaroundabout Oct 13 '11 at 14:54
  • On Visual Studio 2008, it doesn't seem to give a warning, and gives the same results as "int v = 'w' | 'a' << 8 | 'v' << 16 | 'e' << 24;" – Kit10 Aug 14 '13 at 18:34
  • Re-opened and converted this question to address C only, since that's what the posted answers are about. – Lundin Dec 12 '17 at 15:38
  • @Lundin the answer by "o11c" is about C++ – M.M Dec 06 '18 at 22:29
  • The dangers of using multi-character constants: https://habr.com/en/company/pvs-studio/blog/457694/ – AndreyKarpov Jun 26 '19 at 20:46

6 Answers

55

According to the standard (§6.4.4.4/10)

The value of an integer character constant containing more than one character (e.g., 'ab'), [...] is implementation-defined.

long x = '\xde\xad\xbe\xef'; // yes, single quotes

This is valid ISO 9899:2011 C. It compiles without warning under gcc with -Wall, and with a “multi-character character constant” warning under -pedantic.

From Wikipedia:

Multi-character constants (e.g. 'xy') are valid, although rarely useful — they let one store several characters in an integer (e.g. 4 ASCII characters can fit in a 32-bit integer, 8 in a 64-bit one). Since the order in which the characters are packed into one int is not specified, portable use of multi-character constants is difficult.

For portability's sake, don't use multi-character constants with integral types.
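
One portable alternative, sketched here as an illustration rather than taken from the answer, is to build the tag with explicit shifts, so the packing order is fixed by the code and not by the implementation. The names `FOURCC_BE` and `read_fourcc_be` are hypothetical, and the sketch assumes the file stores the tag as four consecutive bytes:

```c
#include <stdint.h>

/* Hypothetical helper (not a standard macro): packs four characters into
   a 32-bit value with the first character in the most significant byte.
   The result does not depend on host endianness or on how the compiler
   treats multi-character constants. */
#define FOURCC_BE(a, b, c, d)                    \
    (((uint32_t)(unsigned char)(a) << 24) |      \
     ((uint32_t)(unsigned char)(b) << 16) |      \
     ((uint32_t)(unsigned char)(c) << 8)  |      \
      (uint32_t)(unsigned char)(d))

/* Hypothetical helper: reads a tag from four bytes in file order, using
   the same packing as FOURCC_BE so the two can be compared directly. */
static uint32_t read_fourcc_be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

With this, a header check reads as `read_fourcc_be(buf) == FOURCC_BE('W', 'A', 'V', 'E')`, which keeps the tag readable in the source without relying on implementation-defined behaviour.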

Lundin
  • 5
    I'm not sure what that last phrase means; multi-character constants always have integral types (and such constants without a prefix always have type `int`). – Keith Thompson Sep 13 '13 at 04:37
  • 2
    It's basically saying that you can't portably decipher the byte-ordering of your int of packed chars as Little-Endian or Big-Endian, so for portability's sake don't use a datatype other than a char to store chars. On x86, an integer is stored Little-Endian; a byte or an 8-bit char is stored as a single byte and technically has no byte ordering; a string of chars, if we treat the content as a single datatype, is basically stored as a Big-Endian string. So is the multichar-char Big-Endian or Little-Endian? We'd need our int to be stored Big-Endian in order to be correct. – GodDamn Apr 18 '21 at 22:43
24

This warning is useful for programmers who would mistakenly write 'test' where they should have written "test".

This happens much more often than programmers actually wanting multi-char int constants.

Didier Trosset
  • 5
    This is a good example, but what happens when I really want to write 'test' and I get a warning? I don't allow any warnings in my code... – Mircea Ispas Oct 13 '11 at 14:00
  • 2
    You have to cope with the warning, or find your compiler option to disable this specific warning (that may hurt you at some other place in your code ;-) ). – Didier Trosset Oct 13 '11 at 14:02
  • 4
    Another accidental programmer error is to misremember the syntax for hex escapes and write '\0x61' when one meant '\x61'. – tml Aug 12 '13 at 09:01
  • @Felics there isn't really a good reason to want to write `'test'` as it would make your code non-portable – M.M Feb 17 '15 at 19:23
  • There ought not be a warning for: (unsigned)'test' but that's just my opinion. – Bruce K Mar 23 '15 at 17:50
17

If you're happy you know what you're doing and can accept the portability problems, on GCC for example you can disable the warning on the command line:

-Wno-multichar

I use this for my own apps to work with AVI and MP4 file headers for similar reasons to you.

blueshift
  • That makes your code non-portable, especially when the endianness of the platform differs from your dev machine. It seems like a poor idea. – Perry Nov 04 '21 at 15:42
  • 3
    @Perry Congratulations! Yours is the 8th mention of portability issues on this question (including one in my answer). – blueshift Nov 05 '21 at 05:37
8

Even if you're willing to look up what behavior your implementation defines, multi-character constants will still vary with endianness.

Better to use a (POD) struct { char c[4]; }; ... and then, in C++, use a user-defined literal (UDL) like "WAVE"_4cc to easily construct instances of that class.
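
User-defined literals are a C++ feature, but the struct half of this idea also works in plain C. The following is a minimal sketch under that assumption; the type `fourcc` and the helpers `fourcc_from` and `fourcc_eq` are hypothetical names introduced for illustration:

```c
#include <string.h>

/* Hypothetical type: a four-character code kept as raw bytes, so no
   integer byte order is ever involved. */
typedef struct { char c[4]; } fourcc;

/* Builds a fourcc from a string holding at least four characters.
   Only the first four bytes are copied; the terminator is ignored. */
static fourcc fourcc_from(const char *s)
{
    fourcc f;
    memcpy(f.c, s, sizeof f.c);
    return f;
}

/* Compares two codes byte by byte; returns nonzero when they match. */
static int fourcc_eq(fourcc a, fourcc b)
{
    return memcmp(a.c, b.c, sizeof a.c) == 0;
}
```

A tag read from a file into a `fourcc` can then be checked with `fourcc_eq(tag, fourcc_from("WAVE"))`, and the comparison gives the same answer on little-endian and big-endian hosts.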

o11c
  • 1
    Any chance for a quick UDL _4cc implementation example? – fuzzyTew May 11 '20 at 14:28
  • 2
    @fuzzyTew: constexpr std::uintmax_t operator ""_cc (char const * cc, std::size_t size) { std::uintmax_t val = 0; for (int i = 0; i != size; ++i) { val <<= 8; val += cc[i]; } return val; } – imix Sep 04 '20 at 17:15
7

The simplest solution, which this answer presents as compliant with any C/C++ compiler and standard, was mentioned by @leftaroundabout in the comments above:

int x = *(int*)"abcd";

Or a bit more specific:

int x = *(int32_t*)"abcd";

One more solution, also compliant with C/C++ compiler/standard since C99 (except clang++, which has a known bug):

int x = ((union {char s[5]; int number;}){"abcd"}).number;

/* just a demo check: */
printf("x=%d stored %s byte first\n", x, x==0x61626364 ? "MSB":"LSB");

Here an anonymous union gives a convenient name to the desired numeric result, and the "abcd" string initializes the compound literal (C99).
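
@leftaroundabout's comment on the question also mentions copying the string literal with `memcpy`. A minimal sketch of that variant follows; `fourcc_native` is a hypothetical name, and the resulting value still depends on the host byte order, exactly as with the cast:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the first four bytes of s into a uint32_t.
   memcpy does not require s to be aligned for uint32_t and does not read
   the characters through a pointer of another type. The numeric value
   still differs between little-endian and big-endian hosts. */
static uint32_t fourcc_native(const char *s)
{
    uint32_t v;
    memcpy(&v, s, sizeof v);
    return v;
}
```

Because the value is in host byte order, it is only meaningful when compared against four bytes that were loaded from the file the same way, for example with another `memcpy` into a `uint32_t`.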

sqr163
  • The `union` technique might not be as portable as that. This discussion points at the fact that it might be undefined behaviour in C++ [Unions and type punning](https://stackoverflow.com/questions/25664848/unions-and-type-punning) – op414 Aug 05 '20 at 14:47
  • 1
    This is **not** standards-compliant, as it breaks strict aliasing. – danielschemmel Sep 17 '20 at 09:56
  • 1
    This was my approach, but then clang complained about alignment issues, which brought me to this page... – SO_fix_the_vote_sorting_bug Aug 04 '21 at 18:13
0

If you want to disable this warning, it is important to know that there are two related warning options in GCC and Clang: -Wno-four-char-constants and -Wno-multichar

Alexander Ushakov