
While reviewing some WINAPI code intended to compile in MS Visual C++, I found the following (simplified):

char buf[4];

// buf gets filled ...

switch ((buf[0] << 8) + buf[1]) {
    case 'CT':
        /* ... */

    case 'SY':
        /* ... */

    default:
        break;

}

Assuming 16 bit chars, I can understand the shift of buf[0] and the addition of buf[1]. What I don't understand is how the comparisons in the case clauses are intended to work.

I don't have access to Visual C++ and, of course, those constants yield "multi-character character constant [-Wmultichar]" warnings on gcc/MinGW.

Sourav Ghosh
AndRAM
  • 2
    Multi-character constants are just `int`s with a fancy representation, e.g. `'ABCD'` is just `0x41424344`. – Paul R Nov 15 '16 at 12:16
  • [This character literal reference](http://en.cppreference.com/w/cpp/language/character_literal) should hopefully give you some insight. – Some programmer dude Nov 15 '16 at 12:17
  • @Someprogrammerdude thanks for the link, which is about C++, but it led me to http://en.cppreference.com/w/c/language/character_constant. I'm adding these sites to my bookmark bar :-) – AndRAM Nov 15 '16 at 12:23
  • 2
    **About the duplicate question mark:** This one is different in that it actually spells out what it is asking about. IMO the next newbie wondering about multi-character constant validity in C will find this one right away. I certainly couldn't find the other one... – AndRAM Nov 15 '16 at 12:47
  • 1
    @PaulR: 1) There is no requirement to use ASCII encoding. 2) The standard does not guarantee endianness. 3) All _character constants_ have type `int`, not just multibyte. – too honest for this site Nov 15 '16 at 13:16
  • Any reason you assume 16 bit `char`? And you invoke undefined behaviour for character codes >127 and 16 bit `int`. – too honest for this site Nov 15 '16 at 13:28
  • @Olaf, I should have written "at least 16 bit chars". I meant to convey that I understood the fact that the shifting + addition was packing 2 (at least 16 bit chars) low bytes into an (at least 16 bit) char. After all the shifting is just 8 bits. – AndRAM Nov 16 '16 at 12:11
  • If you have `char` with 16 bits, you have 16 bit bytes, too! Simply because `char` **is** a "byte"! And the operation is not done as `char`, but `int` or `unsigned int`. – too honest for this site Nov 16 '16 at 12:23
  • @olaf then what I don't understand is how the shifting works. If char is 1 byte, wouldn't the shifted 8 bits in `buf[0]` be lost? After all `buf[0]` is not a character constant (int) but a char (1 byte). – AndRAM Nov 25 '16 at 10:07
  • You cannot shift a `char` in C! Read about integer promotions. And again: 1 byte is **not** identical with "8 bits"! – too honest for this site Nov 25 '16 at 12:27
  • Thanks @olaf for guiding me in the right direction, albeit so indirectly. Now I've read the _Semantics_ paragraph in 6.5.7 of the standard. So the crux of it is: `buf[0]` is promoted to `int` in the expression `buf[0] << 8`, then added to `buf[1]`, yielding an `int` value, which in turn is compared to other `int`s (character constants) in the `case` clauses. Hope I finally got it. – AndRAM Nov 25 '16 at 13:13
  • "albeit so indirectly" - Yes, I try to make people think for themselves. The knowledge will be remembered much better that way. Good you did! Just look into what a byte is and what it is _not (necessarily)_. – too honest for this site Nov 25 '16 at 14:11
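The promotion discussed in these comments can be sketched in C as follows. The helper names `promoted_size` and `pack_key` are hypothetical and do not come from the reviewed code. The casts to `unsigned char` and `unsigned` are an added precaution against sign extension and against overflow on a 16-bit `int`, and an 8-bit `char` is assumed.

```c
#include <stddef.h>

/* Size of the type of `c << 8`. The char operand is promoted before the
   shift, so this is sizeof(int), not sizeof(char). sizeof does not
   evaluate its operand here. */
size_t promoted_size(void)
{
    char c = 'C';
    return sizeof(c << 8);
}

/* Hypothetical helper: pack the first two bytes of buf, first byte in the
   high position, similar in spirit to (buf[0] << 8) + buf[1].
   Assumption: 8-bit char. Going through unsigned char avoids sign
   extension when plain char is signed and a byte is above 127. */
unsigned pack_key(const char *buf)
{
    unsigned hi = (unsigned char)buf[0];
    unsigned lo = (unsigned char)buf[1];
    return (hi << 8) | lo;
}
```

With these definitions, `promoted_size()` equals `sizeof(int)`, which is why the shifted bits of `buf[0]` are not lost.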

2 Answers

4

This is a non-portable way of storing more than one char in a single int. The comparison then happens between int values, as usual.

Note: think of the final int value as the concatenation of the ASCII values of the individual chars.
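As a rough illustration of this "concatenated" view: GCC documents that it builds a multi-character constant one character at a time, shifting the previous value left by the character width and OR-ing in the next character. The sketch below puts that next to explicit shifting. The names `concat2` and `literal_ct` are hypothetical, the pragma is GCC/Clang-specific and only asks the compiler not to emit `-Wmultichar`, and other compilers are free to pack the characters differently.

```c
#if defined(__GNUC__)
#pragma GCC diagnostic ignored "-Wmultichar"
#endif

/* Explicit packing: first character in the high byte.
   Assumptions: 8-bit char and 7-bit ASCII input, so the result
   fits even in a 16-bit int. */
int concat2(char first, char second)
{
    return ((unsigned char)first << 8) | (unsigned char)second;
}

/* Whatever value this compiler assigns to 'CT'. The standard only says
   it is implementation-defined (C11 6.4.4.4p10). */
int literal_ct(void)
{
    return 'CT';
}
```

On GCC with an ASCII execution character set, both functions should return `0x4354`. A compiler that packs in the opposite order would make `literal_ct()` match `concat2('T', 'C')` instead.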

Following the wiki article, (emphasis mine)

[...] Multi-character constants (e.g. 'xy') are valid, although rarely useful — they let one store several characters in an integer (e.g. 4 ASCII characters can fit in a 32-bit integer, 8 in a 64-bit one). Since the order in which the characters are packed into an int is not specified, portable use of multi-character constants is difficult.

Related, C11, chapter §6.4.4.4/p10

An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer. The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined. [....]

Sourav Ghosh
  • In other words, the value of an integer character constant containing more than one character is implementation-defined, which makes `int foo = 'yes';` valid. – Michi Nov 15 '16 at 12:21
  • 2
    Thanks to both, now I understand. It's valid but non-portable, hence gcc's warnings. So if I were to compile this with another compiler I might end up comparing the equivalent of CT to TC, depending on how the compiler chooses to store 'CT'. – AndRAM Nov 15 '16 at 12:29
  • 1
    There is a portable way, at least with `sizeof(char) < sizeof(int)`: `int i = (char)'3'`. As is to store multiple `char`s in an `int` (with the given constraint): use bitshifts. And _**multibyte** character constants_ **are** `int` (not `char`) like any other _character constant_. Problem is the "packing" in the `int`. – too honest for this site Nov 15 '16 at 13:19
  • @Michi Note: a multi-character constant like `int foo = 'yes'` is problematic when `int` is 16-bit. – chux - Reinstate Monica Nov 15 '16 at 14:29
4

Yes, they are valid. The type is int and the value is implementation-defined.

From C11 draft, 6.4.4.4p10:

An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer. The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined.

(emphasis added)

GCC is being cautious: it warns in case you have used a multi-character constant unintentionally.
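If the warning is unwanted, one option is to avoid the multi-character constant altogether and build the case labels from single characters. A minimal sketch, in which the `TAG2` macro, the `classify` function, and its return codes are made up for illustration and are not part of the reviewed code:

```c
/* Hypothetical macro: pack two single-character constants into one
   unsigned value, first character in the high byte. With constant
   arguments the result is an integer constant expression, so it is
   allowed in a case label. Assumption: 8-bit char. */
#define TAG2(a, b) (((unsigned)(unsigned char)(a) << 8) | (unsigned char)(b))

/* Returns 1 for "CT", 2 for "SY", 0 otherwise.
   The return codes are chosen only for this example. */
int classify(const char *buf)
{
    switch (TAG2(buf[0], buf[1])) {
    case TAG2('C', 'T'):
        return 1;
    case TAG2('S', 'Y'):
        return 2;
    default:
        return 0;
    }
}
```

Because the same macro builds both the controlling expression and the labels, the byte order is fixed by the code rather than by the compiler, so `classify("CT")` and `classify("TC")` cannot be confused.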

P.P
  • A good compiler will never compile [code like this](http://ideone.com/Hpr9Qh) without a warning: `warning: multi-character character constant [-Wmultichar]` – Michi Nov 15 '16 at 12:28