I found a bug in the code below. It parses a C string and is supposed to detect UTF-8 characters:

char* pTmp = ...;
...
switch (*pTmp)
{
    case 'o':
    {
        ...     // works fine
        break;
    }   
    case 0xC2:
    {
        ...     // never gets triggered
        break;
    }
}

However, case 0xC2: is never triggered.

My assumption is that 0xC2 is treated as an int and is therefore 194, which is bigger than 127, the maximum value of the char data type here. The byte read through pTmp is -62, and -62 != 194.

Or maybe some overflow or integer promotion is happening here.

Writing switch ((unsigned char)*pTmp) fixes the issue.

But I would like to clarify what is really going on here and what rules are applied.

I'm also open to changing the title; nothing better came to mind.

user3386109
Dmitriy
  • `case 0xC2:` compares against an `int` value, decimal `194`, but if `char` is `signed` then the byte is, as you say, `-62`. You can check that `0xC2` has the size of an `int` with `printf("%zu\n", sizeof(0xC2));`. Aside: note that `'o'` is also an `int` value. – Weather Vane Mar 27 '21 at 21:04
  • Copying and pasting from a comment [here](https://stackoverflow.com/questions/4337217/difference-between-signed-unsigned-char): "C89 6.1.2.5 "There are three character types, designated as char , signed char, and unsigned char." C11 6.2.5p15 "The three types char, signed char, and unsigned char are collectively called the character types." 6.2.5fn45 "char is a separate type from the other two and is not compatible with either"" – Karl Knechtel Mar 27 '21 at 21:17
  • I tried to give it a better title based on how you phrased the question. – Karl Knechtel Mar 27 '21 at 21:24
  • You could use a character constant, `'\xc2'`, so that it will have a character value. – Eric Postpischil Mar 27 '21 at 21:36

2 Answers


I get this warning when compiling with -Wall:

warning: case label value exceeds maximum value for type [-Wswitch-outside-range]
   14 |             case 0xC2:

So yes, your reasoning is correct.

Zoso

Is char signed?

It is implementation-defined whether char has the same range as signed char or unsigned char. In OP's case, char has the range [-128 ... 127], so case 0xC2: (the int value 194) is never matched.
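To make the comparison concrete, here is a minimal sketch of what the switch sees. The helper names `promoted_char_value` and `case_0xC2_reachable_for_char` are made up for illustration, and the comments assume an 8-bit char. When char is signed, the byte 0xC2 is stored as a negative value and is promoted to int with its sign intact, so it cannot equal the int constant 0xC2.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: returns the value a `switch (*s)` would compare,
   i.e. the first char of s after integer promotion to int. */
int promoted_char_value(const char *s)
{
    assert(s != NULL);
    return *s;   /* char -> int; a negative char stays negative */
}

/* Hypothetical helper: can a plain char ever hold the value 0xC2 (194)?
   Returns 0 when char is signed and 8 bits wide (CHAR_MAX == 127). */
int case_0xC2_reachable_for_char(void)
{
    return CHAR_MAX >= 0xC2;
}
```

On a typical implementation with a signed 8-bit char, `promoted_char_value("\xC2")` yields -62 and `case_0xC2_reachable_for_char()` yields 0; where char is unsigned, the same calls yield 194 and 1.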


But I would like to clarify what is really going on here and what rules are applied.

The C standard library string functions take many char * parameters, yet internally they act as if they were pointing to unsigned char data:

For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value). C17dr § 7.24.1 3

To match that, OP's code should do likewise. Doing so will also allow *upTmp to potentially match 0xC2.

char* pTmp = ...;
unsigned char* upTmp = (unsigned char*) pTmp;

switch (*upTmp)
{
    case 0xC2:    // reachable now: *upTmp is in [0 ... UCHAR_MAX]
    ...
}
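As a self-contained sketch of the same idea, the function below reads the first byte through an unsigned char pointer before switching on it. The name `classify_first_byte` and its return codes are invented for this example and are not part of OP's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classifier: returns 1 for 'o', 2 for the byte 0xC2,
   and 0 for anything else. */
int classify_first_byte(const char *s)
{
    assert(s != NULL);

    /* Access the data as unsigned char, as the string functions do. */
    const unsigned char *up = (const unsigned char *)s;

    switch (*up)
    {
        case 'o':
            return 1;
        case 0xC2:      /* matches: *up is never negative */
            return 2;
        default:
            return 0;
    }
}
```

For instance, `classify_first_byte("\xC2\xA9")` takes the 0xC2 branch regardless of whether plain char is signed.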

Alternatively, instead of the hexadecimal constant 0xC2, use a character constant, '\xC2', which matches the range of char, as @Eric Postpischil suggests.
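A minimal sketch of the character-constant variant, keeping plain char throughout. The function name `classify_first_char` and its return codes are hypothetical. The constant '\xC2' has type int, but its value is that of a char holding 0xC2 converted to int, so it agrees with the promoted *s whether char is signed or unsigned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classifier that switches on plain char:
   returns 1 for 'o', 2 for the char '\xC2', 0 otherwise. */
int classify_first_char(const char *s)
{
    assert(s != NULL);

    switch (*s)
    {
        case 'o':
            return 1;
        case '\xC2':    /* same value as a char holding 0xC2 (e.g. -62 if char is signed) */
            return 2;
        default:
            return 0;
    }
}
```

This keeps the switch expressed in terms of character values, at the cost of relying on how the implementation maps '\xC2' into char.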


[Pedantic]

"switch ((unsigned char)*pTmp) fixes the issue." - it is close enough.

This "fix" works when char is signed and uses 2's complement, as well as when the implementation-defined char matches unsigned char.

For the remaining, all but non-existent, cases where char is signed and not 2's complement, the fix is wrong: the characters should be accessed via unsigned char *, otherwise the wrong value is used.

switch (*(unsigned char *)pTmp) works correctly in all cases.
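The two spellings can be put side by side in a short sketch; the function names are made up for illustration. The first converts the char value to unsigned char, the second reads the stored byte through an unsigned char pointer. On 2's complement implementations, which is an assumption here, both return the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Value conversion: read a char, then convert it to unsigned char
   (reduced modulo UCHAR_MAX + 1). */
unsigned char first_byte_by_cast(const char *s)
{
    assert(s != NULL);
    return (unsigned char)*s;
}

/* Representation access: read the stored byte as unsigned char directly. */
unsigned char first_byte_by_pointer(const char *s)
{
    assert(s != NULL);
    return *(const unsigned char *)s;
}
```

Only the pointer form is independent of how negative char values are represented, which is why it is the one that holds in all cases.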

chux - Reinstate Monica
  • If they are, for whatever reason, using `char` rather than `unsigned char`, they can use a character constant, as in `case '\xC2':`, to get the corresponding `char` value. And using a character constant instead of an integer constant better expresses that they are switching on a character value. – Eric Postpischil Mar 27 '21 at 21:41