Can I pass a negative int to printf() with the %c format specifier, given that the int argument is converted to unsigned char when it is printed? Is printf("%c", -65); valid? I tried it with GCC but got a diamond-shaped character (with a question mark inside) as output. Why?

phuclv
  • Which representation would you expect instead if you print value `-65`? `char` type is not only used to store printable characters. It is also a rather short type for numerical values. – Gerhardh May 07 '20 at 15:01

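Absolutely yes. The %c conversion takes an int argument and converts it to unsigned char before printing it, so a negative value is fine. And if char is a signed type, it's exactly the same as printing a char variable. C allows char to be either signed or unsigned, and in GCC you can switch between them with -funsigned-char and -fsigned-char. When char is signed, the call is equivalent to this: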

char c = -65;
printf("%c", c);

When c is passed to printf(), it undergoes the default argument promotions that apply to the variadic part of a call: the char is converted (sign-extended) to int, so printf() sees -65, just as if the constant had been passed. printf() simply has no way to tell printf("%c", c); apart from printf("%c", -65);.
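A minimal sketch of those promotions, assuming a hosted C99 (or later) compiler. first_vararg is a hypothetical helper, not a library function: it reads its variadic argument back with va_arg(ap, int), the only correct type for a promoted char, which shows that a signed char holding -65 and the constant -65 arrive identically. percent_c_byte is also hypothetical and mimics the conversion to unsigned char that %c applies.

```c
#include <stdarg.h>

/* Hypothetical helper: returns its first variadic argument.
   A char (or signed char) passed through "..." is promoted to int by the
   default argument promotions, so it must be read with va_arg(ap, int);
   va_arg(ap, char) would be undefined behavior. */
int first_vararg(int count, ...)
{
    va_list ap;
    va_start(ap, count);
    int value = va_arg(ap, int);
    va_end(ap);
    return value;
}

/* Hypothetical helper mimicking %c: the int argument is converted to
   unsigned char, which is well defined (reduction modulo UCHAR_MAX + 1). */
unsigned char percent_c_byte(int arg)
{
    return (unsigned char)arg;
}
```

For example, with signed char c = -65;, both first_vararg(1, c) and first_vararg(1, -65) return -65, and percent_c_byte(-65) is 0xBF on a platform with 8-bit char.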

The printed result depends on the character encoding, though. For example, in the ISO-8859-1 or Windows-1252 charsets you'll see ¿, because (unsigned char)-65 == 0xBF and 0xBF is ¿ in those charsets. In UTF-8 (which is a variable-length encoding), 0xBF is a continuation byte and is not allowed at the start of a character. That's why you see �, the replacement character (U+FFFD) shown for invalid bytes.
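To make the UTF-8 point concrete, here is a small sketch assuming 8-bit bytes; is_utf8_continuation and utf8_encode_small are hypothetical names written for illustration only. The single byte 0xBF matches the continuation-byte pattern 10xxxxxx, whereas a UTF-8 terminal would need the two-byte sequence 0xC2 0xBF to display U+00BF (¿).

```c
#include <stddef.h>

/* In UTF-8, bytes of the form 10xxxxxx (0x80..0xBF) are continuation
   bytes: they may only follow a lead byte, never start a character. */
int is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Encodes a code point below U+0800 as UTF-8 into out.
   Returns the number of bytes written; larger code points are out of
   scope for this sketch and return 0. */
size_t utf8_encode_small(unsigned cp, unsigned char out[2])
{
    if (cp < 0x80) {            /* ASCII: one byte with the same value */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {           /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    return 0;
}
```

So printf("%c", -65) sends exactly one byte, 0xBF, and a UTF-8 decoder has no lead byte to attach it to.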

Please tell me why code points 0 to 255 are not mapped to 0 to 255 in unsigned char. They are non-negative, so shouldn't I just look up their corresponding values in the UTF-8 character set?

The mapping is not done by relative position in the range, as you assumed (i.e. code point 0 mapping to CHAR_MIN, code point 40 to CHAR_MIN + 40, ..., code point 255 to CHAR_MAX). On two's complement systems it's typically a simple mapping: the char holds the same bit pattern, and that bit pattern read as unsigned gives the code point. That follows from the way values are usually truncated from a wider type: only the low 8 bits are kept. (Strictly speaking, converting an out-of-range value to a signed type is implementation-defined; GCC reduces it modulo 2^N, where N is the width of the type.) In C a character literal like 'a' has type int. Suppose 'a' were mapped to code point 130 in some theoretical character set; then the two lines below would be equivalent:

char c = 'a';
char c = 130;

Either way, c is assigned the value of 'a' converted to char, i.e. (char)'a', which may be negative if char is signed.

So code points 0 to 255 are mapped to 0 to 255 in unsigned char. That means code point 0x1F will be stored in a char (signed or unsigned) with the value 0x1F, while code point 0xBF will be stored as 0xBF if char is unsigned and as -65 if char is signed.
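The two mappings can be contrasted in portable arithmetic. Both functions are hypothetical: stored_as_signed_char models what GCC and other two's complement implementations do when a value in 0..255 is stored in an 8-bit signed char (keep the bit pattern, so 128..255 become negative), and relative_position_mapping models the offset-by-CHAR_MIN idea from the question, which C does not use.

```c
/* Value a code point in 0..255 ends up with in an 8-bit signed char on a
   two's complement implementation such as GCC: the bit pattern is kept,
   so 128..255 wrap around to -128..-1. */
int stored_as_signed_char(int code_point)
{
    return code_point > 127 ? code_point - 256 : code_point;
}

/* The "relative position" mapping assumed in the question (CHAR_MIN plus
   the code point, for a signed 8-bit char). C does NOT do this; it is
   shown only for contrast. */
int relative_position_mapping(int code_point)
{
    return -128 + code_point;
}
```

For 0xBF, stored_as_signed_char gives -65, matching the explanation above, whereas the relative-position idea would give 63. Converting the stored value back to unsigned char always recovers the original code point.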

I'm assuming an 8-bit char for everything above. Also note that UTF-8 is an encoding of the Unicode character set, not a character set of its own, so there are no "UTF-8 code points" to look up: you look up Unicode code points, and UTF-8 defines how each one is stored as bytes.
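Since the explanation assumes an 8-bit char and hinges on whether plain char is signed, both properties can be queried from <limits.h>. A minimal sketch; the function names are hypothetical.

```c
#include <limits.h>

/* The explanation above assumes 8-bit char; CHAR_BIT gives the actual width. */
int char_is_8_bits(void)
{
    return CHAR_BIT == 8;
}

/* Signedness of plain char is implementation-defined (GCC lets you pick it
   with -fsigned-char / -funsigned-char). CHAR_MIN is negative exactly when
   plain char is signed, and 0 when it is unsigned. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```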

phuclv