I have this code in C:
#include<stdio.h>
int main()
{
char ch='17 December 2008';
printf("%c",ch);
}
What I expect is that it should report an error, but it shows '8' as output.
Can someone please explain why?
The standard says:
The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined.
An implementation must document every implementation-defined behaviour, so you have to consult the documentation that comes with your implementation if you want to know exactly what it does. The relevant bit of the documentation for your particular implementation can be found here. Quote:
The compiler evaluates a multi-character character constant a character at a time, shifting the previous value left by the number of bits per target character, and then or-ing in the bit-pattern of the new character truncated to the width of a target character. The final bit-pattern is given type int, and is therefore signed, regardless of whether single characters are signed or not (a slight change from versions 3.1 and earlier of GCC). If there are more characters in the constant than would fit in the target int the compiler issues a warning, and the excess leading characters are ignored.
For example, 'ab' for a target with an 8-bit char would be interpreted as ‘(int) ((unsigned char) 'a' * 256 + (unsigned char) 'b')’, and '\234a' as ‘(int) ((unsigned char) '\234' * 256 + (unsigned char) 'a')’.
It's a multi-character literal.
An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal has type int and implementation-defined value.
Also from 6.4.4.4/10 in C11 specs:
An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer. The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined. If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.
With that in mind, for this you get:
char ch='17 December';
printf("%c",ch);
Output:
r
And, for this, you get:
char ch='17 December 2008';
printf("%c",ch);
Output:
8
It appears that on your system the least significant 8 bits are used for the assignment to ch. Because your character literal is a constant, this most likely happens at compile time (that is what happens, for example, when I compile with gcc).
Remember that the type of a single-quoted character constant is int, but you're assigning it to a char, so it has to be truncated to a single character.
The type of 'a', for example, is int in C. (Not to be confused with 'a' in C++, which is a char. On the other hand, the type of 'ab' is int in both C and C++.)
Now, when you assign this int value to a char and the value is larger than a char can represent, it has to be squeezed into the narrower type, and the actual result is implementation-defined.
GCC does the same. 8 is the last character before the closing quote.
C99 says:
The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined.
The warning you do get relates to the type you are assigning it to, not the expression itself. For example:
int i = '2008';
printf("%d\n", i);
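This is legal and does not trigger that truncation warning (GCC still flags the multi-character constant itself under -Wmultichar, which is enabled by default):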
An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in 'x'. A wide character constant is the same, except prefixed by the letter L. With a few exceptions detailed later, the elements of the sequence are any members of the source character set; they are mapped in an implementation-defined manner to members of the execution character set.
The diagnostic:
An integer character constant includes more than one character or a wide character constant includes more than one multibyte character
is also present in the standard, where it is listed among the common warnings.
Why is the last character selected
Compilers vary, though: it depends partly on how the compiler chooses to encode the sequence and partly on the endianness of the machine. See the answer: Multiple characters in a character constant.