-1

What is happening here?

#include <stdio.h>
int main (void)
{
  int x = 'HELL';
  printf("%d\n", x);
  return 0;
}

Prints 1212501068

I expected a compile error.

Explanations are welcome =)

Charles
Pol0nium
  • So, show us something strange. We're waiting. – Hot Licks Sep 07 '13 at 19:44
  • Haha, you are fun @HotLicks – Pol0nium Sep 07 '13 at 19:46
  • Pol0nium, I think that @HotLicks's point was that "Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and **the expected results**." Unless you say why `1212501068` is an unexpected result, it's not clear what's supposed to be explained. Granted, you're not asking for code, but the point about the expected results is still a good rule of thumb. – Joshua Taylor Sep 07 '13 at 20:09
  • I've seen this question asked already, but I can't find the duplicate. Anyway, I don't understand why answers to such pointless questions get up-voted so much. – sashoalm Sep 07 '13 at 20:13
  • @sashoalm answers don't get up-voted for being something not answered elsewhere, they get up-voted for being correct. If the question is over-asked, it should be easy enough to get it closed as a dup (though if easy answers for undeserved up-votes is your concern, I don't think a closed question prevents credit for the votes). – mah Sep 07 '13 at 20:20
  • You say you expected a compiler warning, but what command line switches did you pass to the compiler? – Andy Lester Sep 09 '13 at 03:28

6 Answers

13

1212501068 in hex is 0x48454c4c.

  • 0x48 is the ASCII code for H.
  • 0x45 is the ASCII code for E.
  • 0x4c is the ASCII code for L.
  • 0x4c is the ASCII code for L.

Note that this behaviour is implementation-defined and therefore not portable. A good compiler would issue a warning:

$ gcc test.c
test.c: In function 'main':
test.c:4:11: warning: multi-character character constant [-Wmultichar]
NPE
9

In C, single quotes are used to denote characters, which are represented in memory by numbers. When you place multiple characters in single quotes, the compiler combines them in a single value however it wants, as long as it documents the process.

Looking at your number, 1212501068 is 0x48454C4C. If you decompose this number into bytes, you get 0x48 ('H'), 0x45 ('E') and 0x4C ('L') twice.

3Doubloons
  • "..does its best to combine them in a single value". This is optimistic. Since the behavior is implementation-dependent, the compiler could just choose to map every "multichar" constant to the value `-2` (thus not very useful). The only constraint is that it **must document its behavior**. – LorenzoDonati4Ukraine-OnStrike Sep 07 '13 at 19:59
3

The output of 1212501068 as hex is: 0x48 0x45 0x4C 0x4C

Look it up in an ASCII table, and you'll see those are the code for HELL.

BTW: the value of a multi-character constant in single quotes is not fixed by the standard; its exact interpretation is implementation-defined. It very commonly comes out as either a big-endian or a little-endian integer, but technically the implementation could interpret it any way it chooses, as long as the choice is documented.

In other words, depending on the platform, I would not be surprised to see it come out as:
0x4C 0x4C 0x45 0x48, or 1280066888

And over on this question, and also on this site you can see practical uses of this behavior.

abelenky
3

Others have explained what happened. As for the explanation, I quote from C99 draft standard (N1256):

6.4.4.4 Character constants

[...]

An integer character constant has type int. The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer. **The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined.** If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.

The emphasis on the relevant sentence is mine.

1

The line:

int x = 'HELL';

stores the character codes of 'HELL' in x, packed into a single int. Here that value is 0x48454c4c == 1212501068.

Michal
  • This is an obscure feature of the language, and many books on it may overlook the feature completely. Even for those books that happen to address it, other than reading them cover to cover, how would you suggest locating the feature in the book's index? – mah Sep 07 '13 at 19:53
  • @mah Look under "character constant" because that's what it is called in the C standard. – Jens Sep 07 '13 at 20:00
  • @Jens I looked in the index of two C/C++ texts from my university days, neither of which listed "character constant". I also looked at the Amazon link posted by MichaelChovanec and used Amazon's "Look Inside" feature to go to its index... missing there too. The point of my first comment was that being part of a language standard in no way means that all (or most) good books on the language will document it, especially with obscure features like this. Your comment (that is, the results when testing it) seems to reinforce that point. – mah Sep 07 '13 at 20:18
  • Why recommend C++ books for a question about C? – Keith Thompson Sep 08 '13 at 21:22
0

The value is just 'HELL' interpreted as an int (usually 4 bytes).

If you try this:

#include <stdio.h>

int main (void)
{
    union {
        int x;
        char c[4];
    } u;
    int i;

    u.x = 'HELL';
    printf("%d\n", u.x);
    for(i=0; i<4; i++) {
        printf("'%c' %x\n", u.c[i], u.c[i]);
    }
    return 0;
}

On a typical little-endian machine, where the least significant byte is stored first, you'll get:

1212501068
'L' 4c
'L' 4c
'E' 45
'H' 48
Emmet
  • The value is implementation-defined. -1 for not mentioning that. – Keith Thompson Sep 08 '13 at 21:24
  • @KeithThompson: if you're going to downvote every answer that doesn't cite chapter and verse, you're going to have your work cut out for you. – Emmet Sep 10 '13 at 14:22
  • I don't. The fact that the value of `'ABCD'` is implementation-defined is the most important thing to know about it. Someone relying on your answer would assume that it's portable. – Keith Thompson Sep 10 '13 at 18:23
  • Someone relying on my answer without heeding compiler warnings deserves everything they get. – Emmet Sep 13 '13 at 02:32
  • Compilers are not obliged to warn about multicharacter constants. Are you really suggesting that it's ok to give incomplete and misleading answers because the compiler will warn about it? – Keith Thompson Sep 13 '13 at 04:34
  • My answer provides a compact code example that illustrates perfectly clearly how the value that he observed comes about, which was (IMHO) missing in other answers, and thus contributes to the reader's understanding of the issue. At the time of writing, there was *already* a comment on another answer that the multi-character constant was implementation-defined, so how would my repeating that again make any useful contribution? In all honesty, I thought I was a hell of a pedant, but you take the cake. – Emmet Sep 13 '13 at 22:04
  • IMHO each answer should stand on its own, unless it explicitly refers to other answers. (Thanks for the cake, it was yummy!) – Keith Thompson Sep 13 '13 at 22:07
  • It's better with cream cheese on it. – Emmet Sep 13 '13 at 22:07