
I'm curious about this code:

cout << 'test'; // Note the single quotes.

gives me an output of 1952805748.

My question: Is the output an address in memory or something?

Tamara Wijsman
lucidreality
  • Note that the actual value is implementation-defined: http://stackoverflow.com/questions/3960954/c-multicharacter-literal – FireAphis Sep 18 '11 at 07:46

5 Answers


It's a multicharacter literal. 1952805748 is 0x74657374, which decomposes as:

0x74 -> 't'
0x65 -> 'e'
0x73 -> 's'
0x74 -> 't'

Edit:

C++ standard, §2.14.3/1 - Character literals

(...) An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal has type int and implementation-defined value.

Community
K-ballo
  • You did not mention that this is implementation defined. – Andreas Bonini Sep 18 '11 at 15:23
  • I suppose the funnest thing about that definition is that `sizeof(int)` is implementation defined as well. So not only is storage order implementation defined, but the maximum length of these is as well. – bobobobo Dec 28 '13 at 16:32

No, it's not an address. It's a so-called multicharacter literal.

Typically, its value is the ASCII codes of the four characters combined, with the first character in the most significant byte.

't' == 0x74; 'e' == 0x65; 's' == 0x73; 't' == 0x74; 

So 0x74657374 is 1952805748.

But it could just as well be 0x74736574 on some other compiler. The C and C++ standards both say the value of a multicharacter literal is implementation-defined, so its use is generally strongly discouraged.

Marco A.
chys
  • Is the length of such a multi-byte character constrained to 4 bytes? I.e. does it represent an int written out as characters? – Giorgio Sep 18 '11 at 08:47
  • @Giorgio: The standard only says it's implementation defined, with no more details. In practice, since `int` is 4 bytes on most machines, I don't think it makes sense to use more than 4 bytes. Yes, it was intended to be a convenient way to write some constants, but unfortunately different compilers have been interpreting it differently, so nowadays most coding styles discourage its use. – chys Sep 18 '11 at 08:52
  • @chys: And the fact that it's implementation-defined means it's not even required to be consistent. A conforming compiler could give all multicharacter literals the value 0, for example (though that would be unfriendly). – Keith Thompson Sep 18 '11 at 19:28
  • One has to ask why this loony feature exists in the standard. It seems like such a rare use case, is implementation defined anyway, and can be done quite clearly with ordinary bit shifting and or'ing if needed. – Boann Feb 23 '13 at 02:11
  • @Boann _Yes_, my sentiments exactly. But you can safely use it in switches and whatnot, as direct comparison for `==` should check out. – bobobobo Dec 21 '13 at 02:07

An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal has type int and implementation-defined value.

Implementation-defined behavior is required to be documented by the implementation. For example, GCC documents it here:

The compiler values a multi-character character constant a character at a time, shifting the previous value left by the number of bits per target character, and then or-ing in the bit-pattern of the new character truncated to the width of a target character. The final bit-pattern is given type int, and is therefore signed, regardless of whether single characters are signed or not.

Check the explanation on this page for more details.

K-ballo
Mouna Cheikhna

They're really just ints. They're used extensively in Core Audio API enums, for example in the CoreAudioTypes.h header file:

enum
{
    kAudioFormatLinearPCM               = 'lpcm',
    kAudioFormatAC3                     = 'ac-3',
    kAudioFormat60958AC3                = 'cac3',
    kAudioFormatAppleIMA4               = 'ima4',
    kAudioFormatMPEG4AAC                = 'aac ',
    kAudioFormatMPEG4CELP               = 'celp',
} ;

There's a lot of chatter about this not being "platform independent", but when you're using an API that's made for a specific platform, who cares about portability? Checking for equality on the same platform will never fail. These enum'd values are easier to read, and they actually contain their identity in their value, which is pretty nice.

What I've tried to do below is wrap a multicharacter literal so it can be printed (this works on Mac). The strange thing is that if you don't use all 4 characters, the result comes out wrong:

#include <stdio.h>

#define MASK(x,BYTEX) ((x&(0xff<<8*BYTEX))>>(8*BYTEX))

struct Multibyte
{
  union{
    int val ;
    char vals[4];
  };

  Multibyte() : val(0) { }
  Multibyte( int in )
  {
    vals[0] = MASK(in,3);
    vals[1] = MASK(in,2);
    vals[2] = MASK(in,1);
    vals[3] = MASK(in,0);
  }
  char operator[]( int i ) {
    return val >> (3-i)*8 ; // works on mac
    //return val>>i*8 ; // might work on other systems
  }

  void println()
  {
    for( int i = 0 ; i < 4 ; i++ )
      putc( vals[i], stdout ) ;
    puts( "" ) ;
  }
} ;

int main(int argc, const char * argv[])
{
  Multibyte( 'abcd' ).println() ;  
  Multibyte( 'x097' ).println() ;
  Multibyte( '\"\\\'\'' ).println() ;
  Multibyte( '/*|' ).println() ;
  Multibyte( 'd' ).println() ;

  return 0;
}
bobobobo
  • _"Checking for equality on the same platform will never fail."_ It might. Upgrade to Visual Studio _xyz_ and bite your tongue. This library has made a _terrible_ decision. – Lightness Races in Orbit Jul 31 '15 at 10:08
  • @LightnessRacesinOrbit *"Upgrade to Visual Studio xyz and bite your tongue."* Core Audio API is OS X's system audio API so this is not relevant. – Jean-Michaël Celerier Jul 16 '16 at 13:55
  • @Jean-MichaëlCelerier: Fine; upgrade your OSX Clang version and bite your tongue... – Lightness Races in Orbit Jul 16 '16 at 17:18
  • @LightnessRacesinOrbit Or just use a different compiler altogether. The behavior is *compiler-dependent*, not *platform-dependent*. A platform dependency would be assuming that in the default environment, `$HOME` always stores a value that begins with `/Users/`. If the library is always compiled at the same time as its dependencies, it's not a terrible idea (just a bad one), but if the binary format persists for someone to take a dependency on, this is a nightmare waiting to happen. – MooseBoys Jul 06 '22 at 18:13

This kind of feature is really good when you are building parsers. Consider this:

byte* buffer = ...;
if(*(int*)buffer == 'GET ')
  invoke_get_method(buffer+4);

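This code will likely only work on one endianness: with GCC's layout, 'GET ' is 0x47455420, which matches the bytes "GET " read as an int only on a big-endian machine. It might also break across different compilers.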

Ayende Rahien