I have the following program, which prints the size of a multibyte character literal.

#include <iostream>

int main()
{
    std::cout << "size of multibyte characters : " << sizeof('ab') << std::endl;
}

My GCC compiler gives an output of 4.

So I have the following questions:

  • What is the size of a multibyte character literal?
  • Is sizeof('ab') equal to sizeof(int)?
msc
  • @DimChtz: It’s actually allowed to, unfortunately – Ry- Nov 05 '17 at 19:21
  • You've stumbled upon an obscure feature of the language called a [multicharacter literal](http://en.cppreference.com/w/cpp/language/character_literal) (see bullet `(6)`). These are indeed of type `int`. They have nothing to do with multibyte character encodings (like UTF-8 or Shift-JIS) – Igor Tandetnik Nov 05 '17 at 19:24
  • See (6): http://en.cppreference.com/w/cpp/language/character_literal – Richard Critten Nov 05 '17 at 19:24

1 Answer

This is a so-called multicharacter literal, which, unlike its single-character counterpart, is not of type char but of type int (assuming it's supported). As specified in [lex.ccon]/2, emphasis mine:

A character literal that does not begin with u8, u, U, or L is an ordinary character literal. An ordinary character literal that contains a single c-char representable in the execution character set has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set. An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal, or an ordinary character literal containing a single c-char not representable in the execution character set, is conditionally-supported, has type int, and has an implementation-defined value.

So your program prints sizeof(int), which is 4 with your GCC setup, as you suspected.

StoryTeller - Unslander Monica