C++: implementation-defined accepted physical source file characters

Question

According to the C++14 standard,

§2.2.1.1 [...] The set of physical source file characters accepted is implementation-defined. [...] Any source file character not in the basic source character set is replaced by the universal-character-name that designates that character. [...]

Does this mean that the C++ standard provides no implementation-defined or conditionally-supported way to accept non-UCS/Unicode characters? For example, a physical source file encoding that includes characters with no corresponding UCS code point.

The only thing I can think of is that, if that were the case (the compiler supports non-UCS characters through non-UCS encodings), the compiler would have to use the UCS private-use ranges to map those physical characters. But that solution doesn't fit the "universal-character-name that designates that character" part, because UCS code points inside the private-use ranges don't designate any specific character at all.

score 1 · Answer 1 · answered Aug 31 '17 at 22:20


~~Not really.~~ Kind of. The important part of the [lex.phases] quote, IMO, is the following:

Physical source file characters are mapped, [...], to the basic source character set

Only the basic source character set is supported directly; everything else must somehow be mapped to it ([lex.charset]):

a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

But the standard also says that this mapping is done only "if necessary". It goes on to say the following:

The set of physical source file characters accepted is implementation-defined.

So I suppose that ultimately allows a compiler to do whatever it wants, so long as it supports at least the basic source character set.
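To make the replacement step concrete, here is a minimal sketch of what phase 1 conceptually does with characters outside the basic set. It assumes the implementation has already decoded the physical file into UCS code points (which is exactly the implementation-defined part), and that the execution character set is ASCII-compatible. The names `is_basic_source_char`, `to_ucn` and `replace_extended` are hypothetical helpers, not anything from the standard or from a real compiler.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: true if `c` is one of the 96 basic source characters
// (the 91 graphical characters listed above, plus space, horizontal tab,
// vertical tab, form feed and new-line).
bool is_basic_source_char(char32_t c) {
    static const std::u32string basic =
        U"abcdefghijklmnopqrstuvwxyz"
        U"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        U"0123456789"
        U"_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'"
        U" \t\v\f\n";
    return basic.find(c) != std::u32string::npos;
}

// Hypothetical helper: spell a code point as a universal-character-name,
// \uXXXX for code points up to FFFF and \UXXXXXXXX otherwise.
std::string to_ucn(char32_t c) {
    char buf[11];  // backslash + 'U' + 8 hex digits + terminator
    if (c <= 0xFFFF)
        std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(c));
    else
        std::snprintf(buf, sizeof buf, "\\U%08X", static_cast<unsigned>(c));
    return buf;
}

// Sketch of the replacement: basic characters pass through unchanged,
// everything else becomes a universal-character-name.
// Assumption: basic characters have their ASCII values when narrowed to char.
std::string replace_extended(const std::u32string& decoded) {
    std::string out;
    for (char32_t c : decoded) {
        if (is_basic_source_char(c))
            out += static_cast<char>(c);
        else
            out += to_ucn(c);
    }
    return out;
}
```

With these helpers, `replace_extended(U"caf\u00e9")` yields the seven characters `caf\u00E9`, all of which are basic source characters. Note that the sketch can only work once a code point has been chosen for each input character; how (or whether) that choice is made for a character with no UCS code point is the part the standard leaves to the implementation.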

answered Aug 31 '17 at 22:20

AndyG


Anyway, impl-defined characters accepted *beyond* the basic characters must be translated to UCS code points using the universal-character-name syntax. But what if certain characters have no corresponding UCS code point? – ABu Aug 31 '17 at 22:43
I would presume the compiler would fail to translate. Are you trying to code in a home brewed character set? – AndyG Sep 01 '17 at 00:07
No xD I'm just trying to understand the encodings / C++ Standard mess and their conceptual interactions. – ABu Sep 01 '17 at 00:19
@peregr As far as I can tell, there is no requirement that the supported characters be a subset of Unicode. As an example, a compiler could support input in a non-Unicode Windows code page. The resulting `"strings"` and even `L"strings"` do not have to be Unicode, *unless* they are `u8`/`u16`/`u32` strings – Yakk - Adam Nevraumont Sep 01 '17 at 02:02
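
The distinction in the last comment can be checked at compile time. The following is a small sketch, written with universal-character-names only so that it does not depend on the source file encoding. The function names `ordinary_literal_size` and `wide_literal_units` are made up for illustration, and their results are deliberately not asserted because they depend on the implementation's execution character sets.

```cpp
#include <cstddef>

// Unicode literals: the encoding is fixed by the standard.
static_assert(sizeof(u8"\u00e9") == 3,
              "UTF-8: two code units plus the terminator");
static_assert(u"\u00e9"[0] == 0x00E9,
              "UTF-16: a single code unit for U+00E9");
static_assert(sizeof(u"\U0001F600") / sizeof(char16_t) == 3,
              "UTF-16: surrogate pair plus the terminator");
static_assert(U"\U0001F600"[0] == 0x1F600,
              "UTF-32: one code unit per code point");

// Ordinary and wide literals: the encoding is implementation-defined,
// so the values below may differ between compilers and compiler settings.
std::size_t ordinary_literal_size() { return sizeof("\u00e9"); }
std::size_t wide_literal_units() { return sizeof(L"\u00e9") / sizeof(wchar_t); }
```

On a compiler whose ordinary execution character set is UTF-8, `ordinary_literal_size()` would typically return 3, while one using a single-byte code page would typically return 2; neither result is mandated.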
