C++ Preprocessor string literal concatenation

Question

I found this regarding how the C preprocessor should handle string literal concatenation (phase 6). However, I cannot find anything regarding how this is handled in C++ (does C++ use the C preprocessor?).

The reason I ask is that I have the following:

const char * Foo::encoding = "\0" "1234567890\0abcdefg";

where encoding is a static member of class Foo. Without concatenation I wouldn't be able to write that sequence of characters like that.

const char * Foo::encoding = "\01234567890\0abcdefg";

is something entirely different, because \012 is interpreted as a single octal escape sequence.

I don't have access to multiple platforms, and I'm curious how confident I should be that the above is always handled correctly, i.e. that I will always get { 0, '1', '2', '3', ... }.

Just out of curiosity - why are you using char* instead of std::string? — Robben_Ford_Fan_boy, May 14 '10 at 20:42
@David Relihan: Why would anyone use a `std::string` for an immutable string constant? Not even taking into account that a `char *` might be required by the client code specification (like some API). — AnT stands with Russia, May 14 '10 at 20:48
@AndreyT: "Why would anyone use a std::string for an immutable string constant?" Because they wanted to lexicographically compare strings (and don't want to take into account whether they are constants or whatnot)? — sbi, May 14 '10 at 20:56
You could write it as `"\000123..."`. A numeric escape sequence may have at most three octal digits. The fourth digit is not included as part of the escape sequence; it's an ordinary character. — Rob Kennedy, May 14 '10 at 21:02

score 10 · Accepted Answer · edited May 17 '10 at 21:07

The language (C as well as C++) has no "preprocessor" as such. A "preprocessor", as a separate functional unit, is an implementation detail. The way source files are handled is defined by the so-called phases of translation. One of the phases, in C as well as in C++, involves concatenating string literals.

In the C++ language standard this is described in section 2.1. In C++03 it is phase 6:

6 Adjacent ordinary string literal tokens are concatenated. Adjacent wide string literal tokens are concatenated.

edited May 17 '10 at 21:07

sbi

answered May 14 '10 at 20:45

AnT stands with Russia

Right, I was looking for a document detailing this for C++. I was not able to find one - though I found C details readily. – ezpz May 14 '10 at 20:54
AndreyT - you forgot to mention that `"\0"` is converted to the target character set _before_ string literals are merged. This is the key to the question at hand. – D.Shawley May 14 '10 at 21:18
@D.Shawley: I don't immediately understand the importance of that. You mean without that the `\0` part could still merge with `12` part and form an octal char literal `\012`? Hm... I'd say that the important part here is actually phase 4, not 5, when each string literal is converted into an independent *preprocessing token*. This alone already takes care of the potential issue with `\012`, doesn't it? – AnT stands with Russia May 14 '10 at 21:40

score 6 · Answer 2 · edited May 14 '10 at 20:56


Yes, it will be handled as you describe, because it is in translation phase 5 that:

Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set (C99 §5.1.1.2/1)

The language in C++03 is effectively the same:

Each source character set member, escape sequence, or universal-character-name in character literals and string literals is converted to a member of the execution character set (C++03 §2.1/5)

So escape sequences (like \0) are converted into members of the execution character set in phase 5, before string literals are concatenated in phase 6.

edited May 14 '10 at 20:56

answered May 14 '10 at 20:46

James McNellis

Right - I get that much. My question is whether this is transparent across C/C++. And, if so, where I can reference that documentation. – ezpz May 14 '10 at 20:55
@ezpz: Sorry; I missed that you were interested in the compatibility between the two. Yes, the results are the same for both C and C++; I've added the language from the C++ standard, which effectively says the same thing. You can find where to get the relevant standards documents from this question: http://stackoverflow.com/questions/81656/where-do-i-find-the-current-c-or-c-standard-documents – James McNellis May 14 '10 at 20:58
+1 for actually mentioning the different stages of translation since this is why `"\0" "12"` is not the same as `"\012"`. – D.Shawley May 14 '10 at 21:17

score 0 · Answer 3 · edited Jul 13 '12 at 20:37

The C and C++ standards agree on this. Most, if not all, C++ implementations use a C preprocessor, so yes, in practice C++ uses the C preprocessor.

edited Jul 13 '12 at 20:37

octopusgrabbus

answered May 14 '10 at 20:43

Christopher Barber

More precisely, the C++ standard and C standard agree on certain translation phases, and in preprocessor directives, and every C++ implementation I know of uses a C preprocessor. I like to keep the difference between what the Standards say and what implementations do. – David Thornley May 14 '10 at 21:06
