6

I found this regarding how the C preprocessor should handle string literal concatenation (phase 6). However, I cannot find anything regarding how this is handled in C++ (does C++ use the C preprocessor?).

The reason I ask is that I have the following:

const char * Foo::encoding = "\0" "1234567890\0abcdefg";

where encoding is a static member of class Foo. Without concatenation I wouldn't be able to write that sequence of characters like that.

const char * Foo::encoding = "\01234567890\0abcdefg";

is something entirely different because of the way \012 is interpreted.

I don't have access to multiple platforms and I'm curious how confident I should be that the above is always handled correctly - i.e. that I will always get { 0, '1', '2', '3', ... }

Community
ezpz
  • 1
    Just Out of curiosity - why are you using char* instead of std::string? – Robben_Ford_Fan_boy May 14 '10 at 20:42
  • 1
    @David Relihan: Why would anyone use a `std::string` for an immutable string constant? Not even taking into account that a `char *` might be required by the client code specification (like some API). – AnT stands with Russia May 14 '10 at 20:48
  • 1
    @AndreyT: "Why would anyone use a std::string for an immutable string constant?" Because they wanted to lexicographically compare strings (and don't want to take into account whether they are constants or whatnot)? – sbi May 14 '10 at 20:56
  • 7
    You could write it as `"\000123..."`. A numeric escape sequence may have at most three octal digits. The fourth digit is not included as part of the escape sequence; it's an ordinary character. – Rob Kennedy May 14 '10 at 21:02
  • @Rob: I hadn't thought of that. Nice solution :) – ezpz May 14 '10 at 21:05

3 Answers

10

The language (C as well as C++) has no "preprocessor". A "preprocessor", as a separate functional unit, is an implementation detail. The way source files are handled is defined by the so-called phases of translation. One of those phases, in C as well as in C++, involves concatenating string literals.

In the C++ language standard this is described in 2.1. In C++03 it is phase 6:

6 Adjacent ordinary string literal tokens are concatenated. Adjacent wide string literal tokens are concatenated.
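
As a minimal sketch of what that phase means in practice (not part of the standard text; the names `joined`, `written_out` and `same_text` are made up for illustration, and `static_assert` assumes C++11 or later), two adjacent literals end up as a single array with a single terminator:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical names, for illustration only.
const char joined[] = "abc" "def";   // two adjacent literals, one resulting array
const char written_out[] = "abcdef"; // the same text written as one literal

// Both arrays hold six characters plus one terminating '\0'.
static_assert(sizeof(joined) == sizeof(written_out),
              "concatenation yields a single literal with one terminator");

// Runtime comparison of the two arrays as C strings.
inline bool same_text() {
    return std::strcmp(joined, written_out) == 0;
}
```

Calling `same_text()` should return true on a conforming compiler.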

sbi
AnT stands with Russia
  • Right, I was looking for a document detailing this for C++. I was not able to find one - though I found C details readily. – ezpz May 14 '10 at 20:54
  • 1
    AndreyT - you forgot to mention that `"\0"` is converted to the target character set _before_ string literals are merged. This is the key to the question at hand. – D.Shawley May 14 '10 at 21:18
  • 1
    @D.Shawley: I don't immediately understand the importance of that. You mean without that the `\0` part could still merge with `12` part and form an octal char literal `\012`? Hm... I'd say that the important part here is actually phase 4, not 5, when each string literal is converted into an independent *preprocessing token*. This alone already takes care of the potential issue with `\012`, doesn't it? – AnT stands with Russia May 14 '10 at 21:40
6

Yes, it will be handled as you describe, because it is in stage 5 that,

Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set (C99 §5.1.1.2/1)

The language in C++03 is effectively the same:

Each source character set member, escape sequence, or universal-character-name in character literals and string literals is converted to a member of the execution character set (C++03 §2.1/5)

So, escape sequences (like \0) are converted into members of the execution character set in stage five, before string literals are concatenated in stage six.
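
A small sketch that checks this, assuming C++11 or later for `static_assert` (the names `split_form`, `single_form`, `padded_form` and `same_bytes` are hypothetical and not from the question). It compares the two literals from the question, plus the three-digit form `"\000..."` suggested in the comments:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Two adjacent literals: "\0" is a complete escape sequence in its own
// literal, so the '1' that follows stays an ordinary character.
const char split_form[] = "\0" "1234567890\0abcdefg";

// One literal: an octal escape takes up to three digits, so "\012"
// becomes a single character with value 10.
const char single_form[] = "\01234567890\0abcdefg";

// Three-digit octal escape: "\000" is the null character and the fourth
// digit, '1', is an ordinary character.
const char padded_form[] = "\0001234567890\0abcdefg";

// 1 + 10 + 1 + 7 characters plus the terminator.
static_assert(sizeof(split_form) == 20, "split form keeps all ten digits");
// 1 + 8 + 1 + 7 characters plus the terminator.
static_assert(sizeof(single_form) == 18, "single form folds \\012 into one char");

// Compares two character arrays byte by byte, embedded nulls included.
template <std::size_t N, std::size_t M>
bool same_bytes(const char (&a)[N], const char (&b)[M]) {
    return N == M && std::memcmp(a, b, N) == 0;
}
```

Here `same_bytes(split_form, padded_form)` should be true and `same_bytes(split_form, single_form)` false. `std::memcmp` is used rather than `std::strcmp` because the data starts with a null character.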

James McNellis
  • Right - I get that much. My question is whether this is transparent across C/C++. And, if so, where I can reference that documentation. – ezpz May 14 '10 at 20:55
  • @ezpz: Sorry; I missed that you were interested in the compatibility between the two. Yes, the results are the same for both C and C++; I've added the language from the C++ standard, which effectively says the same thing. You can find where to get the relevant standards documents from this question: http://stackoverflow.com/questions/81656/where-do-i-find-the-current-c-or-c-standard-documents – James McNellis May 14 '10 at 20:58
  • 1
    +1 for actually mentioning the different stages of translation since this is why `"\0" "12"` is not the same as `"\012"`. – D.Shawley May 14 '10 at 21:17
0

This works because of the agreement between the C++ and C standards. Most, if not all, C++ implementations use a C preprocessor, so yes, C++ uses the C preprocessor.

octopusgrabbus
Christopher Barber
  • More precisely, the C++ standard and C standard agree on certain translation phases, and in preprocessor directives, and every C++ implementation I know of uses a C preprocessor. I like to keep the difference between what the Standards say and what implementations do. – David Thornley May 14 '10 at 21:06