
I'm learning about raw strings in C++ from a cplusplus.com tutorial on constants. Based on the definition on that site, a raw string should start with R"sequence( and end with )sequence" where sequence can be any sequence of characters.

One of the examples of the website is the following:

R"&%$(string with \backslash)&%$"

However, when I try to compile the code that contains the above raw string, I get a compilation error.

test.cpp:5:28: error: invalid character '$' in raw string delimiter
    5 |     std::string str = R"&%$(string with \backslash)&%$";
      |                       ^
test.cpp:5:23: error: stray 'R' in program

I tried it with g++ and clang++ on both Windows and Linux. None of them worked.

Peter Mortensen
Amirreza A.
    Quoting [cppreference](https://en.cppreference.com/w/cpp/language/string_literal): "_A character sequence made of any **source character** but parentheses, backslash and spaces_" (emphasis mine). But the [_basic source character set_](https://en.cppreference.com/w/cpp/language/translation_phases) does not contain the `$` character, that's why. – heap underrun Feb 27 '21 at 17:09
    @drescherjm Maybe because `$` is not used in any of the language features? As I see it, the basic set is required to be able to represent keywords, numbers, operators, and other built-in C++ language features. On the other hand, `$`, `@`, etc. are just "for fun". ;) – heap underrun Feb 27 '21 at 17:38
    "One of the examples of the website" cplusplus.com strikes again. It's best to pretend that this site doesn't exist. – Nicol Bolas Feb 27 '21 at 17:42
  • I misunderstood where `$` could not be used. Now I see that it makes sense. – drescherjm Feb 27 '21 at 17:46
    @NicolBolas - if only Google didn't _nearly always_ put cplusplus.com above cppreference.com in its results ... – davidbak Feb 28 '21 at 05:30

3 Answers


From C++ reference:

delimiter: A character sequence made of any source character but parentheses, backslash and spaces (can be empty, and at most 16 characters long)

Note the "any source character" part here.

Let us look at what the standard says:

From [gram.lex]:

raw-string:
  " d-char-sequence_opt ( r-char-sequence_opt ) d-char-sequence_opt "

...

d-char-sequence:
  d-char
  d-char-sequence d-char

d-char:
  any member of the basic source character set except: space, the left parenthesis (, the right parenthesis ), the backslash \, and the control characters representing horizontal tab, vertical tab, form feed, and newline.

Well, what is the basic source character set? From [lex.charset]:

The basic source character set consists of 96 characters: the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters:

a b c d e f g h i j k l m n o p q r s t u v w x y z A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 0 1 2 3 4 5 6 7 8 9 _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

... which does not include $, so the dollar sign cannot be part of the delimiter sequence.
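To make the rule concrete, here is a minimal sketch (assuming C++17, for `constexpr std::string_view`) of a compile-time check that mirrors the d-char grammar quoted above: at most 16 characters, each one of the 91 graphical characters of the basic source character set except `(`, `)` and `\`. The helper names `basic_graphic`, `is_d_char` and `is_valid_raw_delimiter` are made up for illustration; they are not part of any library, and the character list follows the [lex.charset] wording quoted in this answer.

```cpp
#include <cstddef>
#include <string_view>

// The 91 graphical characters of the basic source character set, as listed
// in the [lex.charset] wording quoted above. '$', '@' and '`' are absent.
// Spelled out explicitly (instead of ranges like 'a'..'z') so the check does
// not rely on letters being contiguous in the execution encoding.
constexpr std::string_view basic_graphic =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789"
    "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"'";

// A d-char is any basic graphical character except ( ) and \.
// Space and the control characters are excluded simply because
// they do not appear in basic_graphic.
constexpr bool is_d_char(char c) {
    return c != '(' && c != ')' && c != '\\' &&
           basic_graphic.find(c) != std::string_view::npos;
}

// A raw-string delimiter is a sequence of zero to 16 d-chars.
constexpr bool is_valid_raw_delimiter(std::string_view delim) {
    constexpr std::size_t max_length = 16;
    if (delim.size() > max_length) return false;
    for (char c : delim) {
        if (!is_d_char(c)) return false;
    }
    return true;
}

static_assert(basic_graphic.size() == 91);
static_assert(is_valid_raw_delimiter(""));                    // R"(...)" is fine
static_assert(is_valid_raw_delimiter("&%"));                  // '$' dropped
static_assert(!is_valid_raw_delimiter("&%$"));                // '$' is rejected
static_assert(!is_valid_raw_delimiter("a b"));                // space is rejected
static_assert(!is_valid_raw_delimiter("12345678901234567"));  // 17 characters
```

The `static_assert` lines show that `&%` is accepted while the question's `&%$` is rejected, because `$` is not a d-char.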

ph3rin
  • @AntoninGAVREL Maybe back in the day there were keyboard layouts/encodings that didn't have the `$` sign? I am not so sure. – ph3rin Feb 27 '21 at 17:36
    @AntoninGAVREL: What is there to fix? Just don't try to use it here. There's no particular reason to. – Nicol Bolas Feb 27 '21 at 17:41
    @drescherjm It doesn't exclude `$` from the body of the literal, just the delimiter. Almost always, you can find a pretty good delimiter out of (more than) the `88 ^ 16` combinations. – ph3rin Feb 27 '21 at 17:46
    So in short: use cppreference.com, not cplusplus.com. The latter tends to have many more inaccuracies like that, and the wasted time piles up. – spectras Feb 27 '21 at 18:24

For the basic source character set, see lex.charset 5.3 (1): that set does not contain the $ character. For the allowed prefix characters in raw string literals, see lex.string 5.13.5: "/…/ any member of the basic source character set except: space, the left parenthesis (, the right parenthesis ), the backslash \, and the control characters representing horizontal tab, vertical tab, form feed, and newline." (emphasis mine).

heap underrun

Just remove the $ from the delimiter, as in the code below:

#include <string>
std::string string3 = R"&%(string with \backslash)&%";

The $ causes the error because, as mentioned in the comments, the basic source character set does not contain $.

1) The individual bytes of the source code file are mapped (in implementation-defined manner) to the characters of the basic source character set. In particular, OS-dependent end-of-line indicators are replaced by newline characters. The basic source character set consists of 96 characters:

a) 5 whitespace characters (space, horizontal tab, vertical tab, form feed, new-line)

b) 10 digit characters from '0' to '9'

c) 52 letters from 'a' to 'z' and from 'A' to 'Z'

d) 29 punctuation characters: _ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '

2) Any source file character that cannot be mapped to a character in the basic source character set is replaced by its universal character name (escaped with \u or \U) or by some implementation-defined form that is handled equivalently.

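Reference: [cppreference, Phases of translation](https://en.cppreference.com/w/cpp/language/translation_phases)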
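Only the delimiter is restricted to the basic source character set; the body of a raw string may still contain `$`, and backslashes in it are kept verbatim. A minimal sketch, assuming C++17 for `constexpr std::string_view`; the delimiter `xyz` is arbitrary and chosen only so that the body can contain `)"`, which would otherwise end a raw string that uses an empty delimiter.

```cpp
#include <string_view>

// '$' and backslashes are fine in the body; only the delimiter is restricted.
// No escape processing happens, so \n here is two characters.
constexpr std::string_view with_dollar = R"&%(costs $5 \n)&%";

// With the arbitrary delimiter xyz, the literal only ends at the closing
// sequence )xyz followed by a double quote, so the body may contain )".
constexpr std::string_view with_paren_quote = R"xyz(say ")" please)xyz";

// The delimiter may also be empty.
constexpr std::string_view windows_path = R"(C:\path\to\file)";

static_assert(with_dollar == "costs $5 \\n");
static_assert(with_paren_quote == "say \")\" please");
static_assert(windows_path == "C:\\path\\to\\file");
```

Each `static_assert` compares the raw literal with the equivalent ordinary literal, where the backslashes and quotes have to be escaped.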

Peter Mortensen
Rohith V