
What are the rules for the escape character \ in string literals? Is there a list of all the characters that are escaped?

In particular, when I use \ in a string literal in gedit, and follow it by any three numbers, it colors them differently.

I was trying to create a std::string constructed from a literal with the character 0 followed by the null character (\0), followed by the character 0. However, the syntax highlighting alerted me that maybe this would create something like the character 0 followed by the null character (\00, aka \0), which is to say, only two characters.

For the solution to just this one problem, is this the best way to do it:

std::string ("0\0" "0", 3)  // String concatenation 

And is there some reference for what the escape character does in string literals in general? What is '\a', for instance?

Richard Chambers
David Stone
  • Related, on how to [escape an escape sequence](http://stackoverflow.com/questions/8229521/how-to-escape-or-terminate-an-escape-sequence-in-c). The best solution is to use concatenation as you had. – MPelletier Apr 19 '12 at 01:31
  • If you need a single `\` just use `\\`. – MPelletier Apr 19 '12 at 01:32
  • It looks like I can also use the initializer list syntax: `std::string { '0', 0, '0' };` – David Stone Apr 19 '12 at 04:49
  • Not only can I use the initializer list syntax, I now highly recommend it over any other method of constructing a string that requires you to specify a size or uses escaped characters. Consider the subtle undefined behavior outlined in http://stackoverflow.com/questions/164168/how-do-you-construct-a-stdstring-with-an-embedded-null/12884464#12884464 – David Stone Oct 14 '12 at 17:02
  • I realize now my comment at 1:32 is completely obfuscated... I have no idea what I meant... – MPelletier Oct 14 '12 at 22:14

6 Answers


Control characters:

(Hex codes assume an ASCII-compatible character encoding.)

  • \a = \x07 = alert (bell)
  • \b = \x08 = backspace
  • \t = \x09 = horizontal tab
  • \n = \x0A = newline (or line feed)
  • \v = \x0B = vertical tab
  • \f = \x0C = form feed
  • \r = \x0D = carriage return
  • \e = \x1B = escape (non-standard GCC extension)

Punctuation characters:

  • \" = quotation mark (backslash not required for '"')
  • \' = apostrophe (backslash not required for "'")
  • \? = question mark (used to avoid trigraphs)
  • \\ = backslash

Numeric character references:

  • \ + up to 3 octal digits
  • \x + any number of hex digits
  • \u + 4 hex digits (Unicode BMP, new in C++11)
  • \U + 8 hex digits (Unicode astral planes, new in C++11)

\0 = \00 = \000 = octal escape for null character

If you do want an actual digit character after a \0, then yes, I recommend string concatenation. Note that the whitespace between the parts of the literal is optional, so you can write "\0""0".

dan04
  • In the case of `\x`, the hex digits will be read 'greedily' until the first non-hex digit (that is, not limited to 2 as you might expect, and as some syntax highlighters *do* assume). You can use the @dan04 trick of splitting strings to mark the end of the hex: `"\x0020" "FeedDadBeer"` rather than `"\x0020FeedDadBeer"`. – Rhubbarb Sep 04 '12 at 10:24
  • So then what is represented by `\x` followed by an odd number of hexits? One assumes that for an even number, each hexit represents a nibble of memory from highest-to-lowest order—thus `\x5f` is `01011111` rather than `11110101`; but then does that mean `\x5` is `01010000` rather than `00000101`? And then what about `\x5f5`? Is that `01011111 01010000` or `01011111 00000101`? – eggyal Dec 06 '14 at 11:54
  • I don't know if this would warrant a question of its own, but I've received string data from some source with `"\e"` in it. I don't see it listed in any reference; could it be equivalent to `\x1B`? – Stijn Sanders Apr 20 '16 at 12:32
  • @StijnSanders: It's not in the C or C++ standard, but some compilers use `\e` to indicate the escape character `\x1B`. I have added it to my list. – dan04 Apr 20 '16 at 13:21
  • Could you give a reference about `\u` and `\U` usage? They work, and I am interested in them, but *C++ Primer 5th* doesn't say anything about them. I can only find one or two Q&As about them on SO. – Rick Apr 23 '20 at 18:14

\a is the bell/alert character, which on some systems triggers a sound. \nnn represents the character whose code is nnn in octal (one to three octal digits). \0 is simply the shortest such escape: it represents the null character only when it isn't followed by another octal digit.

To answer your original question, you could escape your '0' characters as well, as:

std::string ("\060\000\060", 3);

(since an ASCII '0' is 60 in octal)

The MSDN documentation has a pretty detailed article on this, as does cppreference.

jli
  • That example uses the constructor string (const char * s), which treats s like a C string. OP's example uses string (const char * s, size_t n), which treats it like an array of characters. – mgiuffrida Apr 19 '12 at 01:47

\0 will be read as the start of a longer octal escape sequence if it is followed by other octal digits (0 to 7), so \00 will be interpreted as a single character. (\0 on its own is an octal escape sequence too, in both C and C++.)

The way you're doing it:

std::string ("0\0" "0", 3)  // String concatenation 

works because this version of the constructor takes a pointer and a count and copies exactly that many characters, embedded nulls included; if you just pass "0\0" "0" as a const char*, the constructor treats it as a C string and copies only the characters before the first null character.

Here is a list of escape sequences.

mgiuffrida

I left something like this as a comment, but I feel it probably needs more visibility as none of the answers mention this method:

The method I now prefer for initializing a std::string with non-printing characters in general (and embedded null characters in particular) is to use the C++11 feature of initializer lists.

std::string const str({'\0', '6', '\a', 'H', '\t'});

I don't have to count the characters by hand, which is error-prone, so if I later want to insert a '\013' somewhere in the middle, I can, and all of my code will still work. It also completely sidesteps any issues of using the wrong escape sequence by accident.

The only downside is all of those extra ' and , characters.

David Stone

With the magic of user-defined literals, we have yet another solution to this. C++14 added a std::string literal operator.

#include <string>
using namespace std::string_literals;
auto const x = "\0" "0"s;

Constructs a string of length 2, with a '\0' character (null) followed by a '0' character (the digit zero). I am not sure if it is more or less clear than the initializer_list<char> constructor approach, but it at least gets rid of the ' and , characters.

Community
David Stone

ascii is a package you can install on Linux. For example, on Debian-based distributions: sudo apt-get install ascii

Usage: ascii [-dxohv] [-t] [char-alias...]
-t = one-line output  -d = Decimal table  -o = octal table  -x = hex table
-h = This help screen -v = version information
Prints all aliases of an ASCII character. Args may be chars, C \-escapes,
English names, ^-escapes, ASCII mnemonics, or numerics in decimal/octal/hex.

This tool can help you look up C/C++ escape codes such as \x0A.