
There are four special non-alphabetic characters that need to be escaped in C/C++: the single quote \', the double quote \", the backslash \\, and the question mark \?. Presumably that is because they have special meanings: ' delimits character constants, " delimits string literals, and \ introduces escape sequences. But why is ? one of them?

I read the table of escape sequences in a textbook today and I realized that I've never escaped ? before and have never encountered a problem with it. Just to be sure, I tested it under GCC:

#include <stdio.h>
int main(void)
{
    printf("question mark ? and escaped \?\n");
    return 0;
}

And the C++ version:

#include <iostream>
int main(void)
{
    std::cout << "question mark ? and escaped \?" << std::endl;
    return 0;
}

Both programs output: question mark ? and escaped ?

So I have two questions:

  1. Why is \? one of the escape sequence characters?
  2. Why does non-escaping ? work fine? There's not even a warning.

The more interesting fact is that the escaped \? can be used the same as ? in some other languages as well. I tested in Lua/Ruby, and it's also true even though I didn't find this documented.

Peter Mortensen
Yu Hao

1 Answer


Why is \? one of the escape sequence characters?

Because ? is special: it starts a trigraph. Early in translation (phase 1, before the source is split into tokens), a C or C++ implementation replaces each of the following three-character sequences with the corresponding single character (C11 §5.2.1.1 and C++11 §2.3).

Trigraph:       ??(  ??)  ??<  ??>  ??=  ??/  ??'  ??!  ??-
Replacement:      [    ]    {    }    #    \    ^    |    ~
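The replacement rule in the table can be sketched in plain C. This is only an illustration of the mapping, not how any real compiler implements it; the names trigraph_replacement and replace_trigraphs are hypothetical helpers made up for this example.

```c
/* Hypothetical helper: maps the third character of a trigraph to its
   replacement, or returns 0 if that character does not complete one. */
char trigraph_replacement(char third)
{
    switch (third) {
    case '(':  return '[';
    case ')':  return ']';
    case '<':  return '{';
    case '>':  return '}';
    case '=':  return '#';
    case '/':  return '\\';
    case '\'': return '^';
    case '!':  return '|';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Hypothetical helper: copies src to dst, replacing every trigraph.
   dst must have room for the length of src plus the terminator. */
void replace_trigraphs(const char *src, char *dst)
{
    while (*src != '\0') {
        char r = 0;
        /* src[2] is read only when src[1] is a question mark, so the
           read never goes past the terminating null character. */
        if (src[0] == '?' && src[1] == '?')
            r = trigraph_replacement(src[2]);
        if (r != 0) {
            *dst++ = r;
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

The switch works on the third character only, so the source of the sketch itself contains no trigraphs and compiles the same way whether or not the compiler has them enabled.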

Trigraphs are nearly useless now and mainly show up in obfuscated code. Some examples can be seen in the IOCCC.

GCC ignores trigraphs in its default GNU dialects and warns when it sees one. They are enabled by the -trigraphs option, which strict ISO modes such as -std=c99 imply. With trigraphs enabled, the \? escape is useful in the following example:

printf("\?\?!\n");

If the ? characters were not escaped, ??! would be replaced and the output would be |.
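As a small sketch of the same idea, here are a few ways to spell the three characters ? ? ! in a string literal so that the result does not depend on whether trigraphs are enabled. The function names are hypothetical and exist only for this illustration.

```c
/* Escaping both question marks, as in the printf example above. */
const char *escaped_both(void)   { return "\?\?!"; }

/* Escaping just one of them is enough to break up the sequence. */
const char *escaped_second(void) { return "?\?!"; }

/* Adjacent string literals are joined only after trigraph replacement
   has already happened, so splitting the literal works as well. */
const char *split_literal(void)  { return "?" "?!"; }
```

All three functions return the same three-character string, with or without -trigraphs.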

For more information on trigraphs, see Cryptic line "??!??!" in legacy code


Why does non-escaping ? work fine? There's not even a warning.

Because the standard allows ? (and, in a character constant, the double quote ") to be represented by themselves:

C11 §6.4.4.4 Character constants, paragraph 4

The double-quote " and question-mark ? are representable either by themselves or by the escape sequences \" and \?, respectively, but the single-quote ' and the backslash \ shall be represented, respectively, by the escape sequences \' and \\.

Similar in C++:

C++11 §2.13.2 Character literals, paragraph 3

Certain nongraphic characters, the single quote ', the double quote ", the question mark ?, and the backslash \, can be represented according to Table 6. The double quote " and the question mark ?, can be represented as themselves or by the escape sequences \" and \? respectively, but the single quote ' and the backslash \ shall be represented by the escape sequences \' and \\ respectively. If the character following a backslash is not one of those specified, the behavior is undefined. An escape sequence specifies a single character.
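A minimal sketch of what these rules mean in practice, using hypothetical function names chosen for this example. It also shows the string-literal case, where the roles of the two quote characters swap: ' may appear unescaped, while " needs its escape sequence.

```c
/* Both spellings denote the same character, so each comparison is true. */
int question_mark_same(void) { return '?' == '\?'; }
int double_quote_same(void)  { return '"' == '\"'; }

/* In a character constant, the single quote and the backslash
   have no unescaped spelling. */
char single_quote(void) { return '\''; }
char backslash(void)    { return '\\'; }

/* In a string literal, ' and ? may stand for themselves,
   but " must be written as \". */
const char *quotes_in_string(void) { return "'\"?\?"; }
```

Here quotes_in_string() returns four characters: a single quote, a double quote, and two question marks.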

Community
Yu Hao
  • The line: `For example, gcc doesn't support trigraph by default, unless the option -trigraphs is enabled. Under such options, \? is useful in some cases:` is **misleading**. It seems to say that even if you don't use the `-trigraphs` option, gcc would interpret `??!` as a `|`. – devnull Oct 15 '13 at 06:56
  • @devnull: Many people disable gnu language variants with things like `-std=c++98` or `-std=c99` so trigraph support being enabled is quite common. – CB Bailey Oct 15 '13 at 07:19
  • How can the double quote `"` represented by itself? Without escaping how can one type double quote in the middle of the string? – phuclv Oct 15 '13 at 07:53
  • @LưuVĩnhPhúc The standard isn't clear on that. I think `"` can't be used inside string literal, but you can use it in single quotes as `'"'` or `'\"'`, but `'''` is invalid, you must use `'\''`. – Yu Hao Oct 15 '13 at 07:58
  • @LưuVĩnhPhúc The standard is clear: A **character constant** can be an escape sequence or "any member of the source character set except the single-quote ', backslash \, or new-line character". A character of a **string literal** can be an escape sequence or "any member of the source character set except the double-quote ", backslash \, or new-line character". – chux - Reinstate Monica Mar 13 '15 at 19:04
  • I don't think trigraphs are just "nearly useless now". By my understanding, efforts to find any use of trigraphs in production code failed to find *any* deliberate use outside of compiler test suites, demonstrations of how trigraphs work, etc. That sounds like a more accurate (maybe less diplomatic) statement would be "a feature that was never really useful and should never have been in the language in the first place". – supercat Jun 10 '15 at 13:14
  • Thanks for pointing the standards, took me a few hours to figure out why a std::string "\'" (backslash followed by single quote) returned a single quote and not the backslash to ... I just needed to escape both std::string "\\\'" – Radu Maris Feb 08 '16 at 14:59
  • Late update: worth to mention that C++17 finally removed trigraphs entirely (see C.4.1 5.2; apart from, they aren't mentioned before anywhere any more, afterwards, only references to removal). – Aconcagua Aug 09 '18 at 08:45
  • @devnull Don't understand your comment. The statement is absolutely clear. It says that trigraphs are not supported unless `-trigraphs` is used. What is misleading about that? The next sentence can't "undo" what the previous sentence has already defined. – Mecki Mar 23 '20 at 00:35