54

In C and C++ (and several other languages), horizontal tabulators (ASCII code 9) in character and string constants are denoted in escaped form as '\t' and "\t". However, I regularly type the unescaped tabulator character in string literals, for example "A B" (there is a TAB between A and B), and at least clang++ does not seem to mind: the string appears to be equivalent to "A\tB". I prefer the unescaped version because long, indented multi-line strings are more readable in the source code.

Now I am asking myself whether this is generally legal in C and C++ or just supported by my compiler. How portable are unescaped tabulators in character and string constants?

Surprisingly, I could not find an answer to this seemingly simple question, either with Google or on Stack Overflow (I just found this vaguely related question).

tglas
  • 10
    Did you check your editor settings? Sometimes a tab is automatically converted to some number of white spaces – 463035818_is_not_an_ai Mar 06 '15 at 14:31
  • I am going to make a wild guess and say that the preprocessor takes care of the tabs. – Evdzhan Mustafa Mar 06 '15 at 14:34
  • 5
    Every time you encounter something like this, two questions arise: "can you?" and "should you?". And usually the latter is the important one... – Karoly Horvath Mar 06 '15 at 14:41
  • 1
    @KarolyHorvath: I think both are equally important, at least in programming. You always need "legal" knowledge and "moral" knowledge in order to be truly proficient in any programming language. – Christian Hackl Mar 06 '15 at 14:58
  • 1
    @EvdzhanMustafa: no -- then you could use `\t` anywhere in your source. This syntax is *specific* for strings and character literals *only*. You may be thinking of [trigraphs](http://stackoverflow.com/questions/7451406/are-digraphs-and-trigraphs-in-use-today), which are sort-of inbetween source and preprocessing. – Jongware Mar 06 '15 at 15:10
  • 1
    @Jongware: The preprocessor (pedantically, translation phase 5) does take care of tabs within literals when it converts source characters and escape sequences into execution characters. – Mike Seymour Mar 06 '15 at 15:14
  • 5
    @tobi303 I wrote the editor myself, so yes I am sure that a tab is a tab is a tab :) – tglas Mar 06 '15 at 15:22
  • 1
    @tglas but your coworker may use another editor, and when he edits the file, the editor *might* automatically replace all those tabs with spaces and break your code... or he may edit a string and try to add a tab like you, but insert spaces instead. – Bakuriu Mar 06 '15 at 19:07
  • @Bakuriu (and also @ some others providing answers below): I see your point. It is surely good practice to escape tabs in shared code. – tglas Mar 06 '15 at 19:11
  • Assume first question! – chux - Reinstate Monica Mar 06 '15 at 20:07
  • 1
    @Bakuriu Any editor that replaces characters within string literals should be unceremoniously thrown away. You are thinking of whitespace formatting at the beginning of lines. – user207421 Mar 07 '15 at 02:20

4 Answers

55

Yes, you can include a tab character in a string or character literal, at least according to C++11. The allowed characters include (with my emphasis):

any member of the source character set except the double-quote ", backslash \, or new-line character

(from C++11 standard, annex A.2)

and the source character set includes:

the space character, the control characters representing horizontal tab, vertical tab, form feed, and new-line, plus the following 91 graphical characters

(from C++11 standard, paragraph 2.3.1)

UPDATE: I've just noticed that you're asking about two different languages. For C99, the answer is also yes. The wording is different, but basically says the same thing:

In a character constant or string literal, members of the execution character set shall be represented by corresponding members of the source character set or [...]

where both the source and execution character sets include

control characters representing horizontal tab, vertical tab, and form feed.

Martin J.
Mike Seymour
27

It's completely legal to put a tab character directly into a character string or character literal. The C and C++ standards require the source character set to include a tab character, and string and character literals may contain any character in the source character set except backslash, quote or apostrophe (as appropriate) and newline.

So it's portable. But it is not a good idea, since there is no way a reader can distinguish between different kinds of whitespace. It is also quite common for text editors, mail programs, and the like to reformat tabs, so bugs may be introduced into the program in the course of such operations.

rici
8

If you enter a tab into an input, then your string will contain a literal tab character, and it will stay a tab character - it won't be magically translated into \t internally.

Same goes for writing code - you can embed literal tab characters in your strings. However, consider this:

     T     T     T        <--tab stops
012345012345012345012345
foo1 = "a\tb";
foo2 = "a  b"; // pressed tab in the editor
foo3 = "a  b"; // hit space twice in the editor

Unless you put the cursor on the whitespace between a and b and checked how many characters are in there, there is essentially NO way to determine if there's a tab or actual space characters in there. But with the \t version, it is immediately shown to be a tab.

Marc B
  • 2
    "it won't be magically translated into \t internally" -- do you happen to know when `\t` gets translated into a tab, then? (And whether it's actually a useful fact to know.) – Jongware Mar 06 '15 at 14:41
  • 1
    At compile time. The `\t` exists only in the textual version of your code. Once the compiler is done, the binary/object file will contain ASCII char 9 (one byte), not `\t` (two bytes). – Marc B Mar 06 '15 at 14:42
  • 1
    Those are the two end points of the process; I think it's safe to assume it's part of standard *string* processing. Which reminds me: the same must be true for the single character constant `'\t'`. (Ah wait: that's in the specification as well, as Mike shows.) – Jongware Mar 06 '15 at 14:46
  • if your editor doesn't distinguish hard tabs and spaces, that seems like an indictment of your editor, not of tabs – Steve Cox Mar 06 '15 at 15:07
  • 2
    My editor shows whitespace, space and tab are quite distinct there – ratchet freak Mar 06 '15 at 15:35
  • I've set my editor to show all non-printable characters (not just tabs) as colored boxes. – jamesqf Mar 06 '15 at 17:54
  • 3
    @Jongware: It happens in translation phase 5, which is to say after preprocessing and before concatenation of adjacent string literals. ("5. Each source character set member and escape sequence in character constants and string literals is converted to the corresponding member of the execution character set; if there is no corresponding member, it is converted to an implementation-defined member other than the null (wide) character.") Whether that's useful to know is a function of the knower :) – rici Mar 06 '15 at 18:58
2

When you press the TAB key you get whatever code point your system maps that key to. That code point may or may not be a tab on the system where the program runs. When you put \t in a literal the compiler replaces it with the appropriate code point for the target system. So if you want to be sure that you get a tab on the system where the program runs, use \t. That's its job.

Pete Becker
  • 1
    Translation phase 5 says "each source character set member [...] is converted to the corresponding member of the execution character set". Section 5.2.1 lists the 95 characters (including tab) that must appear in both the source and the execution sets. The definition of character constant says that the mapping from source to execution is implementation-defined; however the word "corresponding" in translation phase 5 is surely meant to imply that those 95 characters map to themselves? If not, then the DS9000 faced with the canonical Hello, World program might output `Potty, Cyst!?` – M.M Mar 07 '15 at 04:17