42

Given that there were once reasons to use digraphs and trigraphs in C and C++, does anyone put them in code being written today? Is there any substantial amount of legacy code still under maintenance that contains them?

(Note: Here, "digraph" does not mean "directed graph." Both digraph and trigraph have multiple meanings, but the meaning intended here is sequences like ??= or <: that stand in for characters like # and [.)

BЈовић
rwallace
  • 1
    I've never once seen one (on purpose!), but I work in games, which tends to involve much less legacy code. – Michael Dorgan Sep 16 '11 at 23:44
  • 4
    Have some fun with Google Code Search! For example: http://www.google.com/codesearch#search/&q=%5C?%5C?%5C(%20lang:%5Ec$&type=cs will look for instances of `??(` – Ray Toal Sep 16 '11 at 23:48
  • Don't forget quotation marks! @Ray - Thank you. I will now spend an hour looking up cuss words and laughing at the bad code that comes with. – Anne Quinn Sep 17 '11 at 00:40
  • 3
    @Ray - thanks, interesting! Clearly the vast majority of occurrences are in string literals and comments where `??(x)` is pseudocode for a function call. The search can be narrowed down by [looking for](http://www.google.com/codesearch#search/&q=\?\?\%3C%20lang:^c$&type=cs) `??<` instead, which, standing for `{`, is essential in any C source. There is not a single genuine example of a trigraph in all 14 pages of results. Mostly they are HTML pseudocode, with some compilers/compiler tests and base64 encoded text thrown in. (I'm interested because I'm writing a preprocessor for C++11 practice.) – Potatoswatter Sep 17 '11 at 01:48
  • Thankfully, compilers have options to disable their expansion! – Matthieu M. Sep 17 '11 at 10:28
  • 2
    @Matthieu: But if you use such an option, your code becomes dependent on it, and either fails to compile or has a different meaning when compiled without the option. I'd rather have a warning so I can avoid trigraphs altogether. – Keith Thompson Sep 20 '11 at 14:59
  • @Keith: I guess it depends whether you value portability or not. I'd prefer compiling without them, and *in case of porting* use the warning to patch. – Matthieu M. Sep 20 '11 at 18:02

5 Answers

26

I don't know for sure, but you're most likely to find digraphs and trigraphs being used in IBM mainframe environments. The EBCDIC character set doesn't include some characters that are required for C.

The other justification for digraphs and trigraphs, 7-bit ASCII-ish character sets that replace some punctuation characters with accented letters, is probably less relevant today.

Outside such environments, I suspect that trigraphs are more commonly used by mistake than deliberately, as in:

puts("What happened??!");

For reference, trigraphs were introduced in the 1989 ANSI C standard (which essentially became the 1990 ISO C standard). They are:

??= #     ??) ]     ??! |
??( [     ??' ^     ??> }
??/ \     ??< {     ??- ~

The replacements occur anywhere in source code, including comments and string literals.
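
Because the replacement also happens inside string literals, a literal that is meant to end in two question marks and an exclamation mark has to be spelled so that the three characters never appear next to each other in the source. Here is a minimal sketch of two common spellings; the function names are made up for illustration, and both literals produce the same text whether or not the compiler has trigraph replacement enabled:

```c
/* Adjacent string literals are joined by the compiler only after trigraph
   replacement has run, so splitting the literal keeps the two question
   marks apart at the point where trigraphs are recognized. */
const char *split_literal(void)
{
    return "What happened?" "?!";
}

/* The escape sequence \? stands for a plain question mark, so the source
   never contains two consecutive question marks followed by '!'. */
const char *escaped_literal(void)
{
    return "What happened?\?!";
}
```

Either function can be passed straight to `puts`; the escape form is usually easier to read, while the split form also works in tools that do not understand `\?`.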

Digraphs are alternate spellings of certain tokens, and do not affect comments or literals:

<: [      :>   ]
<% {      %>   }
%: #      %:%: ##

Digraphs were introduced by the 1995 amendment to the 1990 ISO C standard.
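
To show what that looks like in practice, here is a small sketch written with digraph spellings. It assumes a compiler running in a mode that includes the 1995 amendment or later (for example, not a strict C89/C90 mode); `ITEM_COUNT`, `sum_items` and `digraph_text` are hypothetical names chosen for the example:

```c
/* Each digraph below is an alternate spelling of the token named in the
   comment beside it. */
%:define ITEM_COUNT 3                       /* #define */

int sum_items(const int values<: :>)        /* values[] */
<%                                          /* opening brace */
    int total = 0;
    for (int i = 0; i < ITEM_COUNT; ++i) <%
        total += values<:i:>;               /* values[i] */
    %>
    return total;
%>                                          /* closing brace */

/* Digraphs are tokens, so the same characters inside a string literal
   are left exactly as written. */
const char *digraph_text(void) <% return "<:%>"; %>
```

Unlike a trigraph, the `<:` inside the string literal in `digraph_text` stays two ordinary characters, which is why digraphs do not cause the accidental-replacement problem described above.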

Keith Thompson
  • Those 7-bit ASCII-ish character sets were standardized as ISO-646 in 1972, and they were already falling out of use in the 1980's, to be replaced by 8-bit ISO-8859 variants (including Windows-1252) by the 1990's. The latter include all 7-bit ASCII characters and do not require trigraphs in C code. If there are legacy ISO-646 systems still around, they are so long obsolete that no one is going to be writing new C code for them. – han Sep 17 '11 at 06:14
  • 2
    And in that case, write `puts("What happened?" "?!\n");` to get the right output. – Gzorg Dec 06 '12 at 18:52
  • 2
    @Gzorg The trigraphs may also be circumvented by escaping the second '?' thus: `puts("What happened?\?!\n");` – Rhubbarb Jun 18 '13 at 08:42
  • 1
    Actually, one of the secondary reasons for trigraphs (after EBCDIC) was that numerous minicomputers of the 1970s and 80s came with terminal keyboards very much unlike the standardized PC / Apple keyboards of today. Each vendor had its own keyboard layout, sometimes with variations between product lines. On some terminals it wasn't easy, or even directly possible, to enter some symbols, for example the tilde '~' or even the "(commercial) at" symbol '@', hence the need. I don't think C was even implemented on non-ASCII systems other than IBM's. – mctylr Aug 26 '14 at 20:23
  • 1
    @mctylr Many microcomputers of the late '70s and early '90s also did not follow the IBM Selectric or PC standards, and didn't have all the requisite characters to avoid using _n_-graphs. For example, Atari's variation of ASCII used on their 8-bit machines, ATASCII, didn't include the curly braces or the vertical bar. To add to the fun, I used a C compiler on that machine that used non-standard bigraphs--`(*` and `*)` for the open and close curly braces. – dodgethesteamroller Sep 01 '15 at 01:08
17

There is a proposal pending for C++1z (the next standard after C++1y, which will hopefully be standardized as C++14) that aims to remove trigraphs from the Standard. The authors did a case study on an otherwise undisclosed large codebase:

Case study

The uses of trigraph-like constructs in one large codebase were examined. We discovered:

923 instances of an escaped ? in a string literal to avoid trigraph replacement: string pattern() const { return "foo-????\?-of-?????"; }

4 instances of trigraphs being used deliberately in test code: two in the test suite for a compiler, the other two in a test suite for boost's preprocessor library.

0 instances of trigraphs being deliberately used in production code. Trigraphs continue to pose a burden on users of C++.

The proposal notes (bold emphasis from the original proposal):

If trigraphs are removed from the language entirely, an implementation that wishes to support them can continue to do so: its implementation-defined mapping from physical source file characters to the basic source character set can include trigraph translation (and can even avoid doing so within raw string literals). We do not need trigraphs in the standard for backwards compatibility.
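
The source-mapping step the proposal mentions can be pictured as a small pre-pass over the raw text. The following is only an illustrative sketch, not code from the proposal or from any compiler: `translate_trigraphs` is a hypothetical helper, and it ignores the raw-string-literal exception mentioned above.

```c
#include <stddef.h>
#include <string.h>

/* Third characters of the nine trigraphs, and the characters they stand
   for, listed in matching order. */
static const char TRIGRAPH_THIRD[] = "=(/)'<!>-";
static const char TRIGRAPH_REPL[]  = "#[\\]^{|}~";

/* Hypothetical helper: copies src to dst, replacing every trigraph with
   its single-character equivalent. dst must have room for at least
   strlen(src) + 1 bytes. Returns the length of the translated text. */
size_t translate_trigraphs(const char *src, char *dst)
{
    size_t out = 0;
    size_t i = 0;

    while (src[i] != '\0') {
        /* A trigraph is two question marks plus one of nine characters. */
        if (src[i] == '?' && src[i + 1] == '?' && src[i + 2] != '\0') {
            const char *hit = strchr(TRIGRAPH_THIRD, src[i + 2]);
            if (hit != NULL) {
                dst[out++] = TRIGRAPH_REPL[hit - TRIGRAPH_THIRD];
                i += 3;
                continue;
            }
        }
        dst[out++] = src[i++];   /* ordinary character: copy unchanged */
    }
    dst[out] = '\0';
    return out;
}
```

Scanning left to right and emitting a lone question mark whenever no trigraph starts at the current position means a run of three question marks followed by `=` comes out as a question mark followed by `#`, which matches how compilers that support trigraphs treat such runs.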

TemplateRex
  • 4
    But then what happens to all the escaped trigraphs??! – Praxeolitic Sep 30 '14 at 05:22
  • 1
    @Praxeolitic I see what you did there. Know that four years later your joke is appreciated. As for the actual question, `\?` continues to represent literal `?` characters, because it's one of the defined escape sequences, regardless of whether or not trigraphs exist. – mtraceur May 24 '18 at 21:57
  • I do agree that some of the weird looking tri/digraph should be removed, but there are useful trigraph/digraph such as `and` , `or`, `not`, `bitand`, `not_eq` which looked like very explicit and hence enforces code intent. – daparic Jul 23 '18 at 21:33
9

They can be used for The International Obfuscated C Code Contest.

Simon
  • 1
    While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – ChrisH Jun 24 '14 at 16:15
  • @ChrisH: IMO the link is not the answer (i.e. this is not a link-only answer). The link is just added for convenience. – undur_gongor Jul 16 '14 at 13:10
  • 3
    @undur_gongor Fair enough. But I guess given that its an answer to an almost-3-year-old question that's been accepted with a 17+ vote answer, it's rather academic. – ChrisH Jul 16 '14 at 16:31
5

Trigraphs and digraphs aren't written these days; they exist only in very old code that was created in a very limited environment. If you attempt to compile code that contains trigraphs on a modern compiler like Visual Studio's, it will usually not compile unless you specify a linker option. I know that for Visual Studio, that option is "/Zc:trigraphs".

They still exist because the C++ committee never issues changes that would 'break' legacy code, for better or for worse. There is an anecdote that their removal was proposed and supported, and that it was stopped by a lone IBM representative.

Anne Quinn
  • EBCDIC is still used on old IBM mainframes, and does not include all the required characters for writing C/C++ :( – Matthieu M. Sep 17 '11 at 10:28
  • 2
    Why would it be a *linker* option? Trigraphs are handled by the compiler; the linker doesn't even need to be aware of them. – Keith Thompson Oct 20 '14 at 18:48
3

I know this is an old question, but there is arguably a legitimate use these days: touch screens without an actual keyboard. For example, the typical US keyboard layout isn't necessarily available in full if you do any coding on a tablet or something similar, which admittedly is hopefully rare given how cumbersome it can be (three clicks on mine for an assignment operator). I personally don't use them if I can avoid it, but they are useful in the absence of the actual tokens they're meant to represent.

Again, I really hope people avoid this where possible, but it is one reason to know and use them.