
I've seen on a site that int main(int argc, char* argv<::>) can also be used as a signature of main. Surprisingly, the following program:

int main(int argc, char* argv<::>)
{
  return 0;
}

compiles without any warnings in GCC as well as Clang. It also compiles as C++.

So, how is it that int main(int argc, char* argv<::>) is a valid signature of main?

Spikatrix

2 Answers


char* argv<::> is equivalent to char* argv[]. <: and :> used here are digraphs.
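To see that the digraphs change nothing but the spelling, here is a minimal sketch (total_length and as_plain_pointer are hypothetical names, not from the question) that declares a parameter as char *argv<::> and uses <% %> for the braces. Since an array parameter is adjusted to a pointer, the function can be assigned to a pointer to a function taking char ** without a cast:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: sums the lengths of argc strings in an argv-style
   array. char *argv<::> is the same declaration as char *argv[], which a
   function parameter adjusts to char **argv. <% %> stand in for { }. */
size_t total_length(int argc, char *argv<::>)
<%
    size_t total = 0;
    for (int i = 0; i < argc; i++)
        total += strlen(argv<:i:>);   /* argv<:i:> is argv[i] */
    return total;
%>

/* The parameter type is char **, so this assignment needs no cast. */
size_t (*const as_plain_pointer)(int, char **) = total_length;
```

Calling it with an ordinary char *args[] = {"prog", "ab", "cde"} gives 9 either directly or through as_plain_pointer.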

C11: 6.4.6 (p3):

In all aspects of the language, the six tokens 79)

<: :> <% %> %: %:%:

behave, respectively, the same as the six tokens

[ ] { } # ##

except for their spelling. 80)


Footnotes:
79) These tokens are sometimes called ‘‘digraphs’’.
80) Thus [ and <: behave differently when ‘‘stringized’’ (see 6.10.3.2), but can otherwise be freely interchanged.
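The quoted paragraph covers all six digraphs, not just <: and :>. A rough sketch, assuming a C99-or-later compiler (the names CAT and sum_first are made up for illustration), that uses %: to introduce directives, %:%: for token pasting, and <% %> / <: :> in the function body:

```c
%:include <stddef.h>   /* %:include is #include */

/* Hypothetical macro: %:%: behaves like ##, so CAT(to, tal) pastes the
   two tokens into the single identifier total. */
%:define CAT(a, b) a %:%: b

/* Sums the first n elements; <% %> are { } and <: :> are [ ]. */
int sum_first(const int xs<::>, size_t n)
<%
    int CAT(to, tal) = 0;             /* expands to: int total = 0; */
    for (size_t i = 0; i < n; i++)
        total += xs<:i:>;
    return total;
%>
```

For int a[] = {1, 2, 3, 4}, sum_first(a, 4) returns 10, exactly as the bracket-and-brace spelling would.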

An example:

%: define stringize(a) printf("Digraph \"%s\" retains its spelling in case of stringization.\n", %:a)

Calling the above macro

stringize( %:);

will print

Digraph "%:" retains its spelling in case of stringization.
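The snippet above is only a fragment. A self-contained variant of the same idea, assuming a hypothetical STR macro in place of stringize, checks footnote 80 directly: the digraph <: and the bracket [ it stands for stringize to different strings.

```c
%:include <string.h>

/* Hypothetical macro: %: behaves like #, so %:a stringizes the argument. */
%:define STR(a) %:a

/* Returns 1 if stringizing keeps each token's original spelling. An
   unbalanced [ is a legal macro argument; only parentheses must balance. */
int digraph_spelling_kept(void)
{
    return strcmp(STR(<:), "<:") == 0
        && strcmp(STR([), "[") == 0
        && strcmp(STR(%:), "%:") == 0;
}
```

Outside of stringizing, <: and [ remain freely interchangeable, as the footnote says.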
haccks

<: and :> are digraphs; they are equivalent to [ and ], respectively.

I believe their only real-life use is to create obfuscated code such as the one you present, but they are part of standard C (introduced by the 1995 amendment and included in C99), intended as a more readable alternative to the even more awkward trigraphs, which have been in C since the first ANSI standard (C89).

The original intent was to assist programmers working with national character sets which lacked certain punctuation marks. Since it is now fairly rare to encounter an environment which doesn't support (at least) eight-bit character sets, allowing characters like Ä to coexist with [, the issue is mostly moot. But backwards compatibility is still considered necessary.

rici
  • No. Digraphs and trigraphs were not designed to support obfuscation. No features exist for that reason. – David Heffernan Apr 26 '15 at 06:55
  • @DavidHeffernan: I didn't say they were intended to support obfuscation, only that they are used for that purpose. They were intended to support programmers (especially in Nordic countries) whose national character sets don't include certain punctuation marks, an issue which is no longer relevant. – rici Apr 26 '15 at 06:57
  • It might be more helpful to explain why they exist. If their only use is obfuscation (it isn't), then one can only conclude that's why they were added. – David Heffernan Apr 26 '15 at 06:58
  • @DavidHeffernan: Abbreviated history lesson added. – rici Apr 26 '15 at 07:02
  • There are still programmers who need and use such features. Nordic countries haven't yet sunk into the sea, for example. – Peter Apr 26 '15 at 07:08
  • In case anybody cares, these were added fairly specifically for computers and terminals that use the ISO 646 character set. – Jerry Coffin Apr 26 '15 at 07:10
  • @Peter: Nordic countries generally use iso-8859-1 these days (or even, gasp, Unicode), and have for some time. Undoubtedly you will still find a terminal somewhere which uses iso-646-no or iso-646-se; I no longer have one. (I myself stubbornly use a national keyboard which requires an AltGr to type []{}. But most programmers I know use English keyboards to program.) – rici Apr 26 '15 at 07:14
  • Various reasons exist for using digraphs and trigraphs: keyboards may not have keys to cover the entire character set of the language, input of special characters may be difficult, text editors may reserve some characters for special use and so on. Source: [Wikipedia](http://en.wikipedia.org/wiki/Digraphs_and_trigraphs#C). – haccks Apr 26 '15 at 07:45
  • Why are they called digraphs? – ChrisD Apr 26 '15 at 08:39
  • @ChrisD: **Digraph** is borrowed from English spelling, where a digraph is a sound written with two letters, such as "ee" and "or". – haccks Apr 26 '15 at 08:58
  • @ChrisD: from the American Heritage Science Dictionary - "di- A prefix that means "two," "twice," or "double." It is used commonly in chemistry, as in dioxide, a compound having two oxygen atoms". So a digraph is a combination of two characters to represent another single character. Similarly, there are trigraphs in C which use three characters to represent another character. – Michael Burr Apr 26 '15 at 23:06
  • @Michael Burr Makes sense. When I hear "digraph" I think of a directed graph. Was curious about the origin in this context. Thanks for explanation. – ChrisD Apr 26 '15 at 23:48