46

Today I found strange syntax like

 int _$[:>=<%-!.0,};

in some old code, and the line is not commented out. The compiler reports no errors for it. I tested it separately and it compiles too:

int main(){
    int _$[:>=<%-!.0,};
    return 0;
}

Why does it compile?

Nayuki
ggrr
  • 7
    [What is this smiley-with-beard expression](http://stackoverflow.com/q/15736282/995714) http://stackoverflow.com/q/27678297/995714 http://stackoverflow.com/q/27601706/995714 – phuclv Aug 14 '15 at 06:45
  • 2
    This isn't C (because no C token can contain a dollar sign). Any compiler accepting this code translates some **other** language than C. – Jens Aug 14 '15 at 08:17
  • 1
    @Jens don't *all* compilers translate to **some other** language than C?.. Pedants aside, if you see my answer, [gnu `gcc` actually supports `$` in identifiers](https://gcc.gnu.org/onlinedocs/gcc/Dollar-Signs.html). And, as far as I can tell, so does `llvm`. I think you're confusing *machines* with compilers as some *machines* don't support $ in identifiers. – dcow Aug 14 '15 at 10:20
  • 7
    @dcow C compilers accept C as the **source** language. I'm not confusing anything. Apparently you think that gcc is a C compiler. It is not. It accepts a language called GNU C in which `$` is acceptable in identifiers. In Standard C this is a syntax error that **must** be diagnosed. To turn gcc into a C compiler, you need to provide a set of esoteric options like `-ansi -pedantic -Wno-trigraph` or so and even then it might accept some non-C programs. C is defined by ISO9899, not by the language accepted by some random compiler. – Jens Aug 14 '15 at 11:16
  • 3
    @dcow Jens said "translates some other language", not "translates **to** some other language". A C compiler translates C (to assembly or whatever), a Java compiler translates Java (to bytecode), an assembler translates assembly (to machine code). – user253751 Aug 14 '15 at 13:16
  • 4
    Is there a purpose to this line of code other than to confuse the reader and the compiler? Who would ever jam these random symbols together unless this is from some obfuscated C contest? – JPhi1618 Aug 14 '15 at 13:43
  • 1
    Anytime C and weird symbols are involved check if there are #define for any of those weird symbols – Mystra007 Aug 14 '15 at 14:33
  • 7
    Digraphs get asked about literally every day on stackoverflow. I don't understand why so many copies of the same question make it to the hot questions lists... or why they aren't closed as duplicates. – BlueRaja - Danny Pflughoeft Aug 14 '15 at 16:27
  • @Jens sorry, the distinction I was trying to make clear is not whether GNU C is ANSI C, but rather that it doesn't matter which C the compiler translates; the symbols just have to be consumable by either the underlying runtime library or, if that does no translation, the assembler. – dcow Aug 14 '15 at 17:24
  • 8
    How to get a lot of upvotes on SO: 1. Write some funky C code with digraphs. 2. Post question about it 3. PROFIT!!! – Geier Aug 14 '15 at 19:37
  • 3
    I can think of several people I've known who would write code like this. Note: I specifically *did not say* "friends" or "respected colleagues"... – Bob Jarvis - Слава Україні Aug 15 '15 at 00:35

4 Answers

49

With the digraphs replaced (see below), the line becomes:

int _$[]={-!.0,};

On the right-hand side, .0 is a double literal, ! is the logical negation operator, - is the arithmetic negation operator, and , is a trailing comma. Together, {-!.0,} is an array initializer.

The left-hand side int _$[] defines an int array. However, there's one last problem: _$ is not a valid identifier in standard C. Some compilers (e.g., gcc) support it as an extension.


C11 §6.4.6 Punctuators

In all aspects of the language, the six tokens

<: :> <% %> %: %:%:

behave, respectively, the same as the six tokens

[  ]  {  }  #  ##
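The tokenization can be checked directly. The following is a minimal sketch, assuming a compiler that accepts digraphs (C95 or later). The name `values` and both function names are hypothetical; `values` replaces `_$` so the example does not depend on the `$` extension.

```c
/* The lexer always takes the longest possible token, so "[:>=<%" splits
   into  [  :>  =  <%  and never produces a ">=" token. */

/* Same token sequence as the line in the question, with the hypothetical
   name `values` instead of `_$`. */
int first_with_digraphs(void)
{
    int values[:>=<%-!.0,};
    return values[0];
}

/* The same declaration written with ordinary punctuators. */
int first_without_digraphs(void)
{
    int values[] = { -!.0, };
    return values[0];
}
```

Both functions return the same value, which suggests that the digraph spelling changes only how the source looks, not what it declares.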
Yu Hao
  • What options did you give gcc? gcc-4.8 with `-std=c11 -pedantic -Wall -Wextra` doesn't even give a warning. If I replace -std=c11 with -std=c89, then it warns on the $ and rejects the digraph. – rici Aug 14 '15 at 05:37
  • @rici You are right, gcc compiles it fine, I'll edit the post. When I tried it, I mistakenly thought there was no `int` (as much obfuscated code would do), so I used `-std=c89` to make sure implicit int was enabled, and gcc gave an error on undeclared `_$`. – Yu Hao Aug 14 '15 at 05:53
  • Identifiers containing `$` are apparently common in VMS. – user1686 Aug 14 '15 at 08:44
  • Could you also throw in a `~` (bitwise negation) with the other unary operators? – Nick T Aug 15 '15 at 01:43
39

Well,

  • underscore _ is an allowed identifier character,
  • dollar sign $ is allowed in some implementations too,
  • left bracket [ denotes that the type is an array,
  • :> is the digraph for ],
  • equals = introduces the initializer,
  • <% is the digraph for {,
  • -!.0 is just -1 (.0 is the double literal 0.0, ! compares it with zero and yields the int 1, and - negates that),
  • you can have trailing commas in array initializers {1, 2, 3,},
  • and ; ends the declaration.

So you get

int _$[] = {-1,};
dcow
12

If we replace the digraphs :> and <% present in your line of code, we end up with

int _$[]={-!.0,};

which is equivalent to

int _$[] = { -1, };

It is a declaration of array _$ of type int [1] with an initializer.

Note that this is not guaranteed to compile, since the standard C language does not itself provide support for the $ character in identifiers. It does allow implementations to extend the set of supported characters, though. Apparently the compiler you used supports $ in identifiers.
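The int [1] type mentioned above can be confirmed with sizeof. This is only a sketch: the name `values` is a hypothetical stand-in for `_$`, chosen so that the code stays within the identifier characters every C compiler must accept, and the function names are made up for illustration.

```c
#include <stddef.h>

/* Element count of the array from the question. The empty brackets are
   completed from the initializer, and the trailing comma adds no element. */
size_t question_array_length(void)
{
    int values[] = { -1, };
    return sizeof values / sizeof values[0];
}

/* For comparison: three initializers plus a trailing comma. */
size_t three_element_length(void)
{
    int values[] = { 1, 2, 3, };
    return sizeof values / sizeof values[0];
}
```

Here `question_array_length()` should return 1 and `three_element_length()` should return 3.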

AnT stands with Russia
11

This works due to digraphs in C. The line in question decodes like this:

int _$ [ :> = <% - ! .0  , } ;
int _$ [ ]  = {  - ! 0.0 , } ;

Furthermore:

  • .0 is a double literal.
  • ! is the Boolean negation operator, so !.0 yields (int) 1.
  • - is the unary negation operator, which yields (int) -1.
  • A trailing comma is legal after an array element.
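The types and values in the list above can be checked with C11's `_Generic`. A minimal sketch, assuming a C11 compiler; the function names are hypothetical and exist only for this illustration.

```c
/* 1 if the literal .0 has type double, 0 otherwise. */
int literal_is_double(void)
{
    return _Generic(.0, double: 1, default: 0);
}

/* 1 if !.0 has type int, 0 otherwise. The ! operator always yields int,
   whatever the type of its operand. */
int negation_is_int(void)
{
    return _Generic(!.0, int: 1, default: 0);
}

/* .0 compares equal to zero, so !.0 is 1. */
int logical_not(void)
{
    return !.0;
}

/* Unary minus applied to that 1. */
int negated(void)
{
    return -!.0;
}
```

With these definitions, `logical_not()` returns 1 and `negated()` returns -1, matching the steps listed above.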
Nayuki