296

Given by a colleague as a puzzle, I cannot figure out how this C program actually compiles and runs. What is this >>>= operator and the strange 1P1 literal? I have tested in Clang and GCC. There are no warnings and the output is "???"

#include <stdio.h>

int main()
{
    int a[2]={ 10, 1 };

    while( a[ 0xFULL?'\0':-1:>>>=a<:!!0X.1P1 ] )
        printf("?");

    return 0;
}
TLama
  • 75,147
  • 17
  • 214
  • 392
CustomCalc
  • 2,182
  • 3
  • 12
  • 9

3 Answers3

471

The line:

while( a[ 0xFULL?'\0':-1:>>>=a<:!!0X.1P1 ] )

contains the digraphs :> and <:, which translate to ] and [ respectively, so it's equivalent to:

while( a[ 0xFULL?'\0':-1 ] >>= a[ !!0X.1P1 ] )

The literal 0xFULL is the same as 0xF (which is hex for 15); the ULL just specifies that it's an unsigned long long literal. In any case, as a boolean it's true, so 0xFULL ? '\0' : -1 evaluates to '\0', which is a character literal whose numerical value is simply 0.

Meanwhile, 0X.1P1 is a hexadecimal floating point literal equal to 2/16 = 0.125. In any case, being non-zero, it's also true as a boolean, so negating it twice with !! again produces 1. Thus, the whole thing simplifies down to:

while( a[0] >>= a[1] )

The operator >>= is a compound assignment that bit-shifts its left operand right by the number of bits given by the right operand, and returns the result. In this case, the right operand a[1] always has the value 1, so it's equivalent to:

while( a[0] >>= 1 )

or, equivalently:

while( a[0] /= 2 )

The initial value of a[0] is 10. After shifting right once, it become 5, then (rounding down) 2, then 1 and finally 0, at which point the loop ends. Thus, the loop body gets executed three times.

Community
  • 1
  • 1
Ilmari Karonen
  • 49,047
  • 9
  • 93
  • 153
  • 19
    Could you please elaborate on the `P` in `0X.1P1`. – Kijewski Aug 25 '14 at 22:20
  • 78
    @Kay: It's the same as `e` in `10e5`, except you have to use `p` for hexadecimal literals because `e` is a hexadecimal digit. – Dietrich Epp Aug 25 '14 at 22:21
  • 9
    @Kay: The hex float literals are part of C99, but [GCC also accepts them in C++ code](https://gcc.gnu.org/onlinedocs/gcc/Hex-Floats.html). As Dietrich notes, the `p` separates the mantissa and the exponent, just like the `e` in normal scientific float notation; one difference is that, with hex floats, the base of the exponential part is 2 instead of 10, so `0x0.1p1` equals 0x0.1 = 1/16 times 2¹ = 2. (In any case, none of that matters here; any non-zero value would work equally well there.) – Ilmari Karonen Aug 25 '14 at 22:27
  • Minor: Certain `'\0'` is an `int` literal. Try `printf("%zu %zu\n", sizeof ('\0'), sizeof (char));` – chux - Reinstate Monica Aug 25 '14 at 22:46
  • 6
    @chux: Apparently, that [depends](http://stackoverflow.com/questions/433895/why-are-c-character-literals-ints-instead-of-chars) on whether the code is compiled as C or (as it was originally tagged) C++. But I fixed the text to say "character literal" instead of "`char` literal", and added a Wikipedia link. Thanks! – Ilmari Karonen Aug 25 '14 at 23:24
  • Hmmm... I have tried `if (str.length() && str[str.length() :> != L'\\' && str<:str.length()] != L'/')` but it gives compilation error, is something wrong with this piece of code? – ST3 Aug 28 '14 at 08:26
  • Just to state: refer to section 6.4.6 from the C99 standard for the punctuator digraphs's semantics (i.e. `<:`). http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf – h7r Aug 31 '14 at 09:04
69

It is some rather obscure code involving digraphs, namely <: and :> which are alternative tokens for [ and ] respectively. There is also some use of the conditional operator. There is also a bit shifting operator, the right shift assignment >>=.

This is a more readable version:

while( a[ 0xFULL ? '\0' : -1 ] >>= a[ !!0X.1P1 ] )

and an even more readable version, replacing the expressions in the [] for the values they resolve to:

while( a[0] >>= a[1] )

Replacing a[0] and a[1] for their values should make it easy to figure out what the loop is doing, i.e. the equivalent of:

int i = 10;
while( i >>= 1)

which is simply performing (integer) division by 2 in each iteration, producing the sequence 5, 2, 1.

juanchopanza
  • 223,364
  • 34
  • 402
  • 480
  • I didn't run it - would this not produce `????`, though, rather than `???` as the OP got? (Huh.) http://codepad.org/nDkxGUNi *does* produce `???`. – Jongware Aug 25 '14 at 22:34
  • 7
    @Jongware the 10 got divided on first iteration. So the values being evaluated by the loop are 5, 2, 1, and 0. So it only prints out 3 times. – MysticXG Aug 25 '14 at 22:49
42

Let's go through the expression left-to-right:

a[ 0xFULL?'\0':-1:>>>=a<:!!0X.1P1 ]

The first thing I notice is that we are using the ternary operator from the use of ?. So the subexpression:

0xFULL ? '\0' : -1

is saying "if 0xFULL is non-zero, return '\0', otherwise -1. 0xFULL is a hexadecimal literal with the unsigned long-long suffix - meaning it's a hexadecimal literal of type unsigned long long. That doesn't really matter though, because 0xF can fit inside a regular integer.

Also, the ternary operator converts the types of the second and third terms to their common type. '\0' is then converted to int, which is just 0.

The value of 0xF is way bigger than zero, so it passes. The expression now becomes:

a[ 0 :>>>=a<:!!0X.1P1 ]

Next, :> is a digraph. It is a construct that expands to ]:

a[0 ]>>=a<:!!0X.1P1 ]

>>= is the signed right shift operator, we can space that out from a to make it clearer.

Moreover, <: is a digraph that expands to [:

a[0] >>= a[!!0X.1P1 ]

0X.1P1 is a hexadecimal literal with an exponent. But no matter the value, the !! of anything that's non-zero is true. 0X.1P1 is 0.125 which is non-zero, so it becomes:

a[0] >>= a[true]
-> a[0] >>= a[1]

The >>= is the signed right shift operator. It changes the value of its left operand by shifting its bits forward by the value on the operator's right side. 10 in binary is 1010. So here are the steps:

01010 >> 1 == 00101
00101 >> 1 == 00010
00010 >> 1 == 00001
00001 >> 1 == 00000

>>= returns the result of its operation, so as long as shifting a[0] remains non-zero for every time its bits are shifted right by one, the loop will continue. The fourth attempt is where a[0] becomes 0, so the loop is never entered.

As a result, ? is printed three times.

David G
  • 94,763
  • 41
  • 167
  • 253
  • 3
    `:>` is a *digraph*, not a trigraph. It's not handled by the preprocessor, it's simply recognized as a token equivalent to `]`. – Keith Thompson Aug 25 '14 at 22:33
  • 1
    The ternary operator (`?:`) has a type that is the common type of the second and third terms. The first term is always a conditional and has a type `bool`. Since both second and third terms have type `int` the result of the ternary operation will be `int`, not `unsigned long long`. – Corey Aug 26 '14 at 06:25
  • 2
    @KeithThompson it could be handled by the preprocessor. The preprocessor has to know about digraphs because `#` and `##` have digraph forms; there's nothing stopping an implementation from translating digraphs to non-digraphs during the early translation phases – M.M Aug 26 '14 at 13:15
  • @MattMcNabb It's been a long time since I had to know this, but IIRC as a consequence of other requirements, digraphs have to stay in their digraph form until the point where pp-tokens are converted to tokens (right at the beginning of translation phase 7). – zwol Aug 26 '14 at 13:43
  • @Corey: The first operand of `?:` has scalar type; it needn't have type `bool` (which was only introduced to the language in 1999; `?:` is much older), and it's not converted to `bool`. Incidentally, `0xFULL` is of type `unsigned long long`, not `long long`. – Keith Thompson Aug 26 '14 at 14:04
  • @MattMcNabb: Yes, the preprocessor has to know about digraphs, both because of `%:` and `%:%:` and because `#if` expressions can use them, but there's no reason for it to convert them. The standard describes digraphs as tokens that behave the same as what they're equivalent to. – Keith Thompson Aug 26 '14 at 14:05