390

Why does this code give the output C++Sucks? What is the concept behind it?

#include <stdio.h>

double m[] = {7709179928849219.0, 771};

int main() {
    m[1]--?m[0]*=2,main():printf((char*)m);    
}

Test it here.

Adam Stelmaszczyk
  • 19,665
  • 4
  • 70
  • 110
codeslayer1
  • 3,666
  • 3
  • 17
  • 13
  • 1
    @BoBTFish technically, yes, but it runs all the same in C99: http://ideone.com/IZOkql – nikolas Aug 01 '13 at 11:26
  • @nijansen My point being that calling `main()` recursively is legal in c, but not in c++. – BoBTFish Aug 01 '13 at 11:27
  • 1
    @Sid I read this code as a status on someone's topcoder profile..tested it and was surprised by the output..tried to analyse it and reached a bit far but couldn't figure out the exact explanation so asked it here – codeslayer1 Aug 01 '13 at 18:43
  • 1
    I think you will also like to read this Q&A: [Obfuscated C Code Contest 2006](http://stackoverflow.com/questions/15393441/obfuscated-c-code-contest-2006-please-explain-sykes2-c/15395030#15395030) – Grijesh Chauhan Aug 01 '13 at 20:54
  • 1
    @detonator123 oh, you have no idea how stackoverflow mechanics work? Then I'd be much obliged, here you go: [http://stackoverflow.com/help/why-vote](http://stackoverflow.com/help/why-vote). – nurettin Aug 02 '13 at 09:57
  • 12
    @nurettin I had similar thoughts. But it's not OP's fault, it's the people voting for this useless knowledge. Admitted, this code obfuscation stuff may be interesting but type "obfuscation" in Google and you get tons of results in every formal language you can think of. Don't get me wrong, I find it OK to ask such a question here. It's just an overrated because not very useful question though. – TobiMcNamobi Aug 02 '13 at 10:44
  • 6
    @detonator123 "You must be new here" - if you look at the closure reason, you can find out that it is not the case. The required minimal understanding is clearly missing from your question - "I don't understand this, explain it" is not something that is welcome on Stack Overflow. Should you have attempted something **yourself** first, would the question have not been closed. It's trivial to google "double representation C" or the like. –  Aug 02 '13 at 20:06
  • @BenCollins since when is void main() c++? – josefx Aug 05 '13 at 17:29
  • @josefx: since it's not an allowed signature for a `main()` function in [tag:C]. http://stackoverflow.com/questions/2108192/what-are-the-valid-signatures-for-cs-main-function. Eh, while we're at it, it's not allowed in [tag:C++] either. – Ben Collins Aug 05 '13 at 18:21
  • 43
    My big-endian PowerPC machine prints out `skcuS++C`. – Adam Rosenfield Aug 08 '13 at 03:13
  • 29
    My word, I hate contrived questions like this. It's a bit pattern in memory that happens to be the same as some silly string. It serves no useful purpose to anyone, and yet it earns hundreds of rep points for both the questioner and the answerer. Meanwhile, difficult questions that could be useful to people earn maybe a handful of points, if any. This is kind of a poster child of what's wrong with SO. – Carey Gregory Apr 24 '14 at 03:56

9 Answers9

501

The number 7709179928849219.0 has the following binary representation as a 64-bit double:

01000011 00111011 01100011 01110101 01010011 00101011 00101011 01000011
+^^^^^^^ ^^^^---- -------- -------- -------- -------- -------- --------

+ shows the position of the sign; ^ of the exponent, and - of the mantissa (i.e. the value without the exponent).

Since the representation uses binary exponent and mantissa, doubling the number increments the exponent by one. Your program does it precisely 771 times, so the exponent which started at 1075 (decimal representation of 10000110011) becomes 1075 + 771 = 1846 at the end; binary representation of 1846 is 11100110110. The resultant pattern looks like this:

01110011 01101011 01100011 01110101 01010011 00101011 00101011 01000011
-------- -------- -------- -------- -------- -------- -------- --------
0x73 's' 0x6B 'k' 0x63 'c' 0x75 'u' 0x53 'S' 0x2B '+' 0x2B '+' 0x43 'C'

This pattern corresponds to the string that you see printed, only backwards. At the same time, the second element of the array becomes zero, providing null terminator, making the string suitable for passing to printf().

gsamaras
  • 71,951
  • 46
  • 188
  • 305
Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
  • 22
    Why is the string backwards? – Derek Aug 01 '13 at 13:26
  • 97
    @Derek x86 is little-endian – Angew is no longer proud of SO Aug 01 '13 at 13:36
  • 17
    @Derek This is because of the platform-specific [endianness](http://en.wikipedia.org/wiki/Endianness#Endianness_and_hardware): the bytes of the abstract IEEE 754 representation are stored in memory at decreasing addresses, so the string prints correctly. On hardware with big endianness one would need to start with a different number. – Sergey Kalinichenko Aug 01 '13 at 13:38
  • 4
    Does the C++ standard require floating points being represented in IEEE 754? Wikipedia says no... – Alvin Wong Aug 01 '13 at 15:04
  • 14
    @AlvinWong You are correct, the standard does not require IEEE 754 or any other specific format. This program is about as non-portable as it gets, or very close to it :-) – Sergey Kalinichenko Aug 01 '13 at 15:14
  • 3
    @dasblinkenlight how did you generate binary? just read bit pattern? – Grijesh Chauhan Aug 01 '13 at 20:56
  • 10
    @GrijeshChauhan I used a [double-precision IEEE754 calculator](http://www.binaryconvert.com/convert_double.html): I pasted the `7709179928849219` value, and got the binary representation back. – Sergey Kalinichenko Aug 01 '13 at 20:59
  • 3
    Didn't know if I should give you +1 for figuring it out, or -1 for thinking it was worth the time. – Edward Falk Aug 02 '13 at 01:26
  • 2
    @EdwardFalk Writing it in the first place took time; figuring it out took less than five minutes, with all the calculators available to me online. – Sergey Kalinichenko Aug 02 '13 at 01:31
  • Recursion makes it print from back to forward direction. :) – Hemant_Negi Aug 02 '13 at 10:39
  • 7
    @JimBalter What is so incredible about people with low levels of skill? People have to start somewhere. You weren't born a C++ guru either. – us2012 Aug 02 '13 at 13:56
  • 2
    @us2012, Jim is is absolutely correct about the "fast lane" learning style. Understanding how data is represented and handled on hardware is one of the first things taught in a decent course, or discovered by a genuinely interested and curious student. The number of self proclaimed "IT Gurus" that flaunt their use of some high-level framework or language as being "knowledgeable" or "expert" is disgusting and a disservice to the giant's shoulders they are standing on. – Preet Kukreti Aug 03 '13 at 04:30
  • @ant I pasted the number into IEEE 754 calculator, looked up the place where mantissa goes, and did a little arithmetic with my calculator. Everything else just falls in place. – Sergey Kalinichenko Oct 13 '15 at 11:55
226

More readable version:

double m[2] = {7709179928849219.0, 771};
// m[0] = 7709179928849219.0;
// m[1] = 771;    

int main()
{
    if (m[1]-- != 0)
    {
        m[0] *= 2;
        main();
    }
    else
    {
        printf((char*) m);
    }
}

It recursively calls main() 771 times.

In the beginning, m[0] = 7709179928849219.0, which stands for C++Suc;C. In every call, m[0] gets doubled, to "repair" last two letters. In the last call, m[0] contains ASCII char representation of C++Sucks and m[1] contains only zeros, so it has a null terminator for C++Sucks string. All under assumption that m[0] is stored on 8 bytes, so each char takes 1 byte.

Without recursion and illegal main() calling it will look like this:

double m[] = {7709179928849219.0, 0};
for (int i = 0; i < 771; i++)
{
    m[0] *= 2;
}
printf((char*) m);
Adam Stelmaszczyk
  • 19,665
  • 4
  • 70
  • 110
107

Disclaimer: This answer was posted to the original form of the question, which mentioned only C++ and included a C++ header. The question's conversion to pure C was done by the community, without input from the original asker.


Formally speaking, it's impossible to reason about this program because it's ill-formed (i.e. it's not legal C++). It violates C++11[basic.start.main]p3:

The function main shall not be used within a program.

This aside, it relies on the fact that on a typical consumer computer, a double is 8 bytes long, and uses a certain well-known internal representation. The initial values of the array are computed so that when the "algorithm" is performed, the final value of the first double will be such that the internal representation (8 bytes) will be the ASCII codes of the 8 characters C++Sucks. The second element in the array is then 0.0, whose first byte is 0 in the internal representation, making this a valid C-style string. This is then sent to output using printf().

Running this on HW where some of the above doesn't hold would result in garbage text (or perhaps even an access out of bounds) instead.

Angew is no longer proud of SO
  • 167,307
  • 17
  • 350
  • 455
  • 25
    I have to add that this is not an invention of C++11 - C++03 also had `basic.start.main` 3.6.1/3 with the same wording. – sharptooth Aug 01 '13 at 11:30
  • 1
    The point of this small example is to illustrate what can be done with C++. Magic sample using UB tricks or huge software packages of "classic" code. – SChepurin Aug 01 '13 at 11:30
  • 1
    @sharptooth Thanks for adding this. I didn't mean to imply otherwise, I just cited the standard I used. – Angew is no longer proud of SO Aug 01 '13 at 11:38
  • @Angew: Yeap, I understand that, just wanted to say that the wording is pretty old. – sharptooth Aug 01 '13 at 11:41
  • Yes, it violates C++11, but what if it was written a long, long time before C++11? – vsz Aug 02 '13 at 08:33
  • "Formally speaking, it's impossible to reason about this program" -- Only if the C++ standard is the basis of formal logic ... but it isn't so your statement is false. It is quite possible to reason about this program, using various well-known facts. One can easily make *conditional* statements about the program, such as "if the machine it runs on has such and such common characteristics, then the program will do such and such". In fact, we know what the program did when run, so we can formally reason from there to characteristics of the implementation. – Jim Balter Aug 02 '13 at 11:49
  • "The function main shall not be used within a program." -- You fail to note, grasp, or appreciate the *context* of this statement. The "shall" applies to *strictly conforming programs* ... a program that violates such a constraint is not a strictly conforming C++ program. But most programs aren't. – Jim Balter Aug 02 '13 at 11:52
  • 1
    @JimBalter Notice I said "formally speaking, it's impossible to reason," *not* "it's impossible to formally reason." You're right that it is possible to reason about the program, but you need to know the details of the compiler used to do that. It would be *fully within a compiler's rights* to simply eliminate the call to `main()`, or replace it with an API call to format the harddrive, or whatever. – Angew is no longer proud of SO Aug 02 '13 at 12:12
  • @vsz As sharptooth pointed out, the same wording was present in C++03 as well. I don't have access to a C++98 standard, but it wouldn't surprise me to find it in there as well. – Angew is no longer proud of SO Aug 02 '13 at 12:19
  • What C++ allows is irrelevant, as this is a C program, not C++. Granted, that alone does not rescue this program from formally undefined behaviour. – Bernd Jendrissek Feb 16 '18 at 00:41
  • @BerndJendrissek When I answered the question, it was still tagged with C++ only, and mentioned only C++ in the title, included ``, and so on. It's not my fault others edited it to mention only C, without the OP's input on the matter. Check the question's edit history, not to mention the message being "C++Sucks" – Angew is no longer proud of SO Feb 16 '18 at 09:14
  • @Angew Oh wow, that's horrible editing (and one reason I'm not a fan of non-OP editing). Sorry for falling for it and yelling at you in xkcd.com/386 style – Bernd Jendrissek Feb 16 '18 at 23:40
  • @BerndJendrissek No problem, at least I was notified of the change in the Q and added a suitable disclaimer. – Angew is no longer proud of SO Feb 17 '18 at 21:22
57

Perhaps the easiest way to understand the code is to work through things in reverse. We'll start with a string to print out -- for balance, we'll use "C++Rocks". Crucial point: just like the original, it's exactly eight characters long. Since we're going to do (roughly) like the original, and print it out in reverse order, we'll start by putting it in in reverse order. For our first step, we'll just view that bit pattern as a double, and print out the result:

#include <stdio.h>

char string[] = "skcoR++C";

int main(){
    printf("%f\n", *(double*)string);
}

This produces 3823728713643449.5. So, we want to manipulate that in some way that isn't obvious, but is easy to reverse. I'll semi-arbitrarily choose multiplication by 256, which gives us 978874550692723072. Now, we just need to write some obfuscated code to divide by 256, then print out the individual bytes of that in reverse order:

#include <stdio.h>

double x [] = { 978874550692723072, 8 };
char *y = (char *)x;

int main(int argc, char **argv){
    if (x[1]) {
        x[0] /= 2;  
        main(--x[1], (char **)++y);
    }
    putchar(*--y);
}

Now we have lots of casting, passing arguments to (recursive) main that are completely ignored (but evaluation to get the increment and decrement are utterly crucial), and of course that completely arbitrary looking number to cover up the fact that what we're doing is really pretty straightforward.

Of course, since the whole point is obfuscation, if we feel like it we can take more steps as well. Just for example, we can take advantage of short-circuit evaluation, to turn our if statement into a single expression, so the body of main looks like this:

x[1] && (x[0] /= 2,  main(--x[1], (char **)++y));
putchar(*--y);

To anybody who isn't accustomed to obfuscated code (and/or code golf) this starts to look pretty strange indeed -- computing and discarding the logical and of some meaningless floating point number and the return value from main, which isn't even returning a value. Worse, without realizing (and thinking about) how short-circuit evaluation works, it may not even be immediately obvious how it avoids infinite recursion.

Our next step would probably be to separate printing each character from finding that character. We can do that pretty easily by generating the right character as the return value from main, and printing out what main returns:

x[1] && (x[0] /= 2,  putchar(main(--x[1], (char **)++y)));
return *--y;

At least to me, that seems obfuscated enough, so I'll leave it at that.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
24

It is just building up a double array (16 bytes) which - if interpreted as a char array - build up the ASCII codes for the string "C++Sucks"

However, the code is not working on each system, it relies on some of the following undefined facts:

Nilay Vishwakarma
  • 3,105
  • 1
  • 27
  • 48
D.R.
  • 20,268
  • 21
  • 102
  • 205
12

The following code prints C++Suc;C, so the whole multiplication is only for the last two letters

double m[] = {7709179928849219.0, 0};
printf("%s\n", (char *)m);
Cody Gray - on strike
  • 239,200
  • 50
  • 490
  • 574
Serve Laurijssen
  • 9,266
  • 5
  • 45
  • 98
11

The others have explained the question pretty thoroughly, I'd like to add a note that this is undefined behavior according to the standard.

C++11 3.6.1/3 Main function

The function main shall not be used within a program. The linkage (3.5) of main is implementation-defined. A program that defines main as deleted or that declares main to be inline, static, or constexpr is ill-formed. The name main is not otherwise reserved. [ Example: member functions, classes, and enumerations can be called main, as can entities in other namespaces. —end example ]

Yu Hao
  • 119,891
  • 44
  • 235
  • 294
9

The code could be re-written like this:

void f()
{
    if (m[1]-- != 0)
    {
        m[0] *= 2;
        f();
    } else {
          printf((char*)m);
    }
}

What it's doing is producing a set of bytes in the double array m that happen to correspond to the characters 'C++Sucks' followed by a null-terminator. They've obfuscated the code by choosing a double value which when doubled 771 times produces, in the standard representation, that set of bytes with the null terminator provided by the second member of the array.

Note that this code wouldn't work under a different endian representation. Also, calling main() is not strictly allowed.

Jack Aidley
  • 19,439
  • 7
  • 43
  • 70
1

First we should recall that double precision numbers are stored in the memory in binary format as follows:

(i) 1 bit for the sign

(ii) 11 bits for the exponent

(iii) 52 bits for the magnitude

The order of the bits decrease from (i) to (iii).

First the decimal fractional number is converted to equivalent fractional binary number and then it is expressed as order of magnitude form in binary.

So the number 7709179928849219.0 becomes

(11011011000110111010101010011001010110010101101000011)base 2


=1.1011011000110111010101010011001010110010101101000011 * 2^52

Now while considering the magnitude bits 1. is neglected as all the order of magnitude method shall start with 1.

So the magnitude part becomes :

1011011000110111010101010011001010110010101101000011 

Now the power of 2 is 52 , we need to add biasing number to it as 2^(bits for exponent -1)-1 i.e. 2^(11 -1)-1 =1023 , so our exponent becomes 52 + 1023 = 1075

Now our code mutiplies the number with 2, 771 times which makes the exponent to increase by 771

So our exponent is (1075+771)= 1846 whose binary equivalent is (11100110110)

Now our number is positive so our sign bit is 0.

So our modified number becomes :

sign bit + exponent+ magnitude (simple concatenation of the bits)

0111001101101011011000110111010101010011001010110010101101000011 

since m is converted to char pointer we shall split the bit pattern in chunks of 8 from the LSD

01110011 01101011 01100011 01110101 01010011 00101011 00101011 01000011 

(whose Hex equivalent is :)

 0x73 0x6B 0x63 0x75 0x53 0x2B 0x2B 0x43 

ASCII CHART Which from the character map as shown is :

s   k   c   u      S      +   +   C 

Now once this has been made m[1] is 0 which means a NULL character

Now assuming that you run this program on a little-endian machine( lower order bit is stored in lower address) so pointer m pointer to the lowest address bit and then proceeds by taking up bits in chucks of 8 ( as type casted to char* ) and the printf() stops when encounted 00000000 in the last chunck...

This code is however not portable.

Abhishek Ghosh
  • 597
  • 7
  • 18