
The following question was given in a college programming contest. We were asked to guess the output and/or explain how it works. Needless to say, none of us succeeded.

main(_){write(read(0,&_,1)&&main());}

A quick search led me to this exact question, asked on codegolf.stackexchange.com:

https://codegolf.stackexchange.com/a/1336/4085

There, it's explained what it does (reverse stdin and write the result to stdout), but not how.

I also found some help in this question: Three arguments to main, and other obfuscating tricks, but it still does not explain how main(_), &_ and &&main() work.

My question is: how does this syntax work? Is it something I should know about, as in, is it still relevant?

I would be grateful for any pointers (to resource links, etc.), if not outright answers.

RaunakS
  • That program won't compile in C++. Removing the C++ tag. – Robᵩ Apr 25 '12 at 18:03
  • @Robᵩ Ah thank you. I was careless. – RaunakS Apr 25 '12 at 18:04
  • Even in C, that program invokes undefined behavior multiple ways. The result is predictable only for specific compilers targeting specific types of CPUs (even on codegolf, this program only does something interesting at a specific optimization level). Correct answers to "What does this program do?" include "It depends," "Whatever it wants," and "It gets you fired." – Robᵩ Apr 25 '12 at 18:08
  • @Robᵩ Or, in my case, it gets me the exit door at a contest. Still, I would like to know *how* it works. Can I use a debugger, or any tool in an IDE (I'm using codeblocks) to get some idea ? – RaunakS Apr 25 '12 at 18:11
  • No, RaunakS, it gets a contest the exit door in your life. You *really* don't want to associate with people who think that this is a valid programming question. – Robᵩ Apr 25 '12 at 18:26
  • If you want to debug it, make sure you use a debugger which will let you step through individual machine instructions. Stepping through the source code won't help much at all. – Robᵩ Apr 25 '12 at 18:32
  • @Robᵩ Will gdb or valgrind allow me to step through machine instructions ? If not, what would you suggest ? – RaunakS Apr 25 '12 at 18:48
  • gdb will. I *think* useful commands are "display /i $pc", "x/i $pc", "nexti", and "stepi". – Robᵩ Apr 25 '12 at 18:50
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/10513/discussion-between-raunaks-and-rob) – RaunakS Apr 25 '12 at 18:52

2 Answers


What does this program do?

main(_){write(read(0,&_,1)&&main());}

Before we analyze it, let's prettify it:

main(_) {
    write ( read(0, &_, 1) && main() );
}

First, you should know that _ is a valid variable name, albeit an ugly one. Let's change it:

main(argc) {
    write( read(0, &argc, 1) && main() );
}

Next, realize that the return type of a function and the type of a parameter are optional in pre-C99 C (but not in C99 and later, and not in C++); both default to int:

int main(int argc) {
    write( read(0, &argc, 1) && main() );
}

Next, understand how return values work. For certain CPU types, the return value is always stored in the same registers (EAX on x86, for example). Thus, if you omit a return statement, the return value is likely going to be whatever the most recent function returned.

int main(int argc) {
    int result = write( read(0, &argc, 1) && main() );
    return result;
}

The call to read is more or less self-evident: it reads 1 byte from standard input (file descriptor 0) into the memory located at &argc. It returns the number of bytes read, so 1 if a byte was read and 0 at end of input. On error it returns -1, which && also treats as true.

&& is the logical "and" operator. It evaluates its right-hand side if and only if its left-hand side is "true" (technically, any non-zero value). The result of the && expression is an int which is always 1 (for "true") or 0 (for "false").

In this case, the right-hand-side invokes main with no arguments. Calling main with no arguments after declaring it with 1 argument is undefined behavior. Nevertheless, it often works, as long as you don't care about the initial value of the argc parameter.

The result of the && is then passed to write(). So, our code now looks like:

int main(int argc) {
    int read_result = read(0, &argc, 1) && main();
    int result = write(read_result);
    return result;
}

Hmm. A quick look at the man pages reveals that write takes three arguments, not one. Another case of undefined behavior. Just like calling main with too few arguments, we cannot predict what write will receive for its 2nd and 3rd arguments. On typical computers, they will get something, but we can't know for sure what. (On atypical computers, strange things can happen.) The author is relying upon write receiving whatever was previously stored on the memory stack. And, he is relying upon that being the 2nd and 3rd arguments to read.

int main(int argc) {
    int read_result = read(0, &argc, 1) && main();
    int result = write(read_result, &argc, 1);
    return result;
}

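Fixing the invalid call to main, adding the header, and expanding the &&, we have:

#include <unistd.h>
int main(int argc, char **argv) {
    int result;
    result = read(0, &argc, 1);                    /* 1 = got a byte, 0 = end of input */
    if (result) result = (main(argc, argv) != 0);  /* && yields 0 or 1, not main's value */
    result = write(result, &argc, 1);              /* fd 1 (stdout) if the && was true, else fd 0 */
    return result;
}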


Conclusions

This program won't work as expected on many computers. Even if you use the same computer as the original author, it might not work on a different operating system. Even if you use the same computer and the same operating system, it won't work with many compilers. Even if you use the same computer, compiler, and operating system, it might not work if you change the compiler's command-line flags.

As I said in the comments, the question does not have a valid answer. If you find a contest organizer or contest judge who says otherwise, don't invite them to your next contest.

Robᵩ
  • Oh wow, that was very, _very_ comprehensive. A clarification : the `write()` syntax is `int write(int fd, char *Buff, int NumBytes)`. So the return value of `read()` is becoming `1` for writing to standard output ? – RaunakS Apr 25 '12 at 18:29
  • 0 is standard input, 1 is standard output, 2 is standard err. So, a successful return from read (combined with a successful return from the recursive call to main) yields a write to stdout. A failed return from read results in a write to stdin. Which is yet another undefined behavior. – Robᵩ Apr 25 '12 at 18:31
  • Ah yes, I should've wiki'd before asking. This code would be a very good IOCCC contender. And is undefined behavior such as this replicable ? I mean, on the same compiler (gcc 4.4.1), will this always give the same result ? – RaunakS Apr 25 '12 at 18:39
  • @RaunakS The same compiler on the same architecture with the same compilation options will probably always give the same result. Change one of the parameters and all bets are off. – Daniel Fischer Apr 25 '12 at 18:41
  • "Next, realize that the return type of a function, and the type of a parameter are optional in C" only for old enough values of C. As of C99, they're mandatory. – Daniel Fischer Apr 25 '12 at 18:42
  • Thanks, @DanielFischer. I haven't read the *new* standard yet. :) – Robᵩ Apr 25 '12 at 18:44
  • @DanielFischer I see what you mean. I just tried to compile in GNU GCC and my explorer crashed. And on Cygwin GCC, the terminal stopped responding. Codeblocks worked partially and only (http://ideone.com/) gave the expected output. I'd love to see where the contest organizers compiled their code. – RaunakS Apr 25 '12 at 18:51
  • In fact, in C99 and C++, an explicit return value for `main` can be omitted at the closing brace, in which case, the compiler must implicitly return 0 at that point. Making the assumption that it'll return something else invalid and UB - like so much else here! – underscore_d Apr 12 '16 at 20:57
  • @RaunakS no. The judges tend to frown upon system specific code. Ever since the 1984 contest. The winner mullender. – Pryftan May 03 '23 at 19:55
  • @RaunakS you’re also wrong about `write(2)`. The proper prototype is: `ssize_t write(int fd, const void *buf, size_t count);`. – Pryftan May 03 '23 at 19:58

Ok, _ is just a variable declared in early K&R C syntax with a default type of int. It functions as temporary storage.

The program will try to read one byte from standard input. If there is input, it will call main recursively continuing to read one byte.

At the end of input, read(2) will return 0, the expression will return 0, the write(2) system call will execute, and the call chain will probably unwind.

I say "probably" here because from this point on the results are highly implementation-dependent. The other parameters to write(2) are missing, but something will be in the registers and on the stack, so something will be passed into the kernel. The same undefined behavior applies to the return value from the various recursive activations of main.

On my x86_64 Mac, the program reads standard input until EOF and then exits, writing nothing at all.

DigitalRoss
  • Any citation on what is `_`? Curious to know about it – Pavan Manjunath Apr 25 '12 at 18:18
  • It's just a formal parameter *("variable")* name. It's equivalent to `main(int _)` ... imagine that they called it *"argc"* and it will all be clear. That is: `main(argc)` would be early C with default **int,** the *prototype* declarations were added later. They don't declare the usual *argv* but nothing drastic will happen as a result. – DigitalRoss Apr 25 '12 at 18:28
  • Yeah, a plain `_` is a legal variable name. – John Bode Apr 25 '12 at 18:37