98

Consider the following program:

#include <iostream>
int main = ( std::cout << "C++ is excellent!\n", 195 ); 

Using g++ 4.8.1 (mingw64) on Windows 7 OS, the program compiles and runs fine, printing:

C++ is excellent!

to the console. main appears to be a global variable rather than a function; how can this program execute without the function main()? Does this code conform to the C++ standard? Is the behavior of the program well defined? I also used the -pedantic-errors option, but the program still compiles and runs.

Shafik Yaghmour
Destructor
  • 1
    MSVC++14.0 complains - LNK1561: entry point must be defined –  Sep 29 '15 at 18:26
  • Are you using the `strict` compiler setting? What compiler settings are you using? – Thomas Matthews Sep 29 '15 at 18:27
  • @Fireho: MSVS 2010 also shows a linker error. But why does g++ accept it? – Destructor Sep 29 '15 at 18:27
  • 11
    @πάνταῥεῖ: why is the language-lawyer tag necessary? – Destructor Sep 29 '15 at 18:29
  • 1
    it compiles OK yet core dumps g++ 4.8.4 – Scott Stensland Sep 29 '15 at 18:31
  • 1
    Works for me with g++ 4.8.3 on RHEL 7. – Fred Larson Sep 29 '15 at 18:33
  • 15
    Note that `195` is the opcode for the `RET` instruction, and that in the C calling convention, the caller clears the stack. – Brian Bi Sep 29 '15 at 18:37
  • 2
    @PravasiMeet "then how this program executes" – do you not think the initialization code for a variable should be executed (even without the `main()` function)? In fact, they are completely unrelated. – The Paramagnetic Croissant Sep 29 '15 at 18:37
  • 1
    Typically there's a piece of code in the runtime that runs static constructors before calling `main` (although the standard technically allows static initialization to not occur until after `main` starts). – Brian Bi Sep 29 '15 at 18:41
  • 4
    I'm among those who found that the program segfaults as is (64-bit linux, g++ 5.1/clang 3.6). I can rectify this however by amending it to `int main = ( std::cout << "C++ is excellent!\n", exit(0),1 );` (and including ``), albeit the program remains legally ill-formed. – Mike Kinghan Sep 29 '15 at 18:58
  • 12
    @Brian You should mention architecture when making statements like that. All the world is not a VAX. Or x86. Or whatever. – dmckee --- ex-moderator kitten Sep 29 '15 at 21:58
  • @Brian there's no "C calling convention", as C doesn't care about lower-level implementation details. The calling convention depends on the architecture and ABI. Even on x86 there are several conventions, with either caller or callee cleanup. And 195 may be RET on x86 but not on other architectures – phuclv Sep 30 '15 at 05:59
  • @LưuVĩnhPhúc : It's a bit unrealistic to think that an ABI can be uniform across languages. Some languages need varargs, other need classes. "C calling convention" is a practical reality, and means that the ABI has support for varargs (which imply caller cleanup). – MSalters Sep 30 '15 at 09:12
  • It may be helpful to clarify with an edit to your question that although the validity of this program is important your focus is more on the mechanics of what is going on. This will help to distinguish it from the more narrow question of *is this valid*. – Shafik Yaghmour Sep 30 '15 at 19:07
  • I changed the title because it is a better description of the actual question, although I think it could be improved further. Perhaps someone else can find a better wording, I always have a hard time coming up with good title. – Shafik Yaghmour Oct 01 '15 at 17:48
  • @ShafikYaghmour: No. you shouldn't change the title. The title I gave was good. – Destructor Oct 01 '15 at 17:49
  • 2
    Saying tricky code is not descriptive of what the question is asking and such generic titles are discouraged. Usually with hot network questions someone comes up with a better title by now, so I usually just wait for someone else to do it. – Shafik Yaghmour Oct 01 '15 at 17:49
  • @ShafikYaghmour: But don't you think my question shows research efforts? – Destructor Oct 01 '15 at 17:50
  • 1
    I think it was a good question, if I did not think so: I would not have answered it and possibly I would have downvoted and/or voted to close. – Shafik Yaghmour Oct 01 '15 at 17:56
  • 2
    See this [meta post](http://meta.stackoverflow.com/a/254226/1708801): *The question's title (and the body, too) should describe the actual problem, regardless of how the original poster viewed, framed, or described it.* – Shafik Yaghmour Oct 01 '15 at 18:01
  • Related question: http://stackoverflow.com/q/2252380/252489 – Gowtham Oct 04 '15 at 16:14

7 Answers

88

Before going into the meat of the question about what is going on, it is important to point out that the program is ill-formed as per defect report 1886: Language linkage for main():

[...] A program that declares a variable main at global scope or that declares the name main with C language linkage (in any namespace) is ill-formed. [...]

The most recent versions of clang and gcc make this an error, and the program will not compile (see gcc live example):

error: cannot declare '::main' to be a global variable
int main = ( std::cout << "C++ is excellent!\n", 195 ); 
    ^

So why was there no diagnostic in older versions of gcc and clang? This defect report did not even have a proposed resolution until late 2014, so this case only very recently became explicitly ill-formed, which requires a diagnostic.

Prior to this, it seems like this would be undefined behavior since we are violating a shall requirement of the draft C++ standard from section 3.6.1 [basic.start.main]:

A program shall contain a global function called main, which is the designated start of the program. [...]

Undefined behavior is unpredictable and does not require a diagnostic. The inconsistency we see when trying to reproduce the behavior is typical of undefined behavior.

So what is the code actually doing and why in some cases does it produce results? Let's see what we have:

declarator  
|        initializer----------------------------------
|        |                                           |
v        v                                           v
int main = ( std::cout << "C++ is excellent!\n", 195 ); 
    ^      ^                                   ^
    |      |                                   |
    |      |                                   comma operator
    |      primary expression
global variable of type int

We have main, an int declared in the global namespace with an initializer; the variable has static storage duration. It is implementation-defined whether the initialization takes place before an attempt to call main is made, but gcc appears to perform it before calling main.

The code uses the comma operator: the left operand is a discarded-value expression, used here solely for the side effect of writing to std::cout. The result of the comma operator is the right operand, which in this case is the prvalue 195, and that value initializes the variable main.

sergej points out that the generated assembly shows the output to cout happening during static initialization. The more interesting point for discussion (see the live godbolt session), though, is this:

main:
.zero   4

and the subsequent:

movl    $195, main(%rip)

The likely scenario is that the program jumps to the symbol main expecting valid code to be there, and in some cases this will seg-fault. If that is the case, we would expect that storing valid machine code in the variable main could lead to a workable program, assuming the variable is located in a segment that allows code execution. We can see this 1984 IOCCC entry does just that.

It appears we can get gcc to do this in C using (see it live):

const int main = 195 ;

It seg-faults if the variable main is not const, presumably because it is then not located in an executable location. Hat tip to this comment here, which gave me this idea.

Also see FUZxxl's answer here to a C-specific version of this question.

Cœur
Shafik Yaghmour
  • Why isn't the implementation giving any warnings either? (When I use -Wall and -Wextra it still does not give a single warning.) What do you think about @Mark B's answer to this question? – Destructor Sep 30 '15 at 05:53
  • IMHO, the compiler shouldn't give a warning because `main` isn't a reserved identifier (3.6.1/3). In this case, I think VS2013's handling (see Francis Cugler's answer) is more correct than that of gcc & clang. – cdmh Sep 30 '15 at 08:57
  • @PravasiMeet I updated my answer wrt to why earlier versions of gcc did not give a diagnostic. – Shafik Yaghmour Sep 30 '15 at 11:55
  • I know sometimes changes due to defect reports are back ported but it was never totally clear to me when this is required by the standard. It is also not clear whether this is now ill-formed in C++14 and C++11 as well. I have seen conflicting information with regards to how far back defects should go. I saw a good conversation once on it but I lost the link :-( – Shafik Yaghmour Sep 30 '15 at 12:06
  • I would have expected this program to crash after printing the string, precisely because writable global variables should be in a memory region that is not executable. Perhaps Pravasi was testing on an OS/CPU combination that does not support readable-but-not-executable memory regions; however, I thought Windows 7 did support that, so *shrug*. – zwol Sep 30 '15 at 21:29
  • 2
    ... and indeed, when I test the OP's program on Linux/x86-64, with g++ 5.2 (which accepts the program - I guess you weren't kidding about "most recent version"), it crashes exactly where I expected it would. – zwol Sep 30 '15 at 21:31
  • +1 This detailed and helpful answer is not possible to the post of which this has been deemed a duplicate by Ben Voigt. – Walter Oct 01 '15 at 09:03
  • 1
    @Walter I don't believe these are duplicates; the former is asking a much narrower question. There is clearly a group of SO users that have a more reductionist view of duplicates, which does not make much sense to me since we could boil down most SO questions to some version of older questions, but then SO would not be very useful. – Shafik Yaghmour Oct 01 '15 at 13:10
  • @ShafikYaghmour: I still do not understand why const int main=192 works in C but results in a linker error in C++. Will you please clearly explain the reason behind this difference between C and C++? And why does the program exit with a garbage status (meaning return value) in C? – Destructor Oct 01 '15 at 13:41
  • @PravasiMeet I don't know, it is a good question and I have not found the answer to that one. The best way to get that answer is to have more eyes see my answer and most likely someone with more specific knowledge will make a comment. But the question won't get many more looks if it is closed. – Shafik Yaghmour Oct 01 '15 at 13:46
  • @ShafikYaghmour: But why it is marked as duplicate? I am very disappointed. – Destructor Oct 01 '15 at 13:55
  • @PravasiMeet not clear to me, you are asking a much broader question than whether this code is valid or not, so I don't believe this question is a duplicate of the listed question. Which is why I suggested you edit your question to emphasize that while validity is relevant, the mechanics are what the question is about. – Shafik Yaghmour Oct 01 '15 at 14:00
  • @PravasiMeet The garbage exit status is simply because nothing *sets* the return value of the pseudo-`main`. (Try changing the number from 195 to 12828721 = 31 C0 C3 00 (little-endian) = `xorl %eax, %eax; ret`!) I would have to see the linker error to say anything about that. – zwol Oct 02 '15 at 00:49
  • @PravasiMeet Also I want to second what Shafik said about revising the question to make it clear that you want an answer that goes beyond "undefined behavior, end of discussion." – zwol Oct 02 '15 at 00:50
  • @ShafikYaghmour You meant 195, not 192, in the text of your answer, ne? – zwol Oct 02 '15 at 00:51
  • @zwol yes, this is what happens when you say *it is just a short piece of code let me type it out instead of copy-n-paste* ... everytime ;-) – Shafik Yaghmour Oct 02 '15 at 01:41
  • "It is implementation defined whether the initialization will take place before an attempt to call `main` is made" - is that really the case? I'm not a C++ person but I would assume initialization of globals in the translation unit (apologies if I'm butchering terminology here) would have to happen before calling `main`, no? Or did you just mean "before checking if `main` exists"? – CupawnTae Oct 07 '15 at 09:00
  • @CupawnTae it can be delayed until the first use of the variable in `main`. – Shafik Yaghmour Oct 07 '15 at 09:19
  • Variant that alters the value returned to the call in this case `42` ... `const char main[] ="\xb8\x2a\x00\x00\x00\xc3";` and [live version](https://wandbox.org/permlink/kDRDCYfsctq8YK5k) ... see [tweet for context](https://twitter.com/shafikyaghmour/status/995745398588297216). – Shafik Yaghmour May 17 '18 at 17:59
20

From 3.6.1/1:

A program shall contain a global function called main, which is the designated start of the program. It is implementation defined whether a program in a freestanding environment is required to define a main function.

From this it looks like g++ happens to allow a program without a main function (presumably under the "freestanding" clause).

Then from 3.6.1/3:

The function main shall not be used (3.2) within a program. The linkage (3.5) of main is implementation defined. A program that declares main to be inline or static is ill-formed. The name main is not otherwise reserved.

So here we learn that it's perfectly fine to have an integer variable named main.

Finally, if you're wondering why the output is printed: the initialization of the int main uses the comma operator to write to cout during static initialization and then provide an actual integral value for the initialization.

gsamaras
Mark B
11

gcc 4.8.1 generates the following x86 assembly:

.LC0:
    .string "C++ is excellent!\n"
    subq    $8, %rsp    #,
    movl    std::__ioinit, %edi #,
    call    std::ios_base::Init::Init() #
    movl    $__dso_handle, %edx #,
    movl    std::__ioinit, %esi #,
    movl    std::ios_base::Init::~Init(), %edi  #,
    call    __cxa_atexit    #
    movl    $.LC0, %esi #,
    movl    std::cout, %edi #,
    call    std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)   #
    movl    $195, main(%rip)    #, main
    addq    $8, %rsp    #,
    ret
main:
    .zero   4

Note that cout is called during initialization, not in the main function!

.zero 4 declares 4 (0-initialized) bytes starting at location main, where main is the name of the variable[!].

The main symbol is interpreted as the start of the program. The behavior depends on the platform.

sergej
  • 2
    Note as [Brian points out](http://stackoverflow.com/questions/32851184/what-is-the-logic-behind-these-two-lines-of-tricky-c-code#comment53536476_32851184) `195` is the opcode for `ret` on some architectures. So saying zero instructions may not be accurate. – Shafik Yaghmour Oct 01 '15 at 03:08
  • @ShafikYaghmour Thanks for your comment, you are right. I got messed up with the assembler directives. – sergej Oct 02 '15 at 07:49
  • Specifically, `0xc3` is a `ret` [on x86 (including x86-64)](http://ref.x86asm.net/coder32.html#xC3). This might work on an old system with `gcc -zexecstack` if that's handled by making *all* pages executable, not just the stack, including sections like `.data` that aren't given exec permission normally. (Unlike `.rodata` on old systems which got linked with `.text`). A very old 32-bit x86 system not using PAE wouldn't support an exec permission bit, so read would imply exec. But this question was from 2015 so that's odd; Intel CPUs since 2006 (Core 2) support x86-64 and the NX bit. – Peter Cordes Aug 12 '22 at 05:21
8

That is an ill-formed program. It crashes on my test environment, cygwin64/g++ 4.9.3.

From the standard:

3.6.1 Main function [basic.start.main]

1 A program shall contain a global function called main, which is the designated start of the program.

Community
R Sahu
  • I think prior to the defect report I cited, this was just plain undefined behavior. – Shafik Yaghmour Sep 30 '15 at 19:10
  • @ShafikYaghmour, Is that the general principle to be applied at all the places where the standard uses *shall* ? – R Sahu Sep 30 '15 at 19:15
  • I want to say yes but I have not seen a good description of the difference. From what I can tell from [this discussion](https://groups.google.com/a/isocpp.org/forum/#!topic/std-discussion/lk1qAvCiviY), ill-formed NDR and undefined behavior are probably synonymous since neither requires a diagnostic. This would seem to imply ill-formed and UB are distinct but not sure. – Shafik Yaghmour Oct 01 '15 at 13:05
  • 3
    C99 section 4 ("Conformance") makes this unambiguous: "If a 'shall' or 'shall not' requirement that appears outside of a constraint is violated, the behavior is undefined." I can't find equivalent wording in C++98 or C++11, but I strongly suspect the committee meant it to be there. (The C and C++ committees really need to sit down and iron out *all* the terminological differences between the two standards.) – zwol Oct 03 '15 at 16:32
7

The reason I believe this works is that the compiler does not know it is compiling the main() function, so it compiles a global integer whose initializer has side effects.

The object format that this translation-unit is compiled into is not capable of differentiating between a function symbol and a variable symbol.

So the linker happily links to the (variable) main symbol and treats it like a function call. But not until the runtime system has run the global variable initialization code.

When I ran the sample it printed out but then it caused a seg-fault. I assume that's when the runtime system tried to execute an int variable as if it were a function.

Galik
  • Most systems put variables in non-executable pages, *especially* read-write data. Read-only constants (in `.rodata` on Linux / `.rdata` on Windows) used to get linked into the same program segment as the .text section; modern GNU Binutils `ld` puts it in a separate segment so it can be read-only without exec. And of course your system has to be x86 (including x86-64), because `0xc3` is the *x86* opcode for a `ret` instruction. On other ISAs, `0xc3` would be some other instruction. – Peter Cordes Aug 12 '22 at 05:16
4

I've tried this on a Win7 64bit OS using VS2013 and it compiles correctly but when I try to build the application I get this message from the output window.

1>------ Build started: Project: tempTest, Configuration: Debug Win32 ------
1>LINK : fatal error LNK1561: entry point must be defined
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
Francis Cugler
  • 2
    FWIW, that's a linker error, not a message from the debugger. The compilation succeeded, but the linker couldn't find a function `main()` because it's a variable of type `int` – cdmh Sep 30 '15 at 08:54
  • Thanks for the reply I'll reword my initial answer to reflect this. – Francis Cugler Oct 01 '15 at 16:57
-2

You are doing something tricky here: main is declared as an integer variable. You used the comma operator to print the message and then initialize main with 195. As someone said below, this does not conform to the C++ standard, and that is true. But since the compiler did not find any user-defined function named main, it did not complain. Remember that main is not a system-defined function; it is a user-defined function, and the thing from which the program starts executing is the main module, not main() specifically. main() is in turn called by a startup function, which the loader executes. All of your variables are then initialized, and the output is produced during that initialization. That's it. A program without main() may work, but it is not standard.