33

Consider the following C program (see live demo here).

const int main = 195;

I know that in the real world no programmer writes code like this, because it serves no useful purpose and doesn't make any sense. But when I remove the const keyword from the program above, it immediately results in a segmentation fault. Why? I am eager to know the reason behind this.

GCC 4.8.2 gives the following warning when compiling it.

warning: 'main' is usually a function [-Wmain]

const int main = 195;
          ^

Why does the presence or absence of the const keyword make a difference in the behavior of the program?

Destructor
  • 7
    According to the standard, this is simply undefined behavior. – melpomene Oct 23 '15 at 15:04
  • 4
    @machine_1 195 is the encoding for the opcode `ret` (return from function) on 8086 and its successors. You can guess what happens when you put that in a variable and call that variable as a function. – fuz Oct 23 '15 at 15:29
  • 2
    It is probably relevant to link to [How can a program with a global variable called main instead of a main function work?](http://stackoverflow.com/q/32851184/1708801) – Shafik Yaghmour Oct 23 '15 at 15:30
  • Did you choose the value on purpose to coincide with the `ret` instruction? – Ruslan Oct 24 '15 at 15:37
  • 2
    @Ruslan If you do some searching you can find various versions of this in several places. On the stack exchange network [this was one of the older references](http://codegolf.stackexchange.com/a/23397/45595). In my answer to the link above we can find a 1984 IOCCC entry that does something similar but is much more sophisticated. – Shafik Yaghmour Oct 24 '15 at 23:22

2 Answers

62

Observe that the value 195 (0xC3) is the encoding of the ret (return from function) instruction on 8086 compatibles. When executed, this definition of main thus behaves as if you had defined it as int main() {}.
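To make the connection between the number and the opcode concrete, here is a small sketch that extracts the byte a CPU would fetch first if it jumped to an int holding 195. The helper names low_byte and first_byte_in_memory are hypothetical and exist only for this illustration. The sketch assumes nothing beyond standard C, and it only inspects bytes, it does not execute them.

```c
#include <string.h>

/* Low-order byte of the value. 195 decimal is 0xC3 hexadecimal. */
static unsigned char low_byte(int value)
{
    return (unsigned char)((unsigned)value & 0xFFu);
}

/*
 * First byte of the object as it is laid out in memory.
 * On a little-endian machine (such as x86) this is the low-order byte;
 * on a big-endian machine it would be the high-order byte instead.
 */
static unsigned char first_byte_in_memory(int value)
{
    unsigned char bytes[sizeof value];
    memcpy(bytes, &value, sizeof value);  /* well-defined way to read the representation */
    return bytes[0];
}
```

On x86, first_byte_in_memory(195) yields the same 0xC3 that low_byte(195) does, which is why jumping to the address of the variable lands directly on a ret.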

On some platforms, const data is loaded into an executable but not writeable memory region whereas mutable data (i.e. data not qualified const) is loaded into a writeable but not executable memory region. For this reason, the program “works” when you declare main as const but not when you leave off the const qualifier.

Traditionally, programs were laid out in the following segments:

  • The text segment is (if supported by the architecture) write-protected and executable, and contains executable code, variables of static storage duration qualified const, and string literals
  • The data segment is writeable and cannot be executed. It contains variables not qualified const with static storage duration and (at runtime) objects with allocated storage duration
  • The bss segment is similar to the data segment but is initialized to all zeroes. It contains variables of static storage duration not qualified const that have been declared without an initializer
  • The stack segment is not present in the binary and contains variables with automatic storage duration

Removing the const qualifier from the variable main causes it to be moved from the text to the data segment, which isn't executable, causing the segmentation violation you observe.

Modern platforms often have further segments (e.g. a rodata segment for data that is neither writeable nor executable) so please don't take this as an accurate description of your platform without consulting platform-specific documentation.
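One way to check where a given toolchain actually places such variables is to compile a small translation unit to an object file and list its symbols. The sketch below uses hypothetical names (ro_value, rw_value, zero_value, sum_values). As an assumption about a typical GCC/ELF setup, not a guarantee, running nm on the resulting object file would be expected to report ro_value with type R (read-only data), rw_value with type D (initialized data), and zero_value with type B (bss) or C (common), depending on compiler defaults.

```c
/* Hypothetical variables, used only to observe section placement. */

const int ro_value = 195;  /* const, static storage: typically read-only data */
int rw_value = 195;        /* non-const with initializer: typically the data segment */
int zero_value;            /* non-const without initializer: typically bss, starts at 0 */

/* Reference all three objects so none of them is trivially unused. */
int sum_values(void)
{
    return ro_value + rw_value + zero_value;
}
```

The exact section names and permissions differ between platforms, so the symbol listing of your own toolchain is the authoritative answer.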

Please understand that not making main a function is usually incorrect, although technically a platform could allow main to be declared as a variable, cf. ISO 9899:2011 §5.1.2.2.1 ¶1, emphasis mine:

1 The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters (...) or with two parameters (...) or equivalent; or in some other implementation-defined manner.

fuz
  • Some good points, [zwol touched on some of these here](http://stackoverflow.com/a/32851722/1708801) in the comments to my answer to a similar C++ version of this question – Shafik Yaghmour Oct 23 '15 at 15:32
  • 1
    Please include your comment on the encoding for the opcode ret in your answer. This is key to understanding the behaviour described. – Dekay Oct 23 '15 at 19:40
  • @user19474 Better this way? – fuz Oct 23 '15 at 19:45
  • @FUZxxl: nice answer. But your answer doesn't explain why the program exits with a garbage value as its return status instead of 0. It would be better if you could tell the reason behind it. – Destructor Dec 15 '15 at 16:36
  • @PravasiMeet The `ret` instruction exits the current function. It does not set a return value. Thus the program exits with whatever was in the `eax` register at the time `main` returned, i.e. a random garbage value. – fuz Dec 15 '15 at 16:49
11

In C, main at global scope is almost always a function.

To use main as a variable at global scope makes the behaviour of the program undefined.

(It just might be the case that when you write const the compiler optimises out the variable to a constant and so your program behaviour is different. But the program behaviour is still undefined).

Bathsheba
  • 5
    Nope. A platform may allow `main` to be declared “in an implementation defined manner.” – fuz Oct 23 '15 at 15:10
  • 1
    I'm too old to trawl through the standard but I imagine it must *always* be a function! – Bathsheba Oct 23 '15 at 15:10
  • 8
    Cf. ISO 9899:2011 §5.1.2.2.1 “The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters: (...) or with two parameters (referred to here as argc and argv, though any names may be used, as they are local to the function in which they are declared): (...) or equivalent; 10) or in some other implementation-defined manner.” – fuz Oct 23 '15 at 15:11
  • 1
    OK. Good standard reference! I've amended. – Bathsheba Oct 23 '15 at 15:15