7

I have developed in C for many years and only now discovered that a program can execute code before the main() function. Here is a code example:

int generateNum(){
    // Some malicious code here...
    return 5;
}

static int someArray[] = {generateNum(), generateNum()};

int main(){
    // Some code here...
}

The function generateNum() is called twice before main().

My questions are

  1. Who calls generateNum()? I know that on Windows it is crtexe()
  2. Is this behavior standardized on different platforms: Windows/Linux/Android/iOS?
  3. How can I get more information about this behavior? I want to search in Google, but I don't know how to describe it.
  4. Can I do anything I want inside generateNum()? I mean, can I call malloc()? What about fopen() and fwrite()? Can I open a socket and send information over UDP? Eventually I could abuse this function and even call main() from it :-)
DanielHsH
  • static objects are initialized before main is entered, as per the standard (C++ standard as far as I am certain, but I think C as well). – StoryTeller - Unslander Monica Nov 26 '13 at 13:43
  • In fact, dummy statics can be used to invoke certain code from their constructors before main is entered. – StoryTeller - Unslander Monica Nov 26 '13 at 13:46
  • See here for more info http://stackoverflow.com/questions/4783404/is-main-really-start-of-a-c-program – StoryTeller - Unslander Monica Nov 26 '13 at 13:47
  • The caveat is that you don't know in which order initialization is performed. It can even change between compiles on the same machine, depending on the link order, object file layout and the phase of the moon. – JvO Nov 26 '13 at 13:53
  • "C/C++" is not a language. -1 for a fictitious question that [makes no sense](http://ideone.com/mrbsQK). – Kerrek SB Nov 26 '13 at 13:55
  • @JvO Though only between different translation units, no? In a single file it should be well-defined, shouldn't it? – Christian Rau Nov 26 '13 at 13:55
  • This code is not valid C. In that language global initialisers have to be constant, so `main` *is* the first user-provided code to be executed. – Mike Seymour Nov 26 '13 at 13:57
  • @ChristianRau In a single file it's in order of declaration within the file. – john Nov 26 '13 at 13:57
  • I wouldn't count on it... Compilers have a large amount of freedom in optimizing their code. If objects are dependent on one another I would use singletons, so the order doesn't really matter. – JvO Nov 26 '13 at 13:59
  • @JvO At least you can (or should) count on what the standard says. The standard can never be optimized away by any amount of freedom. – Christian Rau Nov 26 '13 at 14:00
  • Hmm, okay. But I still wouldn't *rely* on it... – JvO Nov 26 '13 at 14:01
  • @JvO In C the order of evaluation in an initialization list [is unspecified](http://stackoverflow.com/questions/19881803/are-multiple-mutations-of-the-same-variable-within-initializer-lists-undefined-b) but in C++ it is [well defined](http://stackoverflow.com/questions/14442894/are-multiple-mutations-within-initializer-lists-undefined-behavior). There is even a sequence point after each initializer. – Shafik Yaghmour Nov 26 '13 at 14:14
  • @JvO: If you cannot *rely* on what the standard requires there is little you can do in programming. That being said, I rarely rely on compilers doing the right thing with complex templates... but they have proven to implement their own *flavors* of the standard. At any rate I don't know of any compiler that does not follow the top-down initialization order mandated by the standard *within a single translation unit*, except where extensions in the compiler allow you to force some different ordering. As a matter of fact, the initialization of `cin` and `cout` depends on this order. – David Rodríguez - dribeas Nov 26 '13 at 14:19

2 Answers

7

A program shall contain a global function called main, which is the designated start of the program.

It doesn't say that no code executes before main is called. Full quote:

3.6.1 Main function [basic.start.main]

1 A program shall contain a global function called main, which is the designated start of the program. It is implementation-defined whether a program in a freestanding environment is required to define a main function. [Note: in a freestanding environment, start-up and termination is implementation-defined; start-up contains the execution of constructors for objects of namespace scope with static storage duration; termination contains the execution of destructors for objects with static storage duration. ]

Sadique
7
  1. C++ guarantees that such initialisations take place before main. This can be taken care of by the operating system loader/linker, or by some special module linked against the object file that contained main. For gcc, this is described here: http://gcc.gnu.org/onlinedocs/gccint/Initialization.html
  2. Not quite. C++11, 3.6.2/4 (basic.start.init): "It is implementation-defined whether the dynamic initialization of a non-local variable with static storage duration is done before the first statement of main." Note, though, that if the initialization is deferred, it must still occur before the first odr-use of any function or variable defined in the same translation unit as the variable being initialized, so you can never observe the uninitialized value.
  3. [basic.start.init] in the language standard is what you want to have a look at. The behaviour here is dynamic initialization for variables with static storage duration.
creichen
  • Can I call malloc() from an initialization function? What about fopen() and fwrite()? Can I open a socket and send information over UDP? – DanielHsH Nov 26 '13 at 14:33
  • @DanielHsH, no to all of the things you mentioned. – StoryTeller - Unslander Monica Nov 26 '13 at 14:35
  • P.S. I just tried malloc() and it works on Windows and Linux. – DanielHsH Nov 26 '13 at 14:37
  • @DanielHsH, I would expect it to work (since many container classes that you may want to use rely on dynamic memory allocation), but I don't know for sure; C++ has many rules and exceptions. However, since we're talking about a C++-exclusive feature here, you don't want to use `malloc()`; instead, use `new` and `new[]` unless you have a very good reason not to (e.g., interfacing with C code that calls `free`). – creichen Nov 26 '13 at 14:39