84
#include <stdio.h>
#define decode(s,t,u,m,p,e,d) m##s##u##t
#define begin decode(a,n,i,m,a,t,e)

int begin()
{
    printf("Ha HA see how it is?? ");
}

Does this indirectly call main? How?

Rajeev Singh
  • 146
    The macros defined expand begin to say "main". It is just a trick. Nothing interesting. – rghome Apr 06 '16 at 11:19
  • 10
    Your toolchain should have an option to leave the preprocessed code around in a file -- the actual file that is compiled -- where you will see it, indeed, has a main() –  Apr 06 '16 at 14:48
  • @rghome Why not post as an answer? And it's clearly interesting, given the number of upvotes. – Matsemann Apr 10 '16 at 09:30
  • 3
    @Matsemann Wow! I didn't notice the up-votes. I could change it to an answer, and if the comment up-votes were answer up-votes, it would be by far my best score, but there is already a detailed response. I think the point of my comment is that it is not really interesting and therefore it acts as an alternative for people not wanting to up-vote the answer. Thanks for pointing it out though. – rghome Apr 10 '16 at 09:37
  • Guys, it's up to the linker, as an operating system tool, to set the entry point, not the language itself. You can even set your own entry point, and you can make a library that is also executable! http://unix.stackexchange.com/a/223415/37799 – Ho1 Apr 18 '16 at 13:16
  • It wouldn't be accepted as an international obfuscated C competition entry, as they don't allow entries obfuscated solely by the preprocessor. However, it could be part of one. – Malcolm McLean May 01 '17 at 14:10

6 Answers

194

The C language defines two kinds of execution environment: freestanding and hosted. In both, the environment calls a designated function at program startup.
In a freestanding environment, the name and type of the startup function are implementation-defined, while in a hosted environment it must be main. No C program can run in either environment without a program startup function.

In your case, main is hidden by the preprocessor definitions. begin expands to decode(a,n,i,m,a,t,e), which in turn expands to main.

int begin() -> int decode(a,n,i,m,a,t,e)() -> int m##a##i##n() -> int main() 

decode(s,t,u,m,p,e,d) is a function-like macro with 7 parameters. Its replacement list is m##s##u##t, where m, s, u and t are the 4th, 1st, 3rd and 2nd parameters, respectively.

s, t, u, m, p, e, d
1  2  3  4  5  6  7

The remaining parameters (p, e and d) are unused and serve only to obfuscate. The arguments passed to decode are a,n,i,m,a,t,e, so the identifiers m, s, u and t are replaced with the arguments m, a, i and n, respectively.

 m --> m  
 s --> a 
 u --> i 
 t --> n
haccks
  • Is it GCC specific? what is use of it? – Grijesh Chauhan Apr 06 '16 at 11:23
  • 12
    @GrijeshChauhan all C compilers process the macros, it is required by all C standards since C89. – jdarthenay Apr 06 '16 at 11:27
  • 17
    That's plainly wrong. On Linux I can use `_start()`. Or even more low-level I can try to just align the start of my program with the address to which the IP is set after boot. `main()` is C Standard *library*. C itself does not impose restrictions on this. – ljrk Apr 06 '16 at 12:00
  • @larkey; Is that confirmed by standard? – haccks Apr 06 '16 at 12:38
  • 2
    @haccks The standard *library* does define an entry point. The language itself doesn't care – ljrk Apr 06 '16 at 12:39
  • 3
    Can you please explain how `decode(a,n,i,m,a,t,e)` become `m##a##i##n`? Does it replace characters? Can you provide a link to the documentation of the `decode` function? Thanks. – A.L Apr 06 '16 at 14:59
  • 1
    @A.L First `begin` is defined to be replaced by `decode(a,n,i,m,a,t,e)` which is defined before. This function takes the arguments `s,t,u,m,p,e,d` and concatenates them in this form `m##s##u##t` (`##` means concatenate). Ie, it ignores the values of p,e and d. As you "call" `decode` with s=a, t=n, u=i, m=m it effectively replaces `begin` with `main`. – ljrk Apr 06 '16 at 15:15
  • 1
    @A.L; Added an explanation to the answer. – haccks Apr 06 '16 at 15:30
  • 1
    Just FYI, "The ‘##’ preprocessing operator performs token pasting." see https://gcc.gnu.org/onlinedocs/cpp/Concatenation.html – bigeast Apr 16 '16 at 05:16
  • I've rewritten the entry paragraph so that it (IMO) has a) the very same *intended* meaning, b) a better phrasing (so that there's no need for further discussion here), c) overall shorter and more concise. Please improve on it further if you will, but beware of the consequences of *ambiguous wording* - it usually triggers needless discussions where both sides are right yet no consensus is achieved. –  Apr 18 '16 at 13:50
  • `main()` is not the only starting point for C programs, `main()` is the default one, C spec says different environments can have different entry points. – Vad Aug 26 '16 at 22:16
  • @vad; That statement was controversial from the first day of the answer. I reworded it. – haccks Aug 27 '16 at 01:58
  • 1
    @larkey; Now agreed by your [comment](http://stackoverflow.com/questions/36449358/this-obfuscated-c-code-claims-to-run-without-a-main-but-what-does-it-really-d/36449421?noredirect=1#comment60513648_36449421). The C standard does say that a freestanding environment can have an implementation-defined starting point. – haccks Aug 27 '16 at 02:00
71

Try gcc -E source.c; the output ends with:

int main()
{
    printf("Ha HA see how it is?? ");
}

So a main() function is actually generated by the preprocessor.

jdarthenay
37

The program in question does end up with a main() through macro expansion, but your assumption is flawed: a program doesn't have to use main() at all!

Strictly speaking, you can have a C program and be able to compile it without having a main symbol. main is something that the C library expects to jump into after it has finished its own initialization. Usually you jump into main from the libc symbol known as _start. It is possible to have a working program that simply executes assembly, without having a main. Take a look at this:

/* This must be compiled with the flag -nostdlib because otherwise the
 * linker will complain about multiple definitions of the symbol _start
 * (one here and one in glibc) and a missing reference to symbol main
 * (that the libc expects to be linked against).
 */

void
_start ()
{
    /* calling the write system call, with the arguments in this order:
     * 1. the stdout file descriptor
     * 2. the buffer we want to print (Here it's just a string literal).
     * 3. the amount of bytes we want to write.
     */
    asm ("int $0x80"::"a"(4), "b"(1), "c"("Hello world!\n"), "d"(13));
    asm ("int $0x80"::"a"(1), "b"(0)); /* calling exit syscall, with the argument to be 0 */
}

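Compile the above with gcc -nostdlib without_main.c, and see it print Hello world! on the screen just by issuing system calls (interrupts) in inline assembly. Note that int $0x80 with these call numbers is the 32-bit x86 Linux system-call interface, so the example is specific to that platform.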

For more information about this particular issue, check out the ksplice blog

Another interesting point is that you can also have a program that compiles without the main symbol corresponding to a C function. For instance, the following is accepted by the compiler as a C program, and only makes it complain when you raise the warning level.

/* These values are extracted from the decimal representation of the instructions
 * of a hello world program written in asm, that gdb provides.
 */
const int main[] = {
    -443987883, 440, 113408, -1922629632,
    4149, 899584, 84869120, 15544,
    266023168, 1818576901, 1461743468, 1684828783,
    -1017312735
};

The values in the array encode the machine instructions needed to print Hello World on the screen. For a more detailed account of how this specific program works, take a look at this blog post, which is where I first read about it.

I want to make one final note about these programs. I do not know whether they count as valid C programs according to the C language specification, but compiling and running them is certainly possible, even if they violate the specification itself.

NlightNFotis
  • 1
    Is the name of `_start` part of a defined standard, or is that just implementation-specific? Certainly your "main as an array" is architecture-specific. Also important, it would not be unreasonable for your "main as an array" trick to fail at run time due to security restrictions (though that would be more likely if you did not use the `const` qualifier, and still many systems would permit it). – mah Apr 07 '16 at 20:21
  • 1
    @mah: `_start` is not in the ELF standard, though the AMD64 psABI contains a reference to `_start` at _3.4 Process Initialization_. Officially, ELF only knows about the address at `e_entry`in the ELF header, `_start` is just a name the implementation chose. – ninjalj Apr 07 '16 at 20:58
  • 1
    @mah *Also important, it would not be unreasonable for your "main as an array" trick to fail at run time due to security restrictions (though that would be more likely if you did not use the const qualifier, and still many systems would permit it).* Only if the final executable is in some way distinguishable as something insecure - a binary executable is a binary executable no matter how it got there. And `const` won't matter one bit - the symbol name in that binary executable file is `main`. No more, no less. `const` is a C construct that means nothing at execution time. – Andrew Henle Apr 08 '16 at 01:01
  • 1
    @Stewart: it certainly fails on ARMv6l (segmentation fault). But it should work on any x86-64 architecture. – leftaroundabout Apr 08 '16 at 14:36
  • @AndrewHenle _a binary executable is a binary executable no matter how it got there_ - not exactly true. A binary executable is not a single blob of executable instructions, it's a carefully mapped blob of partitions, some of which are instructions, some of which are read-only data, and some of which are data to be initialized into read-write data. (Some) security hardware MMUs can prevent execution from pages not marked as such, and this is a good feature to prevent, for example, stack overflows leading to executing code on the stack but sadly that's sometimes legitimate or often not enabled. – mah Apr 08 '16 at 23:44
  • @AndrewHenle _And const won't matter one bit_ - actually `const` can make the difference between placing the data in read-only memory (where it might be part of the executable pages) and read-write memory (where on a secure system, it would not be executable). I'm not saying you're going to find this to be the case on your system as typical general purpose computers are not going to use this measure of security (if the MMU even supports it), but there's far more in the world than general purpose computers. – mah Apr 08 '16 at 23:48
  • @AndrewHenle _the symbol name in that binary executable file is main. No more, no less._ Executables do not have symbols, except as possible metadata for debuggers only. They're important for the linker to serve as placeholders for addressing. _const is a C construct that means nothing at execution time._ - except it can have significance towards where the compiler placed things. There are quite a few low level details which you've either not thought of here, or not had reason to learn about in the past. – mah Apr 08 '16 at 23:50
  • @leftaroundabout perhaps any _Windows_ x86-64 architecture (or whatever OS was provided for in the example)... but not _any_ x86-64 architecture. Since the bits are precompiled code for an I/O call, it's highly unlikely that it would ever work for an operating system other than the one it was built for. It gives a segmentation fault when run on an x86-64 Mac (running OS X), for example. – mah Apr 11 '16 at 16:05
30

Someone is trying to act like a magician and thinks he can trick us. But we all know that C program execution begins with main().

During preprocessing, the begin in int begin() is replaced with decode(a,n,i,m,a,t,e), which is in turn replaced with m##a##i##n. By positional association of the macro arguments, s takes the value a, u is replaced by i, and t is replaced by n. That is how m##s##u##t becomes main.

As for the ## symbol in the macro expansion, it is a preprocessing operator that performs token pasting. When a macro is expanded, the two tokens on either side of each ‘##’ operator are combined into a single token, which then replaces the ‘##’ and the two original tokens in the macro expansion.

If you don't believe me, you can compile your code with the -E flag. It stops the compilation process after preprocessing, so you can see the result of token pasting.

gcc -E FILENAME.c
abhiarora
11

decode(a,b,c,d,[...]) shuffles the first four arguments and joins them to get a new identifier, in the order dacb. (The remaining three arguments are ignored.) For instance, decode(a,n,i,m,[...]) gives the identifier main. Note that this is what the begin macro is defined as.

Therefore, the begin macro is simply defined as main.

Frxstrem
2

In your example, a main() function is actually present, because begin is a macro that the preprocessor replaces with the decode macro, which is in turn replaced by the expression m##s##u##t. Through the token-pasting operator ##, decode ends up producing the word main. This is a trace:

begin --> decode(a,n,i,m,a,t,e) --> m##parameter1##parameter3##parameter2 ---> main

It's just a trick to have main(), but using the name main() for the program's entry function is not required by the C programming language. It depends on your operating system and on the linker as one of its tools.

On Windows, you don't always use main(), but rather WinMain or wWinMain, although you can use main(), even with Microsoft's toolchain. On Linux, one can use _start.

It's up to the linker, as an operating system tool, to set the entry point, not the language itself. You can even set your own entry point, and you can make a library that is also executable!

Community
Ho1
  • @vaxquis You're right, but this is a partial answer I wrote to complement/correct the first answer which binds `main()` function to the C programming language, which is not correct. – Ho1 Apr 18 '16 at 13:48
  • @vaxquis I assumed that explaining "main() function is not essential in C programs" would be a partial answer. I have added a paragraph to make the answer complete. – Ho1 Apr 18 '16 at 14:33