15

Compiling this code with -O3:

#include <iostream>
int main(){std::cout<<"Hello World"<<std::endl;}

results in a file with a length of 25,890 bytes. (Compiled with GCC 4.8.1)

Can't the compiler just store two calls to write(STDOUT_FILENO, ???, strlen(???));, store write's contents, store the string, and boom, write it to the disk? It should result in an EXE with a length under 1,024 bytes by my estimate.

Compiling a hello world program in assembly results in a 17-byte file: https://stackoverflow.com/questions/284797/hello-world-in-less-than-17-bytes, which means the actual code is 5 bytes long. (The string is Hello World\0)

What does that EXE store besides the actual main and the functions it calls?

NOTE: This question applies to MSVC too.


Edit:
A lot of users pointed at iostream as being the culprit, so I tested this hypothesis and compiled this program with the same parameters:

int main( ) {
}

And got 23,815 bytes, so the hypothesis has been disproved.

LyingOnTheSky

8 Answers

15

By default the compiler generates a complete PE-conformant executable. Assuming a release build, the executable for the simple code you posted probably includes:

  • all the PE headers and tables needed by the loader (e.g. IAT), this also means alignment requirements have to be met
  • CRT library initialization code
  • Debugging info (you need to strip it manually, even for a release build)

In case the compiler were MSVC there would have been additional inclusions:

The link you posted does contain a very small assembly "hello world" program, but in order to properly run in a Windows environment at least the complete and valid PE structure needs to be available to the loader (setting aside all the low-level issues that might cause that code not to run at all).

Assuming the loader had already correctly set up the process in which to run that code, only at that point could you map it into a PE section and do

jmp small_hello_world_entry_point

to actually execute the code.

References: The PE format

One last note: UPX and similar compression tools are also used to reduce the file size of executables.

Marco A.
  • You are right about the `a very small assembly "hello world" program`, it isn't runnable somehow. (Not about being 16-bit) – LyingOnTheSky Jun 12 '15 at 13:44
  • Your assumption is clearly wrong, the question says GCC is the compiler, several times – Ben Voigt Jun 12 '15 at 13:56
  • That is correct. I got the compiler wrong. Many of the considerations I wrote still stand though. I'll remove the MSVC-specific ones. – Marco A. Jun 12 '15 at 14:04
  • 3
    @MarcoA. I think the MSVC parts will help for others too, and they may apply for GCC too. (You can mark those and explain at the end of the answer that they may only apply to MSVC) – LyingOnTheSky Jun 14 '15 at 13:45
  • 1
    @MarcoA. [crinkler](http://crinkler.net/) should be a better solution than UPX, as it replaces the linker itself and therefore is able to achieve much smaller executables. You might also want to mention that there are several barebone programming languages, which automatically result in the smallest executable size (PowerBASIC/FreeBASIC etc., dialects of Pascal, ...) without compressing sections like crinkler/MEW SE/UPX. – turbo Jun 21 '15 at 13:26
9

C++ isn't assembly; like C, it comes with a lot of infrastructure. In addition to the overheads of C (required to be compatible with the C ABI), C++ also has its own variants of many things, and it has to have all the set-up and tear-down code required to provide the many guarantees of the language.

Much of this is provided by libraries, but some of it has to be in the executable itself so that a failure to load shared libraries can be handled.

Under Linux/BSD we can reverse engineer an executable with objdump -dsl. I took the following code:

int main() {}

and compiled it with:

g++ -Wall -O3 -g0 test.cpp -o test.exe

The resulting executable?

6922 bytes

Then I compiled with less cruft:

g++ -Wall -O3 -g0 test.cpp -o test.exe -nostdlib
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000400150

Basically: main is a facade entry point for our C++ code, the program really starts at _start.

Executable size?

1454 bytes

Here's how objdump describes the two:

g++ -Wall -O3 -g0 test.cpp -o test.exe
objdump -dsl test.exe

test.exe:     file format elf64-x86-64

Contents of section .interp:
 400200 2f6c6962 36342f6c 642d6c69 6e75782d  /lib64/ld-linux-
 400210 7838362d 36342e73 6f2e3200           x86-64.so.2.    
Contents of section .note.ABI-tag:
 40021c 04000000 10000000 01000000 474e5500  ............GNU.
 40022c 00000000 02000000 06000000 12000000  ................
Contents of section .note.gnu.build-id:
 40023c 04000000 14000000 03000000 474e5500  ............GNU.
 40024c a0f55c7d 671f9eb2 93078fd3 0f52581a  ..\}g........RX.
 40025c 544829b2                             TH).            
Contents of section .hash:
 400260 03000000 06000000 02000000 05000000  ................
 400270 00000000 00000000 00000000 01000000  ................
 400280 00000000 03000000 04000000           ............    
Contents of section .dynsym:
 400290 00000000 00000000 00000000 00000000  ................
 4002a0 00000000 00000000 10000000 20000000  ............ ...
 4002b0 00000000 00000000 00000000 00000000  ................
 4002c0 1f000000 20000000 00000000 00000000  .... ...........
 4002d0 00000000 00000000 8b000000 12000000  ................
 4002e0 00000000 00000000 00000000 00000000  ................
 4002f0 33000000 20000000 00000000 00000000  3... ...........
 400300 00000000 00000000 4f000000 20000000  ........O... ...
 400310 00000000 00000000 00000000 00000000  ................
Contents of section .dynstr:
 400320 006c6962 73746463 2b2b2e73 6f2e3600  .libstdc++.so.6.
 400330 5f5f676d 6f6e5f73 74617274 5f5f005f  __gmon_start__._
 400340 4a765f52 65676973 74657243 6c617373  Jv_RegisterClass
 400350 6573005f 49544d5f 64657265 67697374  es._ITM_deregist
 400360 6572544d 436c6f6e 65546162 6c65005f  erTMCloneTable._
 400370 49544d5f 72656769 73746572 544d436c  ITM_registerTMCl
 400380 6f6e6554 61626c65 006c6962 6d2e736f  oneTable.libm.so
 400390 2e36006c 69626763 635f732e 736f2e31  .6.libgcc_s.so.1
 4003a0 006c6962 632e736f 2e36005f 5f6c6962  .libc.so.6.__lib
 4003b0 635f7374 6172745f 6d61696e 00474c49  c_start_main.GLI
 4003c0 42435f32 2e322e35 00                 BC_2.2.5.       
Contents of section .gnu.version:
 4003ca 00000000 00000200 00000000           ............    
Contents of section .gnu.version_r:
 4003d8 01000100 81000000 10000000 00000000  ................
 4003e8 751a6909 00000200 9d000000 00000000  u.i.............
Contents of section .rela.dyn:
 4003f8 50096000 00000000 06000000 01000000  P.`.............
 400408 00000000 00000000                    ........        
Contents of section .rela.plt:
 400410 70096000 00000000 07000000 03000000  p.`.............
 400420 00000000 00000000                    ........        
Contents of section .init:
 400428 4883ec08 e85b0000 00e86a01 0000e845  H....[....j....E
 400438 02000048 83c408c3                    ...H....        
Contents of section .plt:
 400440 ff351a05 2000ff25 1c052000 0f1f4000  .5.. ..%.. ...@.
 400450 ff251a05 20006800 000000e9 e0ffffff  .%.. .h.........
Contents of section .text:
 400460 31ed4989 d15e4889 e24883e4 f0505449  1.I..^H..H...PTI
 400470 c7c0e005 400048c7 c1f00540 0048c7c7  ....@.H....@.H..
 400480 d0054000 e8c7ffff fff49090 4883ec08  ..@.........H...
 400490 488b05b9 04200048 85c07402 ffd04883  H.... .H..t...H.
 4004a0 c408c390 90909090 90909090 90909090  ................
 4004b0 90909090 90909090 90909090 90909090  ................
 4004c0 b88f0960 00482d88 09600048 83f80e76  ...`.H-..`.H...v
 4004d0 17b80000 00004885 c0740dbf 88096000  ......H..t....`.
 4004e0 ffe0660f 1f440000 f3c3660f 1f440000  ..f..D....f..D..
 4004f0 be880960 004881ee 88096000 48c1fe03  ...`.H....`.H...
 400500 4889f048 c1e83f48 01c648d1 fe7411b8  H..H..?H..H..t..
 400510 00000000 4885c074 07bf8809 6000ffe0  ....H..t....`...
 400520 f3c36666 6666662e 0f1f8400 00000000  ..fffff.........
 400530 803d5104 20000075 5f5553bb 80076000  .=Q. ..u_US...`.
 400540 4881eb78 07600048 83ec0848 8b053e04  H..x.`.H...H..>.
 400550 200048c1 fb034883 eb01488d 6c241048   .H...H...H.l$.H
 400560 39d87322 0f1f4000 4883c001 4889051d  9.s"..@.H...H...
 400570 042000ff 14c57807 6000488b 050f0420  . ....x.`.H.... 
 400580 004839d8 72e2e835 ffffffc6 05f60320  .H9.r..5....... 
 400590 00014883 c4085b5d f3c3660f 1f440000  ..H...[]..f..D..
 4005a0 bf880760 0048833f 007505e9 40ffffff  ...`.H.?.u..@...
 4005b0 b8000000 004885c0 74f15548 89e5ffd0  .....H..t.UH....
 4005c0 5de92aff ffff9090 90909090 90909090  ].*.............
 4005d0 31c0c390 90909090 90909090 90909090  1...............
 4005e0 f3c36666 6666662e 0f1f8400 00000000  ..fffff.........
 4005f0 48896c24 d84c8964 24e0488d 2d630120  H.l$.L.d$.H.-c. 
 400600 004c8d25 5c012000 4c896c24 e84c8974  .L.%\. .L.l$.L.t
 400610 24f04c89 7c24f848 895c24d0 4883ec38  $.L.|$.H.\$.H..8
 400620 4c29e541 89fd4989 f648c1fd 034989d7  L).A..I..H...I..
 400630 e8f3fdff ff4885ed 741c31db 0f1f4000  .....H..t.1...@.
 400640 4c89fa4c 89f64489 ef41ff14 dc4883c3  L..L..D..A...H..
 400650 014839eb 72ea488b 5c240848 8b6c2410  .H9.r.H.\$.H.l$.
 400660 4c8b6424 184c8b6c 24204c8b 7424284c  L.d$.L.l$ L.t$(L
 400670 8b7c2430 4883c438 c3909090 90909090  .|$0H..8........
 400680 554889e5 53bb6807 60004883 ec08488b  UH..S.h.`.H...H.
 400690 05d30020 004883f8 ff74140f 1f440000  ... .H...t...D..
 4006a0 4883eb08 ffd0488b 034883f8 ff75f148  H.....H..H...u.H
 4006b0 83c4085b 5dc39090                    ...[]...        
Contents of section .fini:
 4006b8 4883ec08 e86ffeff ff4883c4 08c3      H....o...H....  
Contents of section .rodata:
 4006c8 01000200                             ....            
Contents of section .eh_frame_hdr:
 4006cc 011b033b 20000000 03000000 04ffffff  ...; ...........
 4006dc 3c000000 14ffffff 54000000 24ffffff  <.......T...$...
 4006ec 6c000000                             l...            
Contents of section .eh_frame:
 4006f0 14000000 00000000 017a5200 01781001  .........zR..x..
 400700 1b0c0708 90010000 14000000 1c000000  ................
 400710 c0feffff 03000000 00000000 00000000  ................
 400720 14000000 34000000 b8feffff 02000000  ....4...........
 400730 00000000 00000000 24000000 4c000000  ........$...L...
 400740 b0feffff 89000000 00518c05 86065f0e  .........Q...._.
 400750 4083078f 028e038d 0402580e 08000000  @.........X.....
 400760 00000000                             ....            
Contents of section .ctors:
 600768 ffffffff ffffffff 00000000 00000000  ................
Contents of section .dtors:
 600778 ffffffff ffffffff 00000000 00000000  ................
Contents of section .jcr:
 600788 00000000 00000000                    ........        
Contents of section .dynamic:
 600790 01000000 00000000 01000000 00000000  ................
 6007a0 01000000 00000000 69000000 00000000  ........i.......
 6007b0 01000000 00000000 73000000 00000000  ........s.......
 6007c0 01000000 00000000 81000000 00000000  ................
 6007d0 0c000000 00000000 28044000 00000000  ........(.@.....
 6007e0 0d000000 00000000 b8064000 00000000  ..........@.....
 6007f0 04000000 00000000 60024000 00000000  ........`.@.....
 600800 05000000 00000000 20034000 00000000  ........ .@.....
 600810 06000000 00000000 90024000 00000000  ..........@.....
 600820 0a000000 00000000 a9000000 00000000  ................
 600830 0b000000 00000000 18000000 00000000  ................
 600840 15000000 00000000 00000000 00000000  ................
 600850 03000000 00000000 58096000 00000000  ........X.`.....
 600860 02000000 00000000 18000000 00000000  ................
 600870 14000000 00000000 07000000 00000000  ................
 600880 17000000 00000000 10044000 00000000  ..........@.....
 600890 07000000 00000000 f8034000 00000000  ..........@.....
 6008a0 08000000 00000000 18000000 00000000  ................
 6008b0 09000000 00000000 18000000 00000000  ................
 6008c0 feffff6f 00000000 d8034000 00000000  ...o......@.....
 6008d0 ffffff6f 00000000 01000000 00000000  ...o............
 6008e0 f0ffff6f 00000000 ca034000 00000000  ...o......@.....
 6008f0 00000000 00000000 00000000 00000000  ................
 600900 00000000 00000000 00000000 00000000  ................
 600910 00000000 00000000 00000000 00000000  ................
 600920 00000000 00000000 00000000 00000000  ................
 600930 00000000 00000000 00000000 00000000  ................
 600940 00000000 00000000 00000000 00000000  ................
Contents of section .got:
 600950 00000000 00000000                    ........        
Contents of section .got.plt:
 600958 90076000 00000000 00000000 00000000  ..`.............
 600968 00000000 00000000 56044000 00000000  ........V.@.....
Contents of section .data:
 600978 00000000 00000000 00000000 00000000  ................
Contents of section .comment:
 0000 4743433a 2028474e 55292034 2e342e37  GCC: (GNU) 4.4.7
 0010 20323031 32303331 33202852 65642048   20120313 (Red H
 0020 61742034 2e342e37 2d313129 00474343  at 4.4.7-11).GCC
 0030 3a202847 4e552920 342e392e 782d676f  : (GNU) 4.9.x-go
 0040 6f676c65 20323031 35303132 33202870  ogle 20150123 (p
 0050 72657265 6c656173 652900             rerelease).     

Disassembly of section .init:

0000000000400428 <_init>:
_init():
  400428:   48 83 ec 08             sub    $0x8,%rsp
  40042c:   e8 5b 00 00 00          callq  40048c <call_gmon_start>
  400431:   e8 6a 01 00 00          callq  4005a0 <frame_dummy>
  400436:   e8 45 02 00 00          callq  400680 <__do_global_ctors_aux>
  40043b:   48 83 c4 08             add    $0x8,%rsp
  40043f:   c3                      retq   

Disassembly of section .plt:

0000000000400440 <__libc_start_main@plt-0x10>:
  400440:   ff 35 1a 05 20 00       pushq  0x20051a(%rip)        # 600960 <_GLOBAL_OFFSET_TABLE_+0x8>
  400446:   ff 25 1c 05 20 00       jmpq   *0x20051c(%rip)        # 600968 <_GLOBAL_OFFSET_TABLE_+0x10>
  40044c:   0f 1f 40 00             nopl   0x0(%rax)

0000000000400450 <__libc_start_main@plt>:
  400450:   ff 25 1a 05 20 00       jmpq   *0x20051a(%rip)        # 600970 <_GLOBAL_OFFSET_TABLE_+0x18>
  400456:   68 00 00 00 00          pushq  $0x0
  40045b:   e9 e0 ff ff ff          jmpq   400440 <_init+0x18>

Disassembly of section .text:

0000000000400460 <_start>:
_start():
  400460:   31 ed                   xor    %ebp,%ebp
  400462:   49 89 d1                mov    %rdx,%r9
  400465:   5e                      pop    %rsi
  400466:   48 89 e2                mov    %rsp,%rdx
  400469:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
  40046d:   50                      push   %rax
  40046e:   54                      push   %rsp
  40046f:   49 c7 c0 e0 05 40 00    mov    $0x4005e0,%r8
  400476:   48 c7 c1 f0 05 40 00    mov    $0x4005f0,%rcx
  40047d:   48 c7 c7 d0 05 40 00    mov    $0x4005d0,%rdi
  400484:   e8 c7 ff ff ff          callq  400450 <__libc_start_main@plt>
  400489:   f4                      hlt    
  40048a:   90                      nop
  40048b:   90                      nop

000000000040048c <call_gmon_start>:
call_gmon_start():
  40048c:   48 83 ec 08             sub    $0x8,%rsp
  400490:   48 8b 05 b9 04 20 00    mov    0x2004b9(%rip),%rax        # 600950 <_DYNAMIC+0x1c0>
  400497:   48 85 c0                test   %rax,%rax
  40049a:   74 02                   je     40049e <call_gmon_start+0x12>
  40049c:   ff d0                   callq  *%rax
  40049e:   48 83 c4 08             add    $0x8,%rsp
  4004a2:   c3                      retq   
  4004a3:   90                      nop
  4004a4:   90                      nop
  4004a5:   90                      nop
  4004a6:   90                      nop
  4004a7:   90                      nop
  4004a8:   90                      nop
  4004a9:   90                      nop
  4004aa:   90                      nop
  4004ab:   90                      nop
  4004ac:   90                      nop
  4004ad:   90                      nop
  4004ae:   90                      nop
  4004af:   90                      nop
  4004b0:   90                      nop
  4004b1:   90                      nop
  4004b2:   90                      nop
  4004b3:   90                      nop
  4004b4:   90                      nop
  4004b5:   90                      nop
  4004b6:   90                      nop
  4004b7:   90                      nop
  4004b8:   90                      nop
  4004b9:   90                      nop
  4004ba:   90                      nop
  4004bb:   90                      nop
  4004bc:   90                      nop
  4004bd:   90                      nop
  4004be:   90                      nop
  4004bf:   90                      nop

00000000004004c0 <deregister_tm_clones>:
deregister_tm_clones():
  4004c0:   b8 8f 09 60 00          mov    $0x60098f,%eax
  4004c5:   48 2d 88 09 60 00       sub    $0x600988,%rax
  4004cb:   48 83 f8 0e             cmp    $0xe,%rax
  4004cf:   76 17                   jbe    4004e8 <deregister_tm_clones+0x28>
  4004d1:   b8 00 00 00 00          mov    $0x0,%eax
  4004d6:   48 85 c0                test   %rax,%rax
  4004d9:   74 0d                   je     4004e8 <deregister_tm_clones+0x28>
  4004db:   bf 88 09 60 00          mov    $0x600988,%edi
  4004e0:   ff e0                   jmpq   *%rax
  4004e2:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
  4004e8:   f3 c3                   repz retq 
  4004ea:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

00000000004004f0 <register_tm_clones>:
register_tm_clones():
  4004f0:   be 88 09 60 00          mov    $0x600988,%esi
  4004f5:   48 81 ee 88 09 60 00    sub    $0x600988,%rsi
  4004fc:   48 c1 fe 03             sar    $0x3,%rsi
  400500:   48 89 f0                mov    %rsi,%rax
  400503:   48 c1 e8 3f             shr    $0x3f,%rax
  400507:   48 01 c6                add    %rax,%rsi
  40050a:   48 d1 fe                sar    %rsi
  40050d:   74 11                   je     400520 <register_tm_clones+0x30>
  40050f:   b8 00 00 00 00          mov    $0x0,%eax
  400514:   48 85 c0                test   %rax,%rax
  400517:   74 07                   je     400520 <register_tm_clones+0x30>
  400519:   bf 88 09 60 00          mov    $0x600988,%edi
  40051e:   ff e0                   jmpq   *%rax
  400520:   f3 c3                   repz retq 
  400522:   66 66 66 66 66 2e 0f    data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
  400529:   1f 84 00 00 00 00 00 

0000000000400530 <__do_global_dtors_aux>:
__do_global_dtors_aux():
  400530:   80 3d 51 04 20 00 00    cmpb   $0x0,0x200451(%rip)        # 600988 <__bss_start>
  400537:   75 5f                   jne    400598 <__do_global_dtors_aux+0x68>
  400539:   55                      push   %rbp
  40053a:   53                      push   %rbx
  40053b:   bb 80 07 60 00          mov    $0x600780,%ebx
  400540:   48 81 eb 78 07 60 00    sub    $0x600778,%rbx
  400547:   48 83 ec 08             sub    $0x8,%rsp
  40054b:   48 8b 05 3e 04 20 00    mov    0x20043e(%rip),%rax        # 600990 <dtor_idx.6648>
  400552:   48 c1 fb 03             sar    $0x3,%rbx
  400556:   48 83 eb 01             sub    $0x1,%rbx
  40055a:   48 8d 6c 24 10          lea    0x10(%rsp),%rbp
  40055f:   48 39 d8                cmp    %rbx,%rax
  400562:   73 22                   jae    400586 <__do_global_dtors_aux+0x56>
  400564:   0f 1f 40 00             nopl   0x0(%rax)
  400568:   48 83 c0 01             add    $0x1,%rax
  40056c:   48 89 05 1d 04 20 00    mov    %rax,0x20041d(%rip)        # 600990 <dtor_idx.6648>
  400573:   ff 14 c5 78 07 60 00    callq  *0x600778(,%rax,8)
  40057a:   48 8b 05 0f 04 20 00    mov    0x20040f(%rip),%rax        # 600990 <dtor_idx.6648>
  400581:   48 39 d8                cmp    %rbx,%rax
  400584:   72 e2                   jb     400568 <__do_global_dtors_aux+0x38>
  400586:   e8 35 ff ff ff          callq  4004c0 <deregister_tm_clones>
  40058b:   c6 05 f6 03 20 00 01    movb   $0x1,0x2003f6(%rip)        # 600988 <__bss_start>
  400592:   48 83 c4 08             add    $0x8,%rsp
  400596:   5b                      pop    %rbx
  400597:   5d                      pop    %rbp
  400598:   f3 c3                   repz retq 
  40059a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

00000000004005a0 <frame_dummy>:
frame_dummy():
  4005a0:   bf 88 07 60 00          mov    $0x600788,%edi
  4005a5:   48 83 3f 00             cmpq   $0x0,(%rdi)
  4005a9:   75 05                   jne    4005b0 <frame_dummy+0x10>
  4005ab:   e9 40 ff ff ff          jmpq   4004f0 <register_tm_clones>
  4005b0:   b8 00 00 00 00          mov    $0x0,%eax
  4005b5:   48 85 c0                test   %rax,%rax
  4005b8:   74 f1                   je     4005ab <frame_dummy+0xb>
  4005ba:   55                      push   %rbp
  4005bb:   48 89 e5                mov    %rsp,%rbp
  4005be:   ff d0                   callq  *%rax
  4005c0:   5d                      pop    %rbp
  4005c1:   e9 2a ff ff ff          jmpq   4004f0 <register_tm_clones>
  4005c6:   90                      nop
  4005c7:   90                      nop
  4005c8:   90                      nop
  4005c9:   90                      nop
  4005ca:   90                      nop
  4005cb:   90                      nop
  4005cc:   90                      nop
  4005cd:   90                      nop
  4005ce:   90                      nop
  4005cf:   90                      nop

00000000004005d0 <main>:
main():
  4005d0:   31 c0                   xor    %eax,%eax
  4005d2:   c3                      retq   
  4005d3:   90                      nop
  4005d4:   90                      nop
  4005d5:   90                      nop
  4005d6:   90                      nop
  4005d7:   90                      nop
  4005d8:   90                      nop
  4005d9:   90                      nop
  4005da:   90                      nop
  4005db:   90                      nop
  4005dc:   90                      nop
  4005dd:   90                      nop
  4005de:   90                      nop
  4005df:   90                      nop

00000000004005e0 <__libc_csu_fini>:
__libc_csu_fini():
  4005e0:   f3 c3                   repz retq 
  4005e2:   66 66 66 66 66 2e 0f    data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
  4005e9:   1f 84 00 00 00 00 00 

00000000004005f0 <__libc_csu_init>:
__libc_csu_init():
  4005f0:   48 89 6c 24 d8          mov    %rbp,-0x28(%rsp)
  4005f5:   4c 89 64 24 e0          mov    %r12,-0x20(%rsp)
  4005fa:   48 8d 2d 63 01 20 00    lea    0x200163(%rip),%rbp        # 600764 <__init_array_end>
  400601:   4c 8d 25 5c 01 20 00    lea    0x20015c(%rip),%r12        # 600764 <__init_array_end>
  400608:   4c 89 6c 24 e8          mov    %r13,-0x18(%rsp)
  40060d:   4c 89 74 24 f0          mov    %r14,-0x10(%rsp)
  400612:   4c 89 7c 24 f8          mov    %r15,-0x8(%rsp)
  400617:   48 89 5c 24 d0          mov    %rbx,-0x30(%rsp)
  40061c:   48 83 ec 38             sub    $0x38,%rsp
  400620:   4c 29 e5                sub    %r12,%rbp
  400623:   41 89 fd                mov    %edi,%r13d
  400626:   49 89 f6                mov    %rsi,%r14
  400629:   48 c1 fd 03             sar    $0x3,%rbp
  40062d:   49 89 d7                mov    %rdx,%r15
  400630:   e8 f3 fd ff ff          callq  400428 <_init>
  400635:   48 85 ed                test   %rbp,%rbp
  400638:   74 1c                   je     400656 <__libc_csu_init+0x66>
  40063a:   31 db                   xor    %ebx,%ebx
  40063c:   0f 1f 40 00             nopl   0x0(%rax)
  400640:   4c 89 fa                mov    %r15,%rdx
  400643:   4c 89 f6                mov    %r14,%rsi
  400646:   44 89 ef                mov    %r13d,%edi
  400649:   41 ff 14 dc             callq  *(%r12,%rbx,8)
  40064d:   48 83 c3 01             add    $0x1,%rbx
  400651:   48 39 eb                cmp    %rbp,%rbx
  400654:   72 ea                   jb     400640 <__libc_csu_init+0x50>
  400656:   48 8b 5c 24 08          mov    0x8(%rsp),%rbx
  40065b:   48 8b 6c 24 10          mov    0x10(%rsp),%rbp
  400660:   4c 8b 64 24 18          mov    0x18(%rsp),%r12
  400665:   4c 8b 6c 24 20          mov    0x20(%rsp),%r13
  40066a:   4c 8b 74 24 28          mov    0x28(%rsp),%r14
  40066f:   4c 8b 7c 24 30          mov    0x30(%rsp),%r15
  400674:   48 83 c4 38             add    $0x38,%rsp
  400678:   c3                      retq   
  400679:   90                      nop
  40067a:   90                      nop
  40067b:   90                      nop
  40067c:   90                      nop
  40067d:   90                      nop
  40067e:   90                      nop
  40067f:   90                      nop

0000000000400680 <__do_global_ctors_aux>:
__do_global_ctors_aux():
  400680:   55                      push   %rbp
  400681:   48 89 e5                mov    %rsp,%rbp
  400684:   53                      push   %rbx
  400685:   bb 68 07 60 00          mov    $0x600768,%ebx
  40068a:   48 83 ec 08             sub    $0x8,%rsp
  40068e:   48 8b 05 d3 00 20 00    mov    0x2000d3(%rip),%rax        # 600768 <__CTOR_LIST__>
  400695:   48 83 f8 ff             cmp    $0xffffffffffffffff,%rax
  400699:   74 14                   je     4006af <__do_global_ctors_aux+0x2f>
  40069b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  4006a0:   48 83 eb 08             sub    $0x8,%rbx
  4006a4:   ff d0                   callq  *%rax
  4006a6:   48 8b 03                mov    (%rbx),%rax
  4006a9:   48 83 f8 ff             cmp    $0xffffffffffffffff,%rax
  4006ad:   75 f1                   jne    4006a0 <__do_global_ctors_aux+0x20>
  4006af:   48 83 c4 08             add    $0x8,%rsp
  4006b3:   5b                      pop    %rbx
  4006b4:   5d                      pop    %rbp
  4006b5:   c3                      retq   
  4006b6:   90                      nop
  4006b7:   90                      nop

Disassembly of section .fini:

00000000004006b8 <_fini>:
_fini():
  4006b8:   48 83 ec 08             sub    $0x8,%rsp
  4006bc:   e8 6f fe ff ff          callq  400530 <__do_global_dtors_aux>
  4006c1:   48 83 c4 08             add    $0x8,%rsp
  4006c5:   c3                      retq   

and the smaller file:

g++ -Wall -O3 -g0 test.cpp -o test.exe -nostdlib
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000400150

test.exe:     file format elf64-x86-64

Contents of section .note.gnu.build-id:
 400120 04000000 14000000 03000000 474e5500  ............GNU.
 400130 d4b1e35c 21d1f541 b81d3ac9 d62bac7a  ...\!..A..:..+.z
 400140 606b1ad4                             `k..            
Contents of section .text:
 400150 31c0c3                               1..             
Contents of section .eh_frame_hdr:
 400154 011b033b 10000000 01000000 fcffffff  ...;............
 400164 2c000000                             ,...            
Contents of section .eh_frame:
 400168 14000000 00000000 017a5200 01781001  .........zR..x..
 400178 1b0c0708 90010000 14000000 1c000000  ................
 400188 c8ffffff 03000000 00000000 00000000  ................
Contents of section .comment:
 0000 4743433a 2028474e 55292034 2e392e78  GCC: (GNU) 4.9.x
 0010 2d676f6f 676c6520 32303135 30313233  -google 20150123
 0020 20287072 6572656c 65617365 2900       (prerelease).  

Disassembly of section .text:

0000000000400150 <main>:
main():
  400150:       31 c0                   xor    %eax,%eax
  400152:       c3                      retq   

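It's worth noting that this executable doesn't work; it segfaults. With no _start symbol, the linker falls back to the start of .text, which here is main, and main's retq then jumps to whatever is on top of the stack (at process entry that is argc, not a return address). To make it work, we'd actually have to implement _start instead of main.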

We can see here that the bulk of the larger executable is glue code that deals with loading the dynamic library and preparing the broader environment required by the standard library.

--- EDIT ---

Even our smaller code still has to include exception handling, ctor/dtor support for globals, and so forth. The compiler could probably elide such things, and if you dig deeply enough you can probably find ways to make it do so, but in general you don't need to. It is probably better to always include such basic support, and have a handful of new embedded programmers asking "how can I prevent the compiler emitting basic language support?", than to have the majority of new programmers stumbling over "how do I force the compiler to emit basic language support?".

Note also that the compiler generates ELF format binaries; this is a small contribution (maybe ~60 bytes), and emitting its own identity adds some size too. But the bulk of the smaller binary is language support (EH and CTOR/DTOR).

Compiling with #include <iostream> and -O3 -g0 produces a 7625 byte binary. If I compile that with -O0 -g3 it produces a 64 KB binary, most of which is text describing symbols from the STL.

kfsone
  • 1
    It doesn't answer the question directly (or at all), but I couldn't ignore this great result, have an upvote. – LyingOnTheSky Jun 19 '15 at 11:51
  • See edits: I wasn't able to replicate your 26 KB binary; with `-O0 -g3` generates 6 KB, with `#include <iostream>` and `-O3 -g0` generates 7.5 KB, with `#include <iostream>` and `-O0 -g3` produces 64 KB. So I can't answer precisely what your overhead is, but I did address `What that EXE stores except the actual main and the functions it calls?` - that being, things it needs to use the standard libraries and support language features like exception handling, and lastly, and very leastly, it formats the binary as an elf binary rather than a raw executable. – kfsone Jun 19 '15 at 17:19
3

Your executable includes the C runtime, which knows how to do things like get the environment, set up the argv vector, and close all open files after calling exit() but before calling _exit().

tux3
Charlie Martin
3

There are many things which could affect the final file size during compilation, as other posters have pointed out.

Dissecting your specific example is more work than I'm willing to put in, but I know of a similar example from many years ago that should help you to understand the general problem, and guide you towards finding the specific answer you seek.

http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html

This is done in C (rather than C++) using GCC, looking at the size of the ELF executable (not a Windows EXE), but as I said many of the same problems apply. In this case, the author looks at just return 42;

After you've read that document, consider that printing to stdout is considerably more complex than just returning a number. Also, since you are using C++ and cout <<, there's a lot of code hiding in there that you didn't write, and you can't really know how it's implemented without looking at that source.

Jim Wood
3

People keep ignoring or forgetting that executables created in high-level languages need an engine to run properly. For example, the C++ engine is responsible for things like:

  1. heap/stack management

    • when you call new or delete you are not accessing OS functions directly
    • instead the engine uses its own allocated heap memory
    • so the engine has its own memory management, which takes code and space
  2. local variables memory management

    • each time you call any function all the local variables must be allocated
    • and released before exiting it
  3. classes/templates

    • to handle these properly you need quite a lot of code

In addition to this you have to link all the stuff you use like:

  • RTL (runtime libraries): most executables nowadays (MSVC++, MSVB) do not link them in, so a huge number of RTLs must be installed on the system for the exe to run at all. The linkage to the DLLs used must still be present in the executable (see DLL linking at runtime)
  • debug info
  • framework linkage (as with the RTL, you need code to bind to the framework libs too)
  • for high-level Windows/forms IDEs, the window engine is also present
  • included libs and linked objs (iostream classes and operators: even if you use just <<, much more of them is needed to make it work)

You can look at the C++ engine as a small operating system within the operating system

  • in standalone MCU apps it really is the OS itself

More space is occupied by the executable format itself (like PE), and code alignment adds some padding as well.

When you put all of this together, 26 KB is not so insane anymore.

Spektre
3

Compilers are not omnipotent.

std::cout is a stream object, with a set of data members for managing a buffer (allocating it, copying data to it and, when the stream is destroyed, releasing it).

The operator<< is implemented as an overloaded function which interprets its arguments and - when supplied a string - copies data to the buffer, with some logic that potentially flushes the buffer when it is full.

std::endl is actually a function which - in cooperation with all versions of a stream's operator<<() - affects data owned by the stream. Specifically, it inserts a newline into the stream's buffer, and then flushes the buffer.

Flushing the stream's buffer calls other functions that copy data from the buffer to the standard output device (say, the screen).

All of the above is what the statement std::cout<<"Hello World"<<std::endl does.
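The steps described above can be made visible by writing them out by hand. The following is only a sketch: the function names are made up for illustration, and an in-memory std::ostringstream stands in for std::cout so the result can be compared.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Writes the greeting the way the one-liner in the question does.
void hello_with_operators(std::ostream& os) {
    os << "Hello World" << std::endl;
}

// Spells out the separate steps hidden behind that one-liner.
void hello_spelled_out(std::ostream& os) {
    std::operator<<(os, "Hello World"); // non-member overload taking const char*
    os.put('\n');                        // first half of std::endl: insert a newline
    os.flush();                          // second half of std::endl: flush the buffer
}

// Checks that both versions produce the same characters.
void check_equivalence() {
    std::ostringstream a, b;
    hello_with_operators(a);
    hello_spelled_out(b);
    assert(a.str() == "Hello World\n");
    assert(a.str() == b.str());
}
```

Each of those three calls is a library function with its own buffering and state checks, which is part of what ends up in (or is imported by) the executable.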

In addition, as a C++ program, there is a certain amount of code that must be executed before main() is even called. This includes checking if the program was run with command line arguments, creating streams like std::cout, std::cerr, std::cin (there are others) ensuring those streams are connected with relevant devices (like the terminal, or pipes, or whatever). When main() returns, it is then necessary to release all the streams created (and flush their buffers), and things like that.
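A small sketch of that "code before main()" effect, using a namespace-scope object as a stand-in for a standard stream. The names RuntimeObject, event_log and constructed_before_call are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event log. A function-local static avoids
// initialization-order problems between globals.
std::vector<std::string>& event_log() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-in for a runtime object such as a standard stream.
struct RuntimeObject {
    RuntimeObject()  { event_log().push_back("constructed"); }
    ~RuntimeObject() { /* a real stream would flush its buffer here */ }
};

// Namespace-scope object: its constructor runs as part of program
// start-up, and its destructor runs after main() returns.
RuntimeObject g_runtime_object;

// Reports whether the global object was already set up at the time of the call.
bool constructed_before_call() {
    return !event_log().empty() && event_log().front() == "constructed";
}

// Intended to be called from the first line of main().
void check_startup() {
    assert(constructed_before_call());
}
```

Calling check_startup() as the very first statement of main() passes, because the start-up code has already run the constructor by then. The standard streams need the same kind of set-up and tear-down around main().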

All of the above involves invoking other functionality. Creating a buffer for the stream means that buffer must be allocated and - after main() returns - released.

The specification of C++ streams also involves error checking. The allocation of std::cout's buffer might fail (e.g. if the host system doesn't have much free memory). The standard output device might be redirected to a file, which has limited capacity - so writing data to it might fail. All of those things must be checked for and handled gracefully.
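To illustrate the error-handling path, here is a sketch with a made-up output device that runs out of space. LimitedBuf and write_greeting are hypothetical names, and the capacity limit stands in for something like a full disk.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>

// Hypothetical output device that accepts at most `capacity` characters.
class LimitedBuf : public std::streambuf {
public:
    explicit LimitedBuf(std::size_t capacity) : remaining_(capacity) {}

protected:
    // Called for every character, because no put area is set up.
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);   // nothing to write
        if (remaining_ == 0)
            return traits_type::eof();         // report failure to the stream
        --remaining_;
        return ch;                             // character "written"
    }

private:
    std::size_t remaining_;
};

// Writes the greeting and reports whether the stream is still in a good state.
bool write_greeting(std::size_t capacity) {
    LimitedBuf buf(capacity);
    std::ostream out(&buf);
    out << "Hello World";
    return out.good();
}

// Enough room succeeds; too little room leaves the stream in an error state.
void check_error_handling() {
    assert(write_greeting(64));
    assert(!write_greeting(5));
}
```

The stream notices the short write and sets its error state without the caller doing anything. That checking code has to exist somewhere, even for a program that never looks at the result.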

All of this stuff will be in this 26K executable (unless that code is in runtime libraries).

In principle, the compiler can recognise that the program is not using its command line arguments (so not include code to manage command line arguments), is only writing to std::cout (so no need to create all the other streams before main() and release them after main() returns), is only using two overloaded versions of operator<<() and one stream manipulator (so the linker need not include code for all other member functions of the stream). It might also recognise that the statement writes data to the stream and immediately flushes the buffer - and thereby eliminate std::cout's buffer and all code that manages it. If the compiler can read the programmer's mind (few compilers can, in practice) it might work out that none of the buffers are actually needed, that the user will never run the program with standard output redirected, etc - and eliminate the code and data structures associated with all those things.

So, how would a compiler recognise that all those things aren't needed? Compilers are software, so they have to conduct some level of analysis on their inputs (e.g. source files). The analysis to eliminate all the code that a human might deem unnecessary is significant - so would take time. If the compiler doesn't do the analysis, potentially the linker might. Whether that analysis to eliminate unnecessary code is done by the compiler or linker is irrelevant - it takes time. Potentially significant time.

Programmers tend to be impatient. Very few programmers would tolerate a build process for a simple "hello world" program that took more than a few seconds (maybe they will tolerate a minute, but not much more).

That leaves compiler vendors with a decision. They can get their programmers to design and implement all sorts of analysis to eliminate unwanted code. That will add weeks - or, if they are working to a tight deadline, months - to implement, validate, verify, and ship a working compiler to customers (other developers). That compiler will be painfully slow at compiling code. Instead, vendors (and their developers) choose to implement less of that analysis in their compiler, so they can actually ship a working compiler to developers who will use it within a reasonable time. This compiler will produce an executable in a time that is somewhat tolerable for most programmers (say, under a minute for a "hello world" program). So what if the executable is larger? It will work. Hardware (e.g. drives) is relatively inexpensive and developer effort is relatively expensive.

Peter
  • See edit. (SO won't allow 9 characters comment, so this.) – LyingOnTheSky Jun 17 '15 at 19:37
  • I'm not suggesting `iostream` is the culprit. I'm describing a bunch of contributors. The implementation of streams and their specification is one. Code that must be executed before and after the call of `main()` is another. – Peter Jun 18 '15 at 07:30
3

This is a very old question, and it has a clear answer. The main difficulty is that one has to write up many small pieces of information and run many small tests that demonstrate different aspects of the PE structure. I will skip the details and describe the main parts of the problem based on Microsoft Visual Studio, which I have known and used for many years. Other compilers do mostly the same, and I suppose one only needs slightly different compiler and linker options.

First of all, I suggest setting a breakpoint on the first line of main, starting the debugger, and examining the Call Stack window. You will see something like the following:

[Screenshot of the debugger's Call Stack window; image not preserved]

So the first thing, which is very important to understand: main is not the first function called in your program. The entry point of the program is mainCRTStartup, which calls __tmainCRTStartup, which calls main.

The CRT startup code does many small things. One of them is easy to understand: it uses the GetCommandLineW Windows API to get the command line, parses the parameters, and then calls main with them.

There are two common approaches to reducing the size of the code:

  1. use the CRT from a DLL
  2. remove the CRT from the EXE if the code does not really use it.

It's very helpful to start cmd.exe via the "VS2013 x64 Native Tools Command Prompt" (or a similar command prompt). It sets some additional paths, so you can use, for example, the dumpbin.exe utility.

If you use the Multi-threaded DLL (/MD) compiler option, you get an exe file of about 7K. "dumpbin /imports HelloWorld.exe" will show you that your program uses "MSVCR120.dll" together with "KERNEL32.dll".

Removing the CRT depends on the version of the C/C++ compiler (the version of Visual Studio) you use, and even on the file extension: .c or .cpp. I read your question as a general one aimed at understanding the problem, so I suggest starting with the simplest case: rename the .cpp file to .c and change the code to the following

#include <Windows.h>

int mainCRTStartup()
{
    return 0;
}

One can see now

C:\Oleg\StackOverflow\HelloWorld\Release>dir HelloWorld.exe
 Volume in drive C has no label.
 Volume Serial Number is 4CF9-FADF

 Directory of C:\Oleg\StackOverflow\HelloWorld\Release

21.06.2015  12:56             3.584 HelloWorld.exe
               1 File(s)          3.584 bytes
               0 Dir(s)  16.171.196.416 bytes free

C:\Oleg\StackOverflow\HelloWorld\Release>dumpbin HelloWorld.exe
Microsoft (R) COFF/PE Dumper Version 12.00.31101.0
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file HelloWorld.exe

File Type: EXECUTABLE IMAGE

  Summary

        1000 .data
        1000 .rdata
        1000 .reloc
        1000 .rsrc
        1000 .text

One can add the linker option /MERGE:.rdata=.text to reduce the size and remove one section:

C:\Oleg\StackOverflow\HelloWorld\Release>dir HelloWorld.exe
 Volume in drive C has no label.
 Volume Serial Number is 4CF9-FADF

 Directory of C:\Oleg\StackOverflow\HelloWorld\Release

21.06.2015  18:44             3.072 HelloWorld.exe
               1 File(s)          3.072 bytes
               0 Dir(s)  16.170.852.352 bytes free

C:\Oleg\StackOverflow\HelloWorld\Release>dumpbin HelloWorld.exe
Microsoft (R) COFF/PE Dumper Version 12.00.31101.0
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file HelloWorld.exe

File Type: EXECUTABLE IMAGE

  Summary

        1000 .data
        1000 .reloc
        1000 .rsrc
        1000 .text

To get a "Hello World" program, I suggest changing the code to

#include <Windows.h>

int mainCRTStartup()
{
    LPCTSTR pszString = TEXT("Hello world");
    DWORD cbWritten;
    WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), pszString, lstrlen(pszString), &cbWritten, NULL);
    return 0;
}

One can easily verify that the code works and is still small.

To remove the CRT from a .cpp file, I suggest the following steps. First of all, we use the following HelloWorld.cpp code

#include <Windows.h>
int mainCRTStartup()
{
    LPCTSTR pszString = TEXT("Hello world");
    DWORD cbWritten;
    WriteConsole(GetStdHandle(STD_OUTPUT_HANDLE), pszString, lstrlen(pszString), &cbWritten, NULL);
    return 0;
}

It's important to verify some compiler and linker options and to set or remove some of them. I included the settings in the pictures below:

[Four screenshots of the project's compiler and linker settings; images not preserved]

The last screenshot shows that we remove the binding to default libraries which we don't need. The compiler uses directives like #pragma comment(lib, "some.lib") to inject the use of certain libraries. With the /NODEFAULTLIB option we remove such libs, and the exe is built the way we need.

One will see that the resulting HelloWorld.exe is only 3K (3.072 bytes) and depends on KERNEL32.dll alone:

C:\Oleg\StackOverflow\HelloWorld\Release>dumpbin /imports HelloWorld.exe
Microsoft (R) COFF/PE Dumper Version 12.00.31101.0
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file HelloWorld.exe

File Type: EXECUTABLE IMAGE

  Section contains the following imports:

    KERNEL32.dll
                402000 Import Address Table
                402038 Import Name Table
                     0 time date stamp
                     0 Index of first forwarder reference

                  60B lstrlenW
                  5E0 WriteConsoleW
                  2C0 GetStdHandle

  Summary

        1000 .idata
        1000 .reloc
        1000 .rsrc
        1000 .text

One can download the corresponding Visual Studio 2013 demo project from here. One needs to switch from the default "Debug" configuration to "Release" and rebuild the solution. The result is a working HelloWorld.exe with a size of 3K.

Oleg
1

This does show how hard it can be to write a program with identical semantics.

<<std::endl will flush a stream if that stream is good(). That means the whole error handling code of ostream must be present.

Also, std::cout could have its streambuf swapped out from under it. The compiler cannot know it's actually going to STDOUT_FILENO. It has to use the whole streambuf intermediate layer.
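A short sketch of that streambuf swap. The function names say_hello and capture_hello are made up for this example; rdbuf() is the standard member that replaces a stream's buffer.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// The same statement as in the question.
void say_hello() {
    std::cout << "Hello World" << std::endl;
}

// Temporarily points std::cout at an in-memory buffer and
// returns whatever say_hello() wrote into it.
std::string capture_hello() {
    std::ostringstream captured;
    std::streambuf* original = std::cout.rdbuf(captured.rdbuf()); // swap the buffer
    say_hello();
    std::cout.rdbuf(original);                                     // restore it
    return captured.str();
}

// Nothing reaches the real standard output while the buffer is swapped.
void check_capture() {
    assert(capture_hello() == "Hello World\n");
}
```

Since any code that runs before say_hello() may have done such a swap, the compiler cannot simply turn the statement into a direct write to STDOUT_FILENO.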

MSalters