4

For example: I wrote a simple C program that prints "Hello, World" and compiled it; the resulting executable was 39.8 KB.

Following this question, I was able to create the equivalent program in assembly; its executable was 39.6 KB.

This surprised me greatly, as I expected the assembly program to be smaller than the C program. As the linked question indicates, the assembly version uses a C header and the gcc compiler. Would this make the assembly program bigger, or is it normal for both to be roughly the same size?


Using the strip command, I reduced both files. This removed the debug information, and both now have very similar file sizes: 18.5 KB each.

test.c:

Xantium
  • 2
    Why would you expect them to be much different? They're doing the same thing. – Barmar Jan 29 '18 at 21:16
  • 1
    @Barmar I was led to believe people (sometimes) used assembler because it was lower level, faster and also produced smaller outputs but please tell me if I am wrong. – Xantium Jan 29 '18 at 21:21
  • 2
    Compilers are pretty good at generating optimal code. – Barmar Jan 29 '18 at 21:22
  • 4
    For a very small program, the size is dominated by overhead and any linked libraries. If you used a C compiler to compile the assembler then those might be identical. – Mark Ransom Jan 29 '18 at 21:22
  • 1
    No, you were not wrong. Have a look [at this MASMForum answer](http://masm32.com/board/index.php?PHPSESSID=dda50dd70f164f06a73355b1ec02f167&topic=1301.msg12900#msg12900). – zx485 Jan 29 '18 at 21:23
  • 1
    The codes cannot possibly use c. 40K to print "Hello World". There is a large overhead. – Weather Vane Jan 29 '18 at 21:23
  • @WeatherVane Well that is the size it produced on my system. – Xantium Jan 29 '18 at 21:24
  • 1
    That is what I said. The tiny code from each code generator needs a big blanket. – Weather Vane Jan 29 '18 at 21:25
  • @MarkRansom So it's down to the fact I used the C compiler. OK thanks. – Xantium Jan 29 '18 at 21:27
  • @WeatherVane I think I get it. Thank you. – Xantium Jan 29 '18 at 21:28
  • @zx485 You shouldn't post links with PHPSESSID in them because then I might be logged in as you. (In this case I don't seem to be) – user253751 Jan 29 '18 at 21:36
  • @immibis: Thx. I didn't take care of that. But I wasn't logged in anyway. – zx485 Jan 29 '18 at 21:39
  • C code compiles down into Assembly. If you really know what you are doing, there are situations where hand-optimizing the assembly can produce better results than a compiler. – Christian Gibbons Jan 29 '18 at 21:54
  • 2
    You rewrote just a tiny part of that app in assembly (just calling `printf` and `exit`) and left the implementation of 95% of the code to the C runtime library (you probably quite underestimate the amount of work done "under"). That part is then the same for both your asm version and your C version, so no wonder you end up with roughly the same executable. The minimal Windows PE executable is said to be 133 bytes. I didn't check if it still has enough space in the DOS header area for quick and dirty hello world output, probably not, but let's say 200B may be enough. The remaining 39kB are convenience and C runtime. – Ped7g Jan 29 '18 at 22:19
  • [tiny bit of C plus massive library and system call] ~= [tiny bit of assembler plus massive library and system call] – Martin James Jan 29 '18 at 22:20
  • @Ped7g So I just redid in assembly using a C header what my original C program converted into assembly in the first place. Not surprising there is little change in file sizes – Xantium Jan 29 '18 at 22:25
  • @MartinJames So all that is a massive library which remains the same in both programs. Got it. – Xantium Jan 29 '18 at 22:28
  • What compiler are you using? – jwdonahue Jan 29 '18 at 22:36
  • @jwdonahue gcc on Windows – Xantium Jan 29 '18 at 22:39
  • Could someone tell me why the downvote. Perhaps I could make my post better if they told me? – Xantium Jan 29 '18 at 22:40
  • 1
    I wrote this kind of program in *C++* and its size is 2560 bytes, so what? I simply don't use the static CRT libs and C++ runtime, which are what give this big size – RbMm Jan 30 '18 at 00:05
  • 9.5 KB is a really *HUGE* size for this hello world. It should not be. – RbMm Jan 30 '18 at 00:41
  • @RbMm I can edit to include all the code and compilation if you want. – Xantium Jan 30 '18 at 00:42
  • @Simon - the issue is not the compiler. The issue is: which lib files do you use? Which linker options? What is the entry point of the exe? Show it. – RbMm Jan 30 '18 at 00:44
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/164135/discussion-between-simon-and-rbmm). – Xantium Jan 30 '18 at 00:44

3 Answers

6

If your hand-written code is on par with a compiled function, then sure, the two are going to be similar in size: they are doing the same thing, and if you can compete with a compiler you will end up the same or similar.

Now, your file sizes indicate you are looking at the wrong thing altogether. The file you are looking at, while called a binary, has a ton of other stuff in it. If you want to compare apples to apples in this context, then compare the size of the functions, the machine code, not the size of the container that holds the functions plus debug info plus strings plus a number of other things.

Your experiment is flawed, but the results very loosely indicate the expected result. That only holds if you are producing code in the same way, though. The odds of that are slim, so no, you shouldn't expect similar results unless you are producing code in the same way.

Take this simple function:

unsigned int fun ( unsigned int a, unsigned int b)
{
    return(a+b+1);
}

the same compiler produced this:

00000000 <fun>:
   0:   e52db004    push    {r11}       ; (str r11, [sp, #-4]!)
   4:   e28db000    add r11, sp, #0
   8:   e24dd00c    sub sp, sp, #12
   c:   e50b0008    str r0, [r11, #-8]
  10:   e50b100c    str r1, [r11, #-12]
  14:   e51b2008    ldr r2, [r11, #-8]
  18:   e51b300c    ldr r3, [r11, #-12]
  1c:   e0823003    add r3, r2, r3
  20:   e2833001    add r3, r3, #1
  24:   e1a00003    mov r0, r3
  28:   e28bd000    add sp, r11, #0
  2c:   e49db004    pop {r11}       ; (ldr r11, [sp], #4)
  30:   e12fff1e    bx  lr

and this

00000000 <fun>:
   0:   e2811001    add r1, r1, #1
   4:   e0810000    add r0, r1, r0
   8:   e12fff1e    bx  lr

because of different settings. 13 instructions vs 3, over 4 times larger.

A human might generate this directly from the C, nothing fancy:

add r0,r0,r1
add r0,r0,#1
bx lr

Not sure from the order of operations whether you technically have to add the one to b before adding that sum to a, or whether it doesn't matter. I went left to right; the compiler went right to left.

So you could say that the compiler and my assembly produced the same number of bytes of binary, or you could say that the compiler produced something over 4 times larger.

Take the above and expand that into a real program that does useful things.

Exercise for the reader (the OP, please don't spoil it): figure out why the compiler can produce two different correct solutions that are so different in size.

EDIT

.exe, ELF and other "binary" formats, as mentioned, can contain debug information and ASCII strings holding the names of functions/labels that make for pretty debug screens. These are part of the "binary" in that they are part of the baggage, but they are neither machine code nor data used when executing the program, at least not the stuff I am mentioning. Without changing the machine code or the data the program needs, you can manipulate the size of your .exe or other file format using compiler settings, so the same compiler-assembler-linker or assembler-linker path can make the binary file (in some senses of that word) larger or smaller by including or leaving out this additional baggage. That is part of understanding file sizes, and why the overall files might be around the same size even if your hello world programs were different sizes: if one is 10 bytes longer but the .exe is 40K, then that 10 bytes is in the noise. But if I understand your question, that 10 bytes is what you are interested in: how it compares between compiled C and hand-written asm.
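
To put rough numbers on that, here is a small sketch in C. It assumes each instruction in the ARM listings above is 4 bytes (which is what the addresses in the disassembly show) and takes the file size as a round 40 × 1024 bytes. The helper names `code_bytes` and `share_percent` are made up for illustration.

```c
/* Bytes of machine code for a count of fixed-width 4-byte ARM instructions. */
unsigned long code_bytes(unsigned long instructions)
{
    return instructions * 4ul;
}

/* Percentage of a file of file_bytes bytes that part_bytes bytes account for. */
double share_percent(unsigned long part_bytes, unsigned long file_bytes)
{
    return 100.0 * (double)part_bytes / (double)file_bytes;
}
```

With these, `code_bytes(13)` is 52 and `code_bytes(3)` is 12, while `share_percent(10ul, 40ul * 1024ul)` comes out at roughly 0.024, so a 10-byte difference in machine code is a few hundredths of a percent of a 40K file.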

Also note that compilers are made by humans, so the output they produce is on par with what at least those humans can produce. Other humans can do better, and many do worse, depending on your definition of better and worse.

old_timer
  • 1
    This big size is absolutely not related to the compiler at all. It is related to how the CRT is linked to the program. If we use static linkage, a big part of the CRT code is linked into the program. If we use the CRT in a DLL (msvcrt.dll) and the C runtime, the size of even C++ code will be near 2500 bytes – RbMm Jan 30 '18 at 00:09
  • static vs dynamic is part of it but that is independent of compiler vs hand written asm. you can use those linker features to make either larger or smaller, along with debug information, and other baggage. – old_timer Jan 30 '18 at 00:12
  • crt is usually very small relative to the rest of the program. depends on the program of course. but like static vs dynamic does not apply to this question as a main() and a main: can have the same baggage. they are not part of the difference between compiling to asm vs hand written asm. – old_timer Jan 30 '18 at 00:15
  • Apart from static vs dynamic use of the CRT, there is also the C runtime: a piece of code which is always statically linked into your binary even if you use dynamic CRT linkage. For example, I just built `#include <Windows.h> #include <stdio.h> void ep(void*) { ExitProcess(printf("Hello, World")); }` and its exe size was 2560 bytes – RbMm Jan 30 '18 at 00:15
  • compiled code, with most mainstream tools goes through the assembler before the linker. so it is obvious that compiled vs hand written can have the same baggage and the baggage is not part of the question as I read it. It is the whole question if you ignore the compiled vs hand written part and compare why one .exe is of a different size from another. Please post your answer so the OP has the opportunity to change which one is selected. baggage or compiled vs hand written. – old_timer Jan 30 '18 at 00:18
  • Again, all this is absolutely not related to the compiler and linker. Optimization etc. can add, say, +/- 1000 bytes. But almost all of this 40 KB size is C runtime code statically linked with the exe file. The OP did not show the exact program code, which libs he uses for the linker, or how he wrote the asm version. Anyway, even C++ code can be very small (2.5 KB) for this hello world – RbMm Jan 30 '18 at 00:22
  • Please post your alternate approach to this answer and let the OP decide which one applies to the question. Also read the title of the question as well as the question itself. The OP clearly didn't know what was in a "binary", so is the question about what is in a binary, or is it the title, compiled vs hand-written asm? Please post your answer so we can all see it. – old_timer Jan 30 '18 at 00:24
5

The 39+ KB size is absolutely not related to the compiler or the language used (C/C++ or asm). Different optimizations, debug information, etc. can change the size of this tiny code by, say, 1000 bytes, but not more. As a test, I built the following program:

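#include <Windows.h>
#include <stdio.h>
// Custom entry point (see /ENTRY:"ep" below), so the usual CRT startup code is not linked in.
void ep(void*)
{
    // printf's return value is passed to ExitProcess as the process exit code.
    ExitProcess(printf("Hello, World"));
}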

linker options:

/INCREMENTAL:NO /NOLOGO /MANIFEST:NO /NODEFAULTLIB 
/SUBSYSTEM:CONSOLE /OPT:REF /OPT:ICF /LTCG /ENTRY:"ep" /MACHINE:X64 kernel32.lib msvcrt.lib

and got a 2560-byte exe for both x86 and x64.

What is the difference? /NODEFAULTLIB and my version of msvcrt.lib, which is a pure import library.

The remaining 35+ KB comes from the statically linked C runtime you used. Even if you write the program in asm, you need some lib to link to printf, and your lib contains code which is statically linked with your code. That code accounts for the 35 KB.

The question is not C++ vs asm; there is no difference there. The question is whether or not you use the C runtime.

RbMm
4

I agree with old_timer, but I also did a quick test for ground truth. With VS-2017 Pro, I get similar results (~37KB) for the size of the executable, but only if I look in the debug output folder. After building for release, it's closer to ~9KB. Much of that difference is in the size of the static libraries needed to call into the OS/C-runtime DLLs.

EDIT: Despite the fact that most modern C compilers can match or outperform most hand-written assembly code, the hand-written variety can be smaller because it doesn't have to carry all that C runtime overhead. The difference is rarely enough to warrant the extra development and maintenance costs of assembler code, though, particularly for non-trivial applications. There's a reason that most modern OS kernels are written predominantly in C or other high-level languages, with only pinhole assembler optimizations in a handful of critical functions.

Trivial "hello world" class programs are not a good comparison for C vs assembler. There are just not enough opportunities for the compiler or the human to do much in the way of optimization. Write a math or data processing library and application and compare those. I'd be willing to bet the compiler will kick your butt.

jwdonahue
  • I'm not running VS. I'm running MinGW (gcc) but thank you anyway. – Xantium Jan 29 '18 at 22:42
  • @Simon, All tool chains have similar requirements. Debug code is bigger than release code and for the most part, there's not a lot of difference between compilers targeting the same OS as they have to link to substantially the same libraries. – jwdonahue Jan 29 '18 at 22:44
  • Yes you are right. I just found out about how to decrease to 9.5Kb using the `strip` command. – Xantium Jan 30 '18 at 00:29
  • @Simon, And even what's left at that point is still mostly those static libraries you have to have to access OS API's. – jwdonahue Jan 30 '18 at 00:34
  • With assembler, you can do a *lot* of stuff to reduce executable size for trivial programs if you really work at it. See for example http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html: A Linux ELF executable that just exits can be packed into 42 bytes (with the machine code inside the ELF program header, as values for some fields that don't matter)! On Windows, where it's not portable / supported to use system calls directly, you can't avoid dynamically linking some DLLs, so you can't just make tiny static executables that make system calls directly (except as a hack). – Peter Cordes Jan 30 '18 at 00:52
  • @PeterCordes, sure but it's a totally contrived trivial example of little practical use. I've done essentially the same thing with C targeting embedded systems. The smallest practical program I ever wrote was actually the loader for the virtual memory manager on a PPC-823 chip. It was just a few hundred bytes long, and most of that was the binary table to load into the chip. For that exercise, there was no C run-time. – jwdonahue Jan 30 '18 at 01:03
  • @jwdonahue: right; executable size optimization at that level is not usually worth it for code that can only run under a large OS like Linux or Windows. But you *can* make small static executables with asm more easily than C. As you say, not usually worth it in real life except for bootloaders or microcontrollers, though. – Peter Cordes Jan 30 '18 at 02:19
  • I have to admit that I am old enough to remember a time when even I could produce smaller/faster machine code than any of the compilers available to me, particularly on embedded systems. Today however, even the latter are mostly commodity devices with very mature tool chains and I no longer maintain libraries of assembler macros and subroutines for each of the micros I have worked on. Except for a few lines of compiler intrinsics in the Windows kernel, I doubt there's a single surviving product out there with any of my assembler code running in it. – jwdonahue Jan 30 '18 at 20:16