When I disassemble an .exe file into intermediate language, why do I get a dump that is smaller than the executable? Is it because statically linked code is not included?
-
It might be because the .exe file also contains [resources](http://stackoverflow.com/q/90697/501250) that you haven't dumped. This will be case-by-case -- not all .exe files will be larger than their dumped IL. – cdhowie Feb 17 '17 at 20:34
-
@cdhowie, for an empty class with an empty Main function, the difference is 2 KB. – Volodymyr Boiko Feb 17 '17 at 20:53
-
No, I realize what kind of information can be included in an .exe, but 2 KB? What I also missed is the code for reference visibility/counting. It obviously must exist somewhere in each function. – Volodymyr Boiko Feb 17 '17 at 21:02
-
The CLR does not use reference counting; it uses a generational mark-and-sweep GC. Note that CIL .exe files also include a bootstrap entry point to load the CLR's .dll entry point and invoke it against the .exe (since CIL is not natively executable). Obviously this code is not IL, and it doesn't make sense to dump this boilerplate since it's going to be similar (if not identical) for all CIL executables. This could also account for some of the difference. – cdhowie Feb 17 '17 at 22:52
-
@cdhowie, I thought the root set must be updated somehow at a function's exit. Can you recommend something to read about how the root set is actually updated? – Volodymyr Boiko Feb 18 '17 at 00:21
-
I don't believe the root set needs to be updated; the stack roots can simply be scanned off of the current call stack of every running thread. https://msdn.microsoft.com/en-us/library/ee787088(v=vs.110).aspx – cdhowie Feb 18 '17 at 01:15
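To make the point above concrete, here is a toy Python sketch of a mark phase that derives its roots from the call stacks only at collection time. Everything in it (`Obj`, `reachable`, the nested-list model of threads and frames) is a hypothetical illustration, not the CLR's implementation; a real collector also has other root sources, such as statics and GC handles, which this sketch ignores.

```python
class Obj:
    """A heap object that can hold references to other objects."""
    def __init__(self, name):
        self.name = name
        self.refs = []  # outgoing references to other Obj instances


def reachable(stacks):
    """Return the names of all objects reachable from the given call stacks.

    `stacks` is a list of threads; each thread is a list of frames; each
    frame is a list of Obj references held in that frame's locals.
    The roots are read from the stacks when this function runs; nothing
    is recorded when a frame is pushed or popped.
    """
    pending = [obj for stack in stacks for frame in stack for obj in frame]
    marked = set()
    while pending:
        obj = pending.pop()
        if obj in marked:
            continue
        marked.add(obj)
        pending.extend(obj.refs)  # follow references held in fields
    return {obj.name for obj in marked}
```

In this model, a function returning just means its frame is no longer in the list the next time `reachable` is called; no bookkeeping code has to run at function exit.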
-
@GreenTree - the only live root-ish tracking that goes on while GC is not happening is for the Card Table, which is updated every time an object reference is written into a field of a class using what's called a write barrier. But it tracks pages that might have roots in them rather than roots themselves. Card tables are necessary for generational GCs that allow partial collections; the alternative to them is to use the virtual memory protection features of the operating system (which does not perform as well as staying in userspace). – hoodaticus Jun 16 '17 at 12:59
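The card table idea described above can be sketched as a toy model in Python. The names (`CardTable`, `write_barrier`, `dirty_ranges`) and the card size are assumptions chosen for illustration and do not reflect the CLR's actual data structures or granularity.

```python
CARD_SIZE = 256  # bytes covered by one card; illustrative value only


class CardTable:
    """One flag per fixed-size range of heap addresses."""

    def __init__(self, heap_size):
        # Round up so that every address in the heap is covered by a card.
        self.cards = bytearray((heap_size + CARD_SIZE - 1) // CARD_SIZE)

    def write_barrier(self, field_address):
        """Called whenever a reference is stored at field_address."""
        self.cards[field_address // CARD_SIZE] = 1

    def dirty_ranges(self):
        """Address ranges a partial collection would need to scan."""
        return [(i * CARD_SIZE, (i + 1) * CARD_SIZE)
                for i, dirty in enumerate(self.cards) if dirty]

    def clear(self):
        """Reset all cards, e.g. after a collection has scanned them."""
        for i in range(len(self.cards)):
            self.cards[i] = 0
```

The table records only which ranges were written to, not which references they contain, so a partial collection still has to scan each dirty range to find the actual references.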
1 Answer
EXE files do not just contain binary program code. They can also contain embedded resources, statically linked libraries, strings, a header, metadata, symbol table/debugging information and lots of other stuff.
Also, keep in mind that the compiler goes through an optimization phase that might transform your code. The disassembled code might not be a 1-to-1 match with your original source code.
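To see how much of a file is taken up by headers and section padding rather than code, one option is to read the section table directly. The following is a minimal Python sketch, assuming a well-formed PE file; the function names `pe_sections` and `padding_bytes` are made up for this example, and it does no validation beyond the two signatures.

```python
import struct


def pe_sections(data):
    """Return (name, virtual_size, raw_size) for each section of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # The 32-bit value at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4  # 20-byte COFF file header
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    (optional_size,) = struct.unpack_from("<H", data, coff + 16)
    table = coff + 20 + optional_size  # section table follows the optional header
    sections = []
    for i in range(num_sections):
        # Each section header is 40 bytes; only the first 24 are needed here.
        name, virtual_size, _vaddr, raw_size, _raw_ptr = struct.unpack_from(
            "<8sIIII", data, table + 40 * i)
        label = name.rstrip(b"\0").decode("ascii", "replace")
        sections.append((label, virtual_size, raw_size))
    return sections


def padding_bytes(data):
    """Bytes on disk that exist only to round sections up to the file alignment."""
    return sum(max(raw - virtual, 0) for _, virtual, raw in pe_sections(data))
```

Calling `pe_sections(open(path, "rb").read())` on a small executable typically shows each section's size on disk rounded up to the file alignment (commonly 512 bytes), which together with the headers can plausibly account for a difference of a couple of kilobytes.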

Icemanind
-
I'm not sure that IL gets optimized in ways that increase code size prior to jit. It seems the focus of optimizations in the C# compiler, for instance, was on reduction of code size (which itself includes many optimizations like variable elimination and such). Though Lippert could expand on this more fully and probably has. – hoodaticus Jun 15 '17 at 18:36
-
I might be confusing this with the C++ compiler, but I think there used to be switches, [-O1 and -O2](https://msdn.microsoft.com/en-us/library/8f8h5cxt.aspx), that would optimize to either minimize the size of the executable or maximize the speed of the executable. – Icemanind Jun 15 '17 at 20:50
-