
A couple of years ago I asked how to reduce the size of executables. With the MinGW compiler, stripping symbols (the -s option) cut the size by more than 50%.

Why isn't stripping the default? Is there a good reason NOT to strip symbols in some scenarios? I'd like to understand this more deeply: today I only vaguely know that symbols are involved in linking libraries. Are they needed in the executable, and do they affect execution speed?

Jan Turoň

3 Answers


I can't imagine them affecting execution speed in any noticeable way, though in theory you could get a few extra cache misses in the process image.

You want to keep symbols in the file when you're debugging it, so that you can see which function you're in, check the values of variables and so forth.

But symbols make the file bigger: potentially, a lot bigger. So, you do not want symbols in a binary that you put on a small embedded device. (I'm talking about a real embedded device, not some 21st century Raspberry Pi with 9,000GB of storage space!)

I generally strip all release builds and don't strip any debug builds. This makes core dumps from customers slightly less useful, but you can always keep a copy of the unstripped release build and debug your core against that.

I did hear about one firm that had a policy of never stripping symbols even in release builds, so they could load a core directly into the debugger. This seems like a bit of an abstraction leak to me but, whatever. Their call.

Of course, if you really want, you can analyse the assembly yourself without them. But that's crazy...

Lightness Races in Orbit
  • Apparently, you never had to run pstack on a running executable. – SergeyA Aug 25 '15 at 19:01
  • Exactly what I've said. If you ever had to do this, you'd not strip. – SergeyA Aug 25 '15 at 19:03
  • And you do not seem to know the difference between debug symbols and the symbols. – SergeyA Aug 25 '15 at 19:06
  • @SergeyA: So you are expecting your customers to debug your code? Hmm, ... I once worked for such a company ... – too honest for this site Aug 25 '15 at 19:23
  • @Olaf, there are very different environments out there. For instance, there are cases when developers access their customers' environment and troubleshoot there. ;) – SergeyA Aug 25 '15 at 19:30
  • With non-trivial systems, especially if distributed/networked, it is almost inevitable that bugs will surface in a customer's configuration/environment that have not been seen during alpha testing. When that happens, you will need all the help you can get from your debugger and logger. I often keep a debug build (with ALL the symbols, range-checking etc. on, optimization off etc.) handy for swapping in. – Martin James Aug 26 '15 at 00:40
  • The point about not stripping symbols being an abstraction leak is pertinent. Moreover, stripping prevents implementation leaks: DWARF symbol info contains a LOT more nuggets of information about your implementation. So if you want to make it a bit harder for someone to rummage around in some proprietary algorithms, then stripping symbols from things shipped externally is sound practice. (But keep the optimised binaries with debug symbols around internally for debugging, or hive off the symbols into a separate file as per https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html). – Rob May 16 '19 at 08:42
  • In terms of size of binaries, this also comes up when writing plug-ins against large applications such as After Effects or Final Cut Pro. They have many dependencies, and your plug-in might have a similarly large set of dependencies (but may only in practice use a handful of functions from each). To prevent symbol clashes, static linking and symbol stripping is a common technique (as are symbol versioning/namespacing and other tricks), which can make binaries _very_ large if not stripped. (E.g. boost is big, OpenEXR is big, etc). – Rob May 16 '19 at 08:47

MinGW is an acronym for "Minimalist GNU for Windows"; as such, the compiler suite is GCC ... the GNU Compiler Collection. Naturally, this compiler suite will tend to comply with the GNU Coding Standards, which demand that every build shall be a "debug build" ... i.e. it shall include both debugging symbols, and those required for linking. (This is logical, since a "debug build" is orders of magnitude more useful to the application developer, than is a so-called "release build"). Furthermore, a build system which discriminates between "debug build" and "release build" modes is more complex -- possibly significantly more so -- than one which concerns itself with only a "debug build".

In the GNU build model, there is no need for a "release build" anyway. A "release" is not created at build time; it is created at installation time, usually as a "staged" installation from which a release package is created. The GNU tool-chain includes both a strip command and an install command, which can strip a "debug build" on the fly when creating a staged installation for packaging as a release (or in place, if you wish), so there really is no need to clutter the build system with "release build" specifics; just create a "debug build" up front, then strip it after the event, thereby converting it into an effective "release build" when you require it.

Since your MinGW tool-chain is fundamentally a GNU tool-chain, it fully supports this GNU build model.

Keith Marshall
  • It's possible to have separate debug and release builds with GCC in general and also with GCC+MinGW. For example, in the debug build one can run `gcc -g ...`, and in the release `gcc -s -O2`. – pts Feb 03 '17 at 13:33
  • @pts: sure you _can_ do that, but it's completely atypical of the GNU build model. Pragmatically, what advantage does `gcc -s ...` offer over building with `gcc -g ...`, and stripping afterwards? Furthermore, if you are cluttering your build system with logic to _discriminate_ between debug and release builds, then you are making it unnecessarily more complex, (and therefore more error prone), than it needs to be; plainly, that's just ... asinine. – Keith Marshall Feb 03 '17 at 22:54
  • They are equivalent, there is no practical advantage. But I wasn't proposing these two. I was proposing `gcc -g ...` vs `gcc -s -O2`. – pts Feb 03 '17 at 22:56
  • @pts: `-g` and `-O2` are _not_ mutually exclusive. A default GNU build uses `gcc -g -O2 ...` anyway. That's not, perhaps, the most useful for extensive debugging, but it does preserve the ability to generate a backtrace in the event of a crash; you lose that with `gcc -s ...`, regardless of any optimization level you've specified in addition. – Keith Marshall Feb 03 '17 at 23:06

There is no good reason to strip symbols. By stripping symbols you will, for instance, lose the ability to gather the stack of a running process, which is an extremely useful feature.

EDIT. Several people here seem to be confused about the difference between debug symbols and 'general' symbols. The former are only produced when the -g (or similar) option is given to the compiler, and are usually used only in debug builds (though they can be used with optimized builds as well). The regular symbols are things such as function names, and are produced by compilers so that object files can be linked or .so files loaded. There are several extremely useful non-invasive tools that can be used on running non-debug executables and that depend on the symbols not having been stripped.

SergeyA
  • Sure there is. It makes the executable smaller. You might not be able to fit the executable on the target platform with the debug symbols in it. – NathanOliver Aug 25 '15 at 19:03
  • You mean to debug? Could you please give an example where I can see the usefulness? I have very little experience here. – Jan Turoň Aug 25 '15 at 19:03
  • There is a difference between debug symbols and symbols. Do not confuse them. – SergeyA Aug 25 '15 at 19:03
  • This is an unbalanced answer with no arguments, just a single militant assertion. – Lightness Races in Orbit Aug 25 '15 at 19:05
  • There are commands to get a stack trace of a running process: pstack on Solaris/Linux, procstack on AIX. I believe any modern Unix has a flavor of them. They come in extremely handy when you need to troubleshoot an issue with a running process, together with truss/proctruss/d-truss/... They do not work without symbols. – SergeyA Aug 25 '15 at 19:05
  • @SergeyA could you explain (or post a link) what truss is and how it works with symbols? – Jan Turoň Aug 25 '15 at 19:11
  • truss (there are different names for this utility; truss is the one used in the Solaris world, in Linux it is called strace) is a utility which allows you to attach to a running process (or start a new one) and show function calls made in the application. If the function is known to truss (i.e. it is a system call), it will even tell you what the arguments mean. But it can also trace non-system calls, displaying arguments and return values. As for the link, try http://docs.oracle.com/cd/E23823_01/html/816-5165/truss-1.html – SergeyA Aug 25 '15 at 19:19
  • A good reason may be to hamper reverse-engineering, which may be desired especially in commercial products, because of company policy. – Grzegorz Szpetkowski Aug 25 '15 at 21:23
  • How is this useful when your process has 19 stacks? – Martin James Aug 26 '15 at 00:43
  • @MartinJames, what do you mean by 19 stacks? 19 threads? truss will display the calls of all threads and provide the thread id of each caller. – SergeyA Aug 26 '15 at 14:08