
I understand that symbol tables are created by the compiler to help with its process. They exist per object file for when they are being linked together.

Assume:

void test(void){
 //
}
int main(void){
  return 0;
}

Compiling the above with gcc and running nm a.out shows:

0000000100000fa0 T _main
0000000100000f90 T _test

Why are these symbols still needed? Why doesn't the linker remove them once done? Aren't they potentially a security risk for hackers to read the source?

Edit

Is this what you mean by debugging a release binary (the ones compiled without -g)?

Assume:

int test2(){
 int *p = (int*) 0x123;
 return *p;
}

int test1(){
 return test2();  
}

int main(){
 return test1();
}

This segfaults in test2. Running gdb ./a.out and then where shows:

(gdb) where
#0  0x000055555555460a in test2 ()
#1  0x000055555555461c in test1 ()
#2  0x000055555555462c in main ()

But stripping a.out and doing the same shows:

(gdb) where
#0  0x000055555555460a in ?? ()
#1  0x000055555555461c in ?? ()
#2  0x000055555555462c in ?? ()

Is this what you mean by keeping symbol tables for debugging release builds? Is this the normal way of doing it? Are there other tools used?

  • You have clearly never used a debugger. – Scott Hunter Mar 19 '20 at 02:12
  • probably used it more than you. `gcc -g` is used for gdb. gcc without `-g` still has symbol tables. –  Mar 19 '20 at 02:13
  • `-g` adds more debugging info into the image. But that's extra to what the symbols provide. Without symbols you would not be able to put a breakpoint on the `test` function unless you knew the load address. Try it: run `strip` on the executable and then try to put a breakpoint on any function. – kaylum Mar 19 '20 at 02:21
  • https://stackoverflow.com/q/32211913/10678955 – root Mar 19 '20 at 02:32
  • @kaylum there is no reason to debug a release build; if I wanted to debug it I would use -g –  Mar 19 '20 at 02:44
  • AFAIK, external symbols to library functions that your code calls are still needed at run-time so that the dynamic loader can find the right symbols in the shared libraries. – Jonathan Leffler Mar 19 '20 at 02:48
  • @JonathanLeffler that makes sense. Thank you. –  Mar 19 '20 at 02:53
  • @Josh: Sure, there is never a reason to debug a release build because they never have bugs and always behave exactly the same as a debug build. – Eric Postpischil Mar 19 '20 at 03:13
  • You can add `-s` to the GCC command to strip away all unnecessary symbols. It works like calling `strip` afterwards. – the busybee Mar 19 '20 at 06:34
  • @Josh You ship release builds to your customers, right? Then if the program you shipped crashes and your customer has a core dump, how do you make sense of it without symbols? You don't have to ship the symbols just because the linker gave them to you. – David Schwartz Mar 19 '20 at 15:19

1 Answer


Why are these symbols still needed?

They are not needed for correctness of execution, but they are helpful for debugging.

Some programs can record their own stack trace (e.g. TCMalloc performs allocation sampling), and report it on crash (or other kind of errors).

While all such stack traces could be symbolized off-line (given a binary which did contain symbols), it is often much more convenient for the program to produce symbolized stack trace, so you don't need to find a matching binary.
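As a rough illustration of a program symbolizing its own stack, here is a minimal sketch assuming a glibc-style `<execinfo.h>` that provides `backtrace` and `backtrace_symbols`. The names `print_stack_trace` and `MAX_FRAMES` are hypothetical and chosen only for this example. The glibc documentation notes that function names may be unavailable unless the program is linked with `-rdynamic`, so the output can fall back to bare addresses.

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 64  /* hypothetical upper bound on captured frames */

/* Print the current call stack to stderr, one frame per line.
 * Returns the number of frames captured. */
int print_stack_trace(void)
{
    void *frames[MAX_FRAMES];

    /* Collect return addresses of the active call chain. */
    int depth = backtrace(frames, MAX_FRAMES);

    /* Ask the C library to translate addresses into strings.
     * Returns NULL if it cannot allocate the result. */
    char **names = backtrace_symbols(frames, depth);

    for (int i = 0; i < depth; i++) {
        if (names != NULL)
            fprintf(stderr, "#%d %s\n", i, names[i]);
        else
            fprintf(stderr, "#%d %p\n", i, frames[i]);  /* raw address only */
    }

    free(names);  /* one free() releases the whole array; free(NULL) is a no-op */
    return depth;
}
```

A crash handler or an error-reporting path could call `print_stack_trace()` before exiting. Whether each line shows a function name or only an address depends on which symbols are still present in the binary.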

Consider a case where you have 1000s of different applications running in the cloud at multiple versions, and you get 100 reports of a crash. Are they the same crash, or are there different causes?

If all you have are bunches of hex numbers, it's hard to tell. You'd have to find a matching binary for each instance, symbolize it, and compare to all the other ones (automation could help here).

But if you have the stack traces in symbolized form, it's pretty easy to tell at a glance.
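The comparison can be sketched in a few lines once the frames are function names. This is only an illustration: `struct trace` and `same_crash` are hypothetical names, and a real crash-bucketing system would likely be more forgiving (for example, comparing only the innermost few frames).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One crash report: function names from innermost to outermost frame. */
struct trace {
    const char *const *frames;
    size_t depth;
};

/* Treat two reports as the same crash when they have the same depth
 * and every frame names the same function. */
bool same_crash(const struct trace *a, const struct trace *b)
{
    if (a->depth != b->depth)
        return false;
    for (size_t i = 0; i < a->depth; i++) {
        if (strcmp(a->frames[i], b->frames[i]) != 0)
            return false;
    }
    return true;
}
```

With raw addresses, the same comparison would be unreliable across different builds or load addresses, so each report would first have to be symbolized against its matching binary.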

This does come with a little bit of cost: your binaries are perhaps 1% larger than they have to be.

why doesn't the linker remove them once done?

You have to remember traditional UNIX roots. In the environment in which UNIX was developed everybody had access to the source for all UNIX utilities (including ld), and debuggability was way more important than keeping things secret. So I am not at all surprised that this default (keep symbols) was chosen.

Compare this to the choice made by Microsoft: keep everything in .DBG (later .PDB) files.

aren't they potentially a security risk for hackers to read the source?

They are helpful in reverse engineering, yes. They don't contain the source, so unless the source is already open, they don't add that much.

Still, if your program contains something like CheckLicense(), this helps hackers to concentrate their efforts on bypassing your license checks.

Which is why commercial binaries are often shipped fully-stripped.

Update:

Is this what you mean by keeping symbol tables for debugging release builds?

Yes.

is this the normal way of doing it?

It's one way of doing it.

are there other tools used?

Yes: see best practice below.

P.S. The best practice is to build your binaries with full debug info:

gcc -c -g -O2 foo.c bar.c
gcc -g -o app.dbg foo.o bar.o ...

Then keep the full debug binary app.dbg for when you need to debug crashes, but ship a fully-stripped version app to your customers:

strip app.dbg -o app

P.P.S.

gcc -g is used for gdb. gcc without -g still has symbol tables.

Sooner or later you will find out that you must perform debugging on a binary that is built without -g (such as when the binary built without -g crashes, but one built with -g does not).

When that moment comes, your job will be much easier if the binary still has a symbol table.

Employed Russian
  • Thanks a lot for this info. I have added a question under `Edit` above to clarify something. Is it true? Is this the normal way of doing it? Are there other tools used? I know pstack is only used with 32 bits, which is now outdated. –  Mar 19 '20 at 16:07
  • Thank you. One last question: is it possible to analyze a core dump without the original executable? I know it's not possible with gdb. Are there other tools? –  Mar 19 '20 at 16:43
  • @Josh This depends on the exact analysis you want to perform. *Some* analysis is possible without the original executable, with GDB, `readelf`, `objdump`, etc. This really deserves a separate question. – Employed Russian Mar 19 '20 at 17:38