
I'm trying to make compilations of the GHC Haskell compiler 100% reproducible (byte-identical).

The object files are already byte-identical, but the final linked binary isn't.

GHC delegates the final linking to gcc, like:

/usr/bin/gcc -fno-stack-protector -DTABLES_NEXT_TO_CODE -o Main Main.o [..some more files..] /tmp/ghc21220_0/ghc21220_5.o /tmp/ghc21220_0/ghc21220_7.o [...] '-Wl,--hash-size=31' -Wl,--reduce-memory-overheads

Interestingly, the file name of the temporary file ghc21220_7.o appears in the linked binary.

It seems that I am able to remove it with the strip tool.

Why does the file name appear there, and what is its purpose?

Is there a flag to tell gcc (or maybe ld?) to not include these file names?


Update: If I run objdump --syms on the binary, I see

0000000000000000 l    df *ABS*  0000000000000000              ghc21220_5.c
0000000000000000 l    df *ABS*  0000000000000000              ghc21220_7.c

According to this, d means debug and f means file. My question remains: why and how exactly do the file names of the .c files make it into the final binary, and can I suppress this at compile time (as opposed to running strip later)?

nh2
  • 1) Having any expectation for a toolchain to generate a 100% byte-identical binary over and over again is not realistic; often, if nothing else, a timestamp is included. Yes, if you make a .bin or .hex or some format that does not support this other stuff, then in theory that is identical, so long as it is 100% your code and no libraries. 2) That stuff is there for folks who use debuggers, don't like reading assembler, and whine if that is what they get. 3) Just use strip. Or a file format that has no room for anything but code and data, no metadata. – old_timer Aug 23 '14 at 01:07
  • If possible I imagine the only real question here that is stackoverflow related is how to not have debug stuff added in the first place rather than having to strip it out later. – old_timer Aug 23 '14 at 01:07

1 Answer


The source file names appear as symbols in the executable because the first thing GCC does when emitting assembly is to write a .file directive to the output. The assembler then turns it into a symbol in the object file, which the linker puts in the executable along with all the other symbols. I'm not sure if it serves a useful purpose, but it might allow the linker to give a source file name rather than an object file name in errors.

Short of modifying the code, there's nothing you can do to stop GCC from generating the .file directive or to stop the assembler from converting it into a symbol in the object file. You can, however, tell the linker not to include these symbols in the executable by using its -x option, which tells it to discard all local symbols. When linking through gcc, as GHC does, the option is passed on as -Wl,-x.

Another more targeted option would be to use the strip command to strip only the filename symbol from the object file:

strip -N ghc21220_5.c ghc21220_5.o

Finally, you could choose to give your C source files the same name when they're supposed to be identical. Ultimately, your choice of file names is the source of the differences you're seeing in the executables.

Ross Ridge
  • Thanks for the answer. I've analysed the contents of the C files in question and they indeed can be given a usual (non-generated) name. – nh2 Aug 27 '14 at 18:55