
If you clicked on this because you thought that this can't be possible, I thought the same thing until I ran into it.

I was working on a project, written in C for a PIC, that is built with a Makefile. (No subdirectories in this project. Built with SDCC and GPUTILS.) The Makefile was very disorganized, so I wanted to clean it up. To make sure I didn't break anything while I did it, I recorded the hashes of all the files following a fresh make:

make clean
make
md5sum ./* > ../allsums.txt

Then I modified the Makefile and tried again, this time comparing the resulting files to allsums.txt.

make clean
vim Makefile
make
md5sum -c ../allsums.txt

Interestingly, the hashes of the .o files did not match, but the end result did. Assuming the problem was one I had somehow created, I spent a lot of time trying to hunt it down.

Then, on a hunch, I did this using the original Makefile:

make clean
make
md5sum ./* > ../allsums.txt
make clean
make
md5sum -c ../allsums.txt

I found that the object files changed here, too! Some searching led me to this question, which confirmed that (at least for gcc) the .o files change between each compilation.
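The check above can be narrowed down to just the files that changed. This is a minimal sketch, assuming GNU coreutils `md5sum` (for the `--quiet` option); `changed_files` is a hypothetical helper name, not part of any tool:

```shell
# Hypothetical helper: print only the files whose checksum no longer
# matches the saved manifest. Run it from the build directory.
# Assumes GNU coreutils md5sum, which supports --quiet with -c.
changed_files() {
    # --quiet suppresses the "OK" lines; LC_ALL=C keeps the "FAILED"
    # text untranslated so sed can match it. Files that are missing
    # report "FAILED open or read" and are deliberately not listed.
    LC_ALL=C md5sum -c --quiet "$1" 2>/dev/null | sed -n 's/: FAILED$//p'
}
```

Called as `changed_files ../allsums.txt` after the second `make`, it lists one changed file per line.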

What causes this?

EchoLynx
  • No, `gcc` will create the same file (i.e. match), unless the build changes something. If the source files have something like (`char build_time[] = DATE`) where there was a compile argument `-DDATE="\`date\`"`, that could change things. – Craig Estey Sep 23 '16 at 19:07
  • A better way to understand/resolve this is to get a hex dump of each file (with `od` or `xxd`) and `diff` them to see which parts change. You may then be able to work backwards to see which offset within a given `.o` section has the change. And, then, relate it to a specific symbol. Then, look up the symbol's definition in the `.h` or `.c` file. Also, are the lengths the same? – Craig Estey Sep 23 '16 at 19:17
  • It is not that uncommon to have build-specific information embedded into object and executable files. For example, the PE header contains a timestamp field filled in by the linker (PE is the executable and DLL format on Windows). – Hristo Iliev Sep 26 '16 at 16:00
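The hex-dump comparison suggested in the comments above could look roughly like this. It is a sketch that assumes GNU `od` (the `z` suffix is a GNU extension); `hexdiff` is a hypothetical helper name:

```shell
# Hypothetical helper: hex-dump two files with od and diff the dumps,
# so the differing offsets (hex, left column) are easy to spot.
# Returns diff's exit status: 0 if identical, 1 if the dumps differ.
hexdiff() {
    dump_a=$(mktemp) || return 2
    dump_b=$(mktemp) || { rm -f "$dump_a"; return 2; }
    # -A x: hex offsets, -t x1z: one byte per column plus printable
    # characters, -v: do not collapse repeated lines.
    od -A x -t x1z -v "$1" > "$dump_a"
    od -A x -t x1z -v "$2" > "$dump_b"
    diff "$dump_a" "$dump_b"
    status=$?
    rm -f "$dump_a" "$dump_b"
    return $status
}
```

For example, `hexdiff build1/main.o build2/main.o` (both paths are placeholders) prints only the 16-byte rows that differ; a printable timestamp or path in the right-hand column is usually a strong hint about the cause.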

1 Answer


The debugging information (symbols, date) in the object can make the object change, even if the code is strictly identical.

To rule out that source of change, just strip the objects:

strip *.o

The best way to compare objects, or to checksum them, is to do it on stripped objects. Otherwise you can never be sure.

(The same technique can be applied to executables)

Note: once you have stripped the objects you can still link them, but you'll have a hard time debugging. You can strip a copy instead (theobject.o is then left unchanged):

strip theobject.o -o theobject_stripped.o

We use that process when performing "formal production" of our executables before delivery.

Actually we do it the other way round: we compare stripped executables, and if there's a difference, we compare stripped objects to find the culprit and narrow it down. Then we use our version control system on the sources to find why it changed.

Edit: if a custom time-dependent macro is used to put the date into the object files (-DDATE=\"somedate\"), the checksum process will need more than a strip operation. The reverse operation (removing the date/version/whatever from the object file or files) has to be done with custom tools. You can turn this to your advantage and leave most of the object files untouched by applying the macro to only one file, which holds the version (Version.o) in an exported symbol.

The checksum of that file will be different, but the other ones will be identical (or your colleagues are pointlessly making it very hard for you).

EDIT: for SDCC there is a similar tool called sdobjcopy, whose interface looks very much like that of objcopy and which can strip objects:

sdobjcopy --strip-all theobject.o theobject_stripped.o

(There's also a --strip-debug option if the --strip-all option is too "violent".)

Check the sdobjcopy man page for more details.
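The strip-then-checksum comparison described above could be scripted roughly as follows. This is only a sketch under stated assumptions: `stripped_sums` is a hypothetical helper, and the `OBJCOPY` variable is my own convention for choosing any tool that accepts `--strip-all IN OUT` (GNU objcopy by default, or sdobjcopy for SDCC builds). Whether stripping removes every varying byte depends on the toolchain.

```shell
# Hypothetical helper: print md5sums of stripped copies of every .o file
# in a directory, leaving the original objects untouched.
# OBJCOPY is an assumed convention: any tool taking "--strip-all IN OUT",
# e.g. GNU objcopy (default) or sdobjcopy.
stripped_sums() {
    src_dir=$1
    tmp_dir=$(mktemp -d) || return 1
    for obj in "$src_dir"/*.o; do
        [ -e "$obj" ] || continue   # no .o files: the glob stays literal
        "${OBJCOPY:-objcopy}" --strip-all "$obj" "$tmp_dir/$(basename "$obj")" \
            || { rm -rf "$tmp_dir"; return 1; }
    done
    # Hash inside the temp directory so the manifest holds bare file names
    # and two manifests from different builds can be diffed directly.
    (cd "$tmp_dir" && md5sum ./*.o)
    status=$?
    rm -rf "$tmp_dir"
    return $status
}
```

A possible use is `stripped_sums build1 > a.txt; stripped_sums build2 > b.txt; diff a.txt b.txt`, where `build1` and `build2` are placeholder directories holding the objects of two builds.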

EchoLynx
Jean-François Fabre
  • You have identified the likely reason for the difference, but stripping the object files is not certain to clean it up, because the varying parts could be in the actual program data. For example, there could be a startup banner or internal version string that includes a build timestamp. – John Bollinger Sep 23 '16 at 19:05
  • @JohnBollinger: you are right, but in that case, it's a custom process, which can be reverted with custom tools, for instance strip+find the version/date zone and blank it out. This custom process should be easily identifiable in the makefile. – Jean-François Fabre Sep 23 '16 at 19:10
  • The key phrase to search on is 'reproducible builds'. They come at it from a security angle, but the problem is the same. – Dan Mills Sep 23 '16 at 20:11
  • `strip` seems like a great tool, but it seems to only work with object files made by the GNU tools. (My PIC project uses SDCC & GPUTILS.) – EchoLynx Sep 26 '16 at 15:24
  • @EchoLynx found something for you, see my last edit :) – Jean-François Fabre Sep 26 '16 at 15:46