
I'm using uncrustify to format a directory full of C and C++ code. I need to ensure that uncrustify won't change the resulting code; I can't do a diff on the object file or binaries because the object files have a timestamp and so won't ever be identical. I can't check the source of the files one by one because I'd be here for years.

The project uses make for the build process so I was wondering if there is some way to output something there that could be checked.

I've searched SO and Google to no avail, so my apologies if this is a duplicate.

EDIT: I'm using gcc/g++ and compiling for 32 bit.

Alex Meuer
  • You can disassemble the binaries using commands like `objdump` and compare. – Fazlin Jun 27 '16 at 12:58
  • "I can't do a diff on the object file or binaries because the object files have a timestamp and so won't ever be identical." Do `.o` files have an embedded timestamp? I didn't know that. – underscore_d Jun 27 '16 at 12:58
  • How is the timestamp embedded into the object file? Can it be stripped somehow? Maybe there are binary compare utilities that can skip or ignore certain ranges of bytes in the files they compare; have you tried looking for such utilities? – Some programmer dude Jun 27 '16 at 12:59
  • What is your compiler, architecture, object file format, etc.? This is important to provide a working solution. If MSVC, then see here: http://stackoverflow.com/questions/7895652/is-there-a-way-to-compare-obj-files-from-visual-studio (also hints at a couple of possible avenues for Linux-like compilers) – underscore_d Jun 27 '16 at 13:08
  • git diff --color-words? – Tadeusz Kopec for Ukraine Jun 27 '16 at 13:28
  • You can, in fact, do a binary diff after you scrub the timestamp from the object file. One example program that does this for PE/PDB files is [zap_timestamps](https://github.com/google/syzygy/tree/master/syzygy/zap_timestamp). –  Jun 27 '16 at 13:45
  • I have, in the past, had to compare generated binaries that should have been identical (although for different reasons). In such cases I used `objcopy` to pull the relevant sections (`.text`, `.data`, etc.) from the ELF files and compared those. Admittedly tedious, but would that not work in this case? – G.M. Jun 27 '16 at 16:27
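
The `objcopy` idea from the comments could look roughly like the sketch below. It assumes GNU binutils and ELF object files built without `-g`; the function name `same_section` is made up for illustration, and `.text` is only the default section (builds using `-ffunction-sections` put code in other sections).

```shell
#!/bin/sh
# same_section A.o B.o [SECTION]
# Hypothetical helper: succeeds when SECTION (default .text) contains
# identical raw bytes in both object files. Everything outside that
# section (symbol tables, notes, any embedded timestamps) is ignored.
same_section() {
    section=${3:-.text}
    tmp=$(mktemp -d) || return 2
    # -O binary writes the raw contents of the selected section only.
    objcopy -O binary --only-section="$section" "$1" "$tmp/a.bin" &&
    objcopy -O binary --only-section="$section" "$2" "$tmp/b.bin" &&
    cmp -s "$tmp/a.bin" "$tmp/b.bin"
    result=$?
    rm -rf "$tmp"
    return $result
}
```

Usage would be something like `same_section old/foo.o new/foo.o || echo "foo.o differs"`. Relocation entries live outside `.text`, so this is a necessary check rather than a complete one. Code that expands `__LINE__` (for example `assert`) can also differ after reformatting even though the logic is unchanged.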

1 Answer


One possibility would be to compile them with Clang and get the output as LLVM IR. If memory serves, the command-line arguments for this are `-S -emit-llvm`.
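
As a rough sketch of the Clang route, assuming `clang` is installed and the files compile with default flags (a 32-bit build would presumably also need `-m32`): the helper names `ir_of` and `same_ir` are invented here. The two header lines that record the input path are filtered out so that files in different directories can still compare equal.

```shell
#!/bin/sh
# ir_of FILE
# Print textual LLVM IR for FILE, minus the two header lines that embed
# the path given on the command line.
ir_of() {
    clang -S -emit-llvm -o - "$1" |
        grep -v -e '^; ModuleID = ' -e '^source_filename = '
}

# same_ir A B
# Succeeds when both sources produce the same filtered IR.
same_ir() {
    a=$(ir_of "$1") && b=$(ir_of "$2") && [ "$a" = "$b" ]
}
```

For example, `same_ir before/foo.cpp after/foo.cpp || echo "foo.cpp changed"`. Compiling without `-g` matters here, since debug metadata records line numbers that reformatting will move.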

To do the same with gcc/g++, you can use one of its flags to generate a file containing its intermediate representation at some stage of compilation. Early stages will still show differences from changes in white space and such, but a quick test indicates that by the SSA stage, such non-operational changes have disappeared from the IR.

g++ -c -fdump-tree-ssa foo.cpp

In addition to the normal object file, this will produce a file with a name like `foo.cpp.018t.ssa` (the pass number in the middle varies between GCC versions) that represents the semantic actions in your source file.
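
Once dumps exist for both versions of the tree, comparing them can be scripted. The following is only a sketch: it assumes the `.ssa` files from the original and reformatted sources have been collected into two directories with the same layout, and `compare_dumps` is a made-up name.

```shell
#!/bin/sh
# compare_dumps BEFORE_DIR AFTER_DIR
# For every *.ssa file under BEFORE_DIR, compare it with the file at the
# same relative path under AFTER_DIR. Prints one line per problem and
# returns 1 if any file differs or is missing, 0 otherwise.
# Files that exist only under AFTER_DIR are not reported.
compare_dumps() {
    before=$1
    after=$2
    mismatch=0
    list=$(mktemp) || return 2
    (cd "$before" && find . -type f -name '*.ssa' | sort) > "$list"
    # Read from a file, not a pipe, so $mismatch survives the loop.
    while IFS= read -r rel; do
        if [ ! -f "$after/$rel" ]; then
            echo "MISSING $rel"
            mismatch=1
        elif ! cmp -s "$before/$rel" "$after/$rel"; then
            echo "DIFFERS $rel"
            mismatch=1
        fi
    done < "$list"
    rm -f "$list"
    return $mismatch
}
```

Running `diff -u` on any pair reported as `DIFFERS` shows what actually changed, which helps in judging whether the difference is meaningful.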

As noted above, I haven't tested this extensively, though; it's possible that at this stage, some non-operational changes will still produce different output files (though I kind of doubt it). If necessary, you can use `-fdump-tree-all` to get output from all stages of compilation (see note 1). As a simple rule of thumb, I'd expect later stages to be more immune to changes in formatting and such, so if the ssa stage doesn't work, my next choice would probably be the optimized stage, which is one of the last stages. (Note: the files produced are numbered in order of the stage that produced each file, so when you dump all stages, it's obvious which are produced by early stages and which by later stages.)


1. Note that this produces quite a few files, many of them quite large. The first time you do this, you probably want to do it on a single source file in a directory by itself to keep from drowning in files, so to speak. Also, don't be surprised when compilation this way takes quite a bit longer than normal.
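
One way to try a single file without littering the source tree is to compile it from inside a scratch directory. This is a sketch under the assumption that GCC writes its dump files next to the object file when no `-o` is given; `dump_one` and its arguments are hypothetical.

```shell
#!/bin/sh
# dump_one SOURCE SCRATCH_DIR [PASS]
# Compile SOURCE from inside SCRATCH_DIR so the object file and the dump
# files land there. PASS defaults to "optimized"; pass "ssa" or "all" to
# select -fdump-tree-ssa or -fdump-tree-all instead.
dump_one() {
    pass=${3:-optimized}
    # Resolve SOURCE to an absolute path before changing directory.
    src_dir=$(cd "$(dirname "$1")" && pwd) || return 2
    src=$src_dir/$(basename "$1")
    mkdir -p "$2" || return 2
    (cd "$2" && gcc -c "-fdump-tree-$pass" "$src")
}
```

For instance, `dump_one src/foo.c /tmp/dumps all` would put every stage's output for `foo.c` under `/tmp/dumps`, where the numbered files can be inspected in order.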

Jerry Coffin
  • I generated the ssa files, putting the ones from the crusty source into one folder and the crustfree ones into another folder. Comparing them with `diff -rq crusty crustfree` says they all differ (some files are only present in the crusty output). Would this indicate that the uncrusting process _has_ in fact changed the functionality of the code somehow? – Alex Meuer Jun 27 '16 at 15:28
  • @AlexMeuer: There's the difficulty: if the files were identical, we could easily conclude that the changes made no difference to the generated code. Their being different means changes in meaning are possible, but not certain (and, unfortunately, I doubt anything else you can produce easily is going to provide much stronger guarantees either). – Jerry Coffin Jun 27 '16 at 15:41