
If I compile two C programs that differ only in their return value, I'd expect the binaries to differ only in the bits that encode this value. However, when I compile the following programs with GCC, dump the bits of the binaries (using xxd), and diff the dumps, I get an additional difference.

Files

return127.c

int main() {
    return 127;
}

return128.c

int main() {
    return 128;
}

Compile, Dump And Diff

# compile
gcc -Os -fdata-sections -ffunction-sections -fipa-pta -Wl,--gc-sections -Wl,-O1 -Wl,--as-needed -Wl,--strip-all return127.c -o return127
gcc -Os -fdata-sections -ffunction-sections -fipa-pta -Wl,--gc-sections -Wl,-O1 -Wl,--as-needed -Wl,--strip-all return128.c -o return128
# dump
xxd -b return127 > return127.xxd-bits
xxd -b return128 > return128.xxd-bits
# diff
diff return127.xxd-bits return128.xxd-bits

Note: I took the compile command from a comment on a question about the smallest binary of a C program.

Diff

108,111c108,111
< 00000282: 01010101 00000000 01101011 11011010 11101100 11100011  U.k...
< 00000288: 00111010 10001111 00101111 00101100 01100001 00111100  :./,a<
< 0000028e: 10010010 11001011 00011000 11101010 11100111 00100011  .....#
< 00000294: 01001010 00111011 11111001 11111010 00000001 00000000  J;....
---
> 00000282: 01010101 00000000 00011101 11000011 10101000 00011001  U.....
> 00000288: 11011011 00110001 10100000 01001101 01000110 10010011  .1.MF.
> 0000028e: 00101101 01011101 11101001 00001000 01010101 11111101  -]..U.
> 00000294: 11011011 01000011 11010100 10101011 00000001 00000000  .C....
211c211
< 000004ec: 00000000 00000000 00000000 00000000 10111000 01111111  ......
---
> 000004ec: 00000000 00000000 00000000 00000000 10111000 10000000  ......

There are two differences. The difference at the bottom shows the (expected) difference of the return values. The lines differ only in the last byte/block. Binary 01111111 is decimal 127. Binary 10000000 is decimal 128.
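As a cross-check of the diff above, the differing bytes can also be counted directly instead of reading them off the bit dumps. The following is a minimal sketch, assuming a POSIX shell with `cmp`, `wc` and `tr` available; the helper name `count_diff_bytes` is made up for illustration.

```shell
# Count the bytes that differ between two files of equal length.
# `cmp -l` prints one line per differing byte: the 1-based decimal offset,
# followed by the octal value of that byte in each file.
count_diff_bytes() {
    # tr strips the padding that some wc implementations add.
    cmp -l "$1" "$2" | wc -l | tr -d ' '
}
```

Called as `count_diff_bytes return127 return128`, this should report 21 for the two binaries dumped above: the 20 bytes of the first hunk plus the single return-value byte.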

What is the difference at the top?

maiermic
  • 3
    Are the binaries identical if you build from the same source code twice? My guess is the date and time of the build may be stored somewhere in the header of the executable file. What type of binary are you creating? [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format)? If the first 4 bytes have the values `7F 45 4C 46`, then it is an ELF file. – Andreas Wenzel Jun 12 '21 at 12:01
  • 1
    Yes, they are the same if I build twice (even if I wait several minutes between the builds). It is an ELF file. – maiermic Jun 12 '21 at 12:20
  • 2
    https://www.google.com/search?q=gcc+reproducible+builds | `-Wl,-O1` why do you pass `O1` to linker but `Os` to gcc? – KamilCuk Jun 12 '21 at 12:26
  • What if you do `return 128` from a file named `return127.c`? – Steve Summit Jun 12 '21 at 12:34
  • @SteveSummit I get the same binary. – maiermic Jun 12 '21 at 12:40
  • @KamilCuk Thank you for the search terms. They should lead to the solution of this question and further articles about the topic. As I noted in the description of my question, I just used the compile command of the linked comment. I overlooked the difference (`O1` and `Os`) between linker and gcc. So there is no specific reason. – maiermic Jun 12 '21 at 12:51

1 Answer


What is the difference at the top?

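It's a build ID difference. With the usual SHA-1 style, the build ID is a hash over the contents of the output file, so changing a single byte of code changes all 20 (0x14) bytes of the ID; these are the 20 bytes that differ in the first hunk of your diff. Install diffoscope (or compare the readelf --wide --notes output of both binaries) and you'll see it clearly: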

│  Displaying notes found in: .note.gnu.build-id
│    Owner                Data size     Description
│ -  GNU                  0x00000014    NT_GNU_BUILD_ID (unique build ID bitstring)     Build ID: 817d41c45a09c3822337307250bdb9410a1959b4
│ +  GNU                  0x00000014    NT_GNU_BUILD_ID (unique build ID bitstring)     Build ID: de5fb81907549af3332e8136d6bd7ab4d884e0ce

How to compile C programs such that binaries differ only in different return value?

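  1. You have to make __TIME__ and __DATE__ expand to the same values in both gcc invocations (setting SOURCE_DATE_EPOCH does this).
  2. You have to make the build ID identical for both calls, or drop it entirely with -Wl,--build-id=none.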
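To see what the first point does in isolation, here is a small sketch that pins the timestamp and prints what the two macros expand to. It assumes a `gcc` that honors SOURCE_DATE_EPOCH (GCC 7 or later) and a temporary directory that allows executing files; the helper name `show_build_timestamp` is made up for this example.

```shell
# Compile a tiny program that prints __DATE__ and __TIME__ with the
# timestamp pinned via SOURCE_DATE_EPOCH, run it, and clean up.
show_build_timestamp() {
    epoch=$1
    dir=$(mktemp -d)
    printf '%s\n' \
        '#include <stdio.h>' \
        'int main(void) { puts(__DATE__ " " __TIME__); return 0; }' |
        SOURCE_DATE_EPOCH=$epoch gcc -xc - -o "$dir/a.out" && "$dir/a.out"
    rm -rf "$dir"
}
```

Running `show_build_timestamp 0` twice, minutes apart, should print the same date and time both times, because the macros no longer depend on the wall clock. The two programs in the question do not use either macro, so for them this step is a precaution rather than the cause of the difference.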

The following script:

export SOURCE_DATE_EPOCH=$(date +%s)
f() {
    gcc -Wl,--build-id=none \
       -Os -fdata-sections -ffunction-sections -fipa-pta \
       -Wl,--gc-sections -Wl,--as-needed -Wl,--strip-all \
       -xc - -o "$1"
}
echo 'int main(){return 127;}' | f /tmp/1
echo 'int main(){return 128;}' | f /tmp/2
diffoscope /tmp/1 /tmp/2

and diffoscope outputs:

│  0000000000001020 <.text>:
│ - mov    $0x7f,%eax
│ + mov    $0x80,%eax
│   retq   
KamilCuk