For example, I have two binary executables compiled from C. How can I determine whether the two were generated from the same source code?

klutt
Toddler
  • You cannot, in general. Even the same file compiled twice may generate different code because C compilation is non-deterministic on most platforms. You may be able to calculate hashes and compare the files to see if they match. If they don't, they may still be from the same source. – Tanveer Badar Jan 20 '20 at 06:40
  • Q: Out of curiosity, *why* do you wish to do this? Q: Have you considered adding a version stamp in your build process? PS: You *can* check if two binary files are identical with tools like [cksum](https://linux.die.net/man/1/cksum) or [md5sum](https://linux.die.net/man/1/md5sum). – FoggyDay Jan 20 '20 at 06:43
  • There is a way to check whether two executable files are identical using a checksum, but I'm not sure it helps in your case; it depends on what you want to achieve. – ROOT Jan 20 '20 at 06:43
  • Toddler, as FoggyDay has hinted, it might help to take one step back and explain what you want to achieve by doing this. We might be looking at a https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem – Yunnosch Jan 20 '20 at 06:57
  • @FoggyDay Well, I want to build a system where one of the services I intend to provide is that users can submit a source file and get an output file in return. But before sending the output file to them, I need to check the binaries. That's why. – Toddler Jan 20 '20 at 07:07
  • @Yunnosch I don't think it's an XY problem. I am gathering necessary information, and this is a very important piece. – Toddler Jan 20 '20 at 07:08
  • @Toddler Are you saying that users send source files to you, and then YOU compile them? – klutt Jan 20 '20 at 07:11
  • @klutt yes, kind of. – Toddler Jan 20 '20 at 07:14
  • You can look at Deterministic Build in C++ https://stackoverflow.com/questions/14653874/how-to-produce-deterministic-binary-output-with-g and https://blog.conan.io/2019/09/02/Deterministic-builds-with-C-C++.html – Rishikesh Raje Jan 20 '20 at 07:44

1 Answer

In general, this is completely impossible to do.

  • You can generate different binaries from the same source
  • Two identical binaries can be generated from different sources
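To make the two points above concrete, here is a minimal sketch of a byte-for-byte comparison with `cmp`. The function name `compare_binaries` is made up for this example. "identical" only tells you that the files contain the same bytes, and "different" says nothing about whether the sources differ.

```shell
#!/bin/bash
# compare_binaries FILE_A FILE_B   (hypothetical helper, not from the answer)
# Prints "identical" or "different"; returns 2 if cmp could not read a file.
compare_binaries() {
    cmp -s "$1" "$2"
    case $? in
        0) echo "identical" ;;   # same bytes, says nothing about the sources
        1) echo "different" ;;   # inconclusive: could still be the same source
        *) echo "error: could not compare files" >&2; return 2 ;;
    esac
}
```

Usage would look like `compare_binaries ./a.out ./b.out`.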

It is possible to add version information in different ways. However, you can fool all of those methods quite easily if you want.

Here is a short script that might help you. Note that it might have flaws; it's just to show the idea. Don't just copy it into production code.

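#!/bin/bash
# Usage: pass the C source file as the first argument.
# Appends an asm() directive holding the md5sum output of the source
# to a copy of the file, then compiles that copy.

SRC=$1
STR="asm(\".ascii \\\"$(md5sum "$SRC")\\\"\");"
NEWNAME=$SRC.aux.c
cp "$SRC" "$NEWNAME"
# Leading newline in case the source does not end with one.
printf '\n%s\n' "$STR" >> "$NEWNAME"
gcc "$NEWNAME"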

It makes sure that the md5sum output for the source file (the hash followed by the file name) gets included as a string in the binary. It's gcc-specific, and you can read more about the idea here: embed string via header that cannot be optimized away
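To check a binary built with the script above, you could search it for the digest of the source. This is only a sketch: the function name `has_source_hash` is made up for this example, and it assumes the digest ends up in the binary as plain ASCII, which is what the `.ascii` directive should give you. As said, anyone who wants to can fool this easily.

```shell
#!/bin/bash
# has_source_hash BINARY SOURCE   (hypothetical helper, not from the answer)
# Returns 0 if BINARY contains the MD5 hex digest of SOURCE,
# 1 if it does not, and 2 if the digest could not be computed.
has_source_hash() {
    local binary=$1 source=$2 digest
    # md5sum prints "<digest>  <file name>"; keep only the digest so the
    # check does not depend on the path used at build time.
    digest=$(md5sum "$source" 2>/dev/null | cut -d ' ' -f 1)
    # An empty pattern would match any file, so bail out explicitly.
    [ -n "$digest" ] || return 2
    # -a: treat the binary as text, -F: fixed string, -q: no output.
    grep -aFq -- "$digest" "$binary"
}
```

For example: `has_source_hash ./a.out prog.c && echo "hash found"`.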

klutt