6

I am trying to build a tool that can programmatically examine release-build binaries produced from identical C# code compiled on two separate machines at different times and conclude that the code was identical, while still picking up any changes in the C# code used to produce those binaries.

I have tried a number of approaches, but to keep this short I'll stick to the latest attempt. I run ildasm with the /text option on the binaries and replace the GUIDs for anonymous fields etc. in the text, but when the binaries come from different PCs I find that the text produced by ILDASM's /text option is reordered. Binaries originating from the same code, compiled with the same setup on different machines, also appear heavily reordered. Any suggestions on how to control this reordering of the IL would be much appreciated.

Cheers

PS: Any alternative strategies for reliably accomplishing this are also most welcome.

Comic Book Guy
  • 1
    and you can't compare the source code because?... – Alastair Pitts Jul 05 '12 at 06:51
  • Also, this is probably relevant: http://stackoverflow.com/questions/1335427/why-does-c-sharp-generate-different-exes-for-the-same-source-code – Alastair Pitts Jul 05 '12 at 06:52
  • 1
    If possible, could you tell why do you need such a tool? Perhaps there is much better solution to the problem you are trying to solve by creating it. – Nikola Anusev Jul 05 '12 at 06:54
  • @Alastair Pitts This is to prove to some regulators that the binaries running on a given machine came from a particular version of the code: they can compile the code themselves and use this tool to verify that the binaries executing match the ones they compiled. This is a regulatory requirement for our firm. – Comic Book Guy Jul 05 '12 at 06:55
  • I can replace the GUIDs and the date-time stamps in the text ildasm produces, but the heavy reordering is what is really getting to me. I was hoping to find a solution without having to thoroughly parse the ILDASM text output. – Comic Book Guy Jul 05 '12 at 06:58
  • why don't you run the dll through .NET Reflector and do a diff on the code produced? – stijn Jul 05 '12 at 07:13
  • 5
    when the principal developer on the compiler team speaks, you should listen: http://blogs.msdn.com/b/ericlippert/archive/2012/05/31/past-performance-is-no-guarantee-of-future-results.aspx – Mike Zboray Jul 05 '12 at 07:14
  • @mikez Thank you for the link. I'll try controlling the number of CPUs used by VS.NET in the build; hopefully that is the same as the number of CPUs used by csc.exe. – Comic Book Guy Jul 05 '12 at 10:54
  • 1
    @Hardy msbuild has a command line parameter for this, maxcpucount. – Mike Zboray Jul 05 '12 at 15:47

2 Answers

7

Waiting for Eric Lippert to wake up :) - community wiki based on @mikez's comment:

When a principal developer on the compiler team (Eric Lippert) speaks, you should listen: http://ericlippert.com/2012/05/31/past-performance-is-no-guarantee-of-future-results/ contains a detailed explanation and a strong recommendation against doing this (likely in response to this precise question):

Is compiling the same C# program twice guaranteed to produce the same binary output?

No.

Eric Lippert
Alexei Levenkov
  • As of February 2016 (Roslyn compiler), this is no longer true: there is a -deterministic option for this kind of scenario. - [Announcement](https://blog.paranoidcoding.com/2016/04/05/deterministic-builds-in-roslyn.html) - [Microsoft documentation](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/compiler-options/deterministic-compiler-option) This answer should be updated, since I landed here *before* the MS documentation page. – Michel de Becdelièvre Aug 06 '19 at 13:40
2

I found that a solution in line with what Eric Lippert mentions in his post (the one his client ended up settling for) can be reached by setting the processor affinity of the compilation process to 01. After this, the executables/DLLs produced are almost identical, except for some MVIDs and GUIDs. Running ILDASM on these binaries in text mode and building a simple hashing tool that strips away this random stuff provides such a solution. I am providing this for the sake of completeness and to help others who may face this problem.
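
For illustration only, the "strip and hash" step could be sketched in Python roughly as follows. This is not the tool described above: the function names `normalize_il` and `il_digest` are hypothetical, and the noise patterns (GUID-shaped tokens, comment lines starting with `MVID`, `Image base` or `Time-date stamp`) are assumptions about what varies between builds. Check them against your own ILDASM dumps.

```python
import hashlib
import re

# Assumed patterns for per-build noise in ILDASM text output; adjust them to
# whatever actually differs between your own dumps.
GUID_RE = re.compile(
    r"\{?[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}"
    r"-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}\}?"
)
NOISE_LINE_RE = re.compile(
    r"^\s*//\s*(MVID|Image base|Time-date stamp)\b", re.IGNORECASE
)


def normalize_il(text):
    """Return IL text with volatile comment lines dropped and GUIDs masked."""
    kept = []
    for line in text.splitlines():
        if NOISE_LINE_RE.match(line):
            continue  # drop whole lines that only carry per-build values
        # Mask GUID-shaped tokens and ignore trailing whitespace differences.
        kept.append(GUID_RE.sub("<GUID>", line).rstrip())
    return "\n".join(kept)


def il_digest(text):
    """SHA-256 hex digest of the normalized IL text."""
    return hashlib.sha256(normalize_il(text).encode("utf-8")).hexdigest()
```

Two dumps would then count as matching when `il_digest` returns the same value for both. The sketch does nothing about reordering, so it only helps once the build itself has been made stable, for example by pinning the processor affinity as described above.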

Comic Book Guy