
Is there a tool to compare the control flow of some disassembly and some C?

Here's my situation: I started with the disassembly (x86_64) of a function. With the help of the decompilation provided by Hopper.app, I have attempted to write a C function with the same functionality. I would like to be sure that my C code reproduces the behavior of the disassembly exactly (for all possible inputs and global states). My hope is that this can be checked by comparing control flow graphs.

According to the abstract for this paper, there has been some work in the area, at least in the context of Java.

Is there any available tooling around generating and comparing control flow graphs for disassembly? Ideally I could compare the control flow of the disassembly which I started with and the C code that I've come up with, but comparing the control flow of the disassembly I started with and dis/assembly that I can generate from my C code would still be fantastic.
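To make the idea of comparing control flow graphs concrete, here is a minimal sketch of a coarse shape check. It assumes both graphs have already been extracted by some other means into the hypothetical `Cfg` adjacency-matrix type below (the type, `MAX_BLOCKS`, and the function names are invented for illustration). It only tests a necessary condition for the two graphs to be isomorphic: the same number of basic blocks and the same multiset of (in-degree, out-degree) pairs. A match proves nothing about functional equivalence, and a mismatch may simply reflect different compiler choices.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_BLOCKS 64  /* assumed upper bound on basic blocks per function */

/* Hypothetical CFG: edge[i][j] is true when block i can transfer control to block j. */
typedef struct {
    int n;
    bool edge[MAX_BLOCKS][MAX_BLOCKS];
} Cfg;

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Fill sig[] with one value per block encoding (in-degree, out-degree), then sort it
   so the result does not depend on how the blocks happen to be numbered. */
static void degree_signature(const Cfg *g, int sig[]) {
    for (int i = 0; i < g->n; i++) {
        int in = 0, out = 0;
        for (int j = 0; j < g->n; j++) {
            out += g->edge[i][j];
            in += g->edge[j][i];
        }
        sig[i] = in * (MAX_BLOCKS + 1) + out;
    }
    qsort(sig, (size_t)g->n, sizeof sig[0], cmp_int);
}

/* Returns false when the graphs cannot be isomorphic; true means "not ruled out". */
static bool cfg_may_match(const Cfg *a, const Cfg *b) {
    int sa[MAX_BLOCKS], sb[MAX_BLOCKS];
    if (a->n != b->n) return false;
    degree_signature(a, sa);
    degree_signature(b, sb);
    for (int i = 0; i < a->n; i++)
        if (sa[i] != sb[i]) return false;
    return true;
}
```

A real comparison would need to go further (for example, matching blocks by their contents), so this is best read as a quick filter rather than an equivalence check.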

Glorfindel
Nate Chandler
  • 2
    I've never heard of a tool like that, but you can use `gcc -Wa,-adhln -g helloworld.c > helloworld.s` to generate an assembly listing intermixed with the C code: http://stackoverflow.com/a/19083877/995714 http://stackoverflow.com/questions/1289881/using-gcc-to-produce-readable-assembly http://www.fclose.com/240/generate-a-mixed-source-and-assembly-listing-using-gcc/ http://24alpha.wordpress.com/2007/12/18/how-to-get-gcc-to-interleave-assembly-output-with-original-source-code/ – phuclv Nov 30 '14 at 06:32
  • A solution to a similar problem, one that can answer yes/no with some probability, is known as [Wikipedia: Copy/Paste Detector](http://en.wikipedia.org/wiki/PMD_%28software%29#Copy.2FPaste_Detector_.28CPD.29) and [Wikipedia: Plagiarism detection](http://en.wikipedia.org/wiki/Plagiarism_detection#In_source_code). In the worst case you'll probably have to fall back on black box testing and code coverage measurement, using tests that can reproduce the "all possible inputs and global states" – xmojmr Nov 30 '14 at 09:00
  • 1
    Most of those copy/paste/plagiarism detectors do not compare control flow graphs; mostly they look for syntactic hints of similarity. The one I know about that does a semantic check compares PDGs on C code; that might be helpful, but it mostly produces yes/no answers to the question "do these match?". OP probably really wants one that will tell him in detail where the two graphs diverge, so that he can do something about it. – Ira Baxter Nov 30 '14 at 11:25
  • 1
    How big is the code you reverse-engineered? A big problem I see is that most assembly code uses odd instructions and additional control flows that are probably hard to model in your program. On x86-32, testing whether one 64-bit value is greater than another takes two tests and corresponding jumps, but is coded in C as a single compare on the 64-bit type. Indirect jumps, stack manipulation, and peculiar conventions for passing arguments are all extremely hard to model. How far along are you really in this process? – Ira Baxter Nov 30 '14 at 11:31
  • Tool recommendations are not [on-topic](http://stackoverflow.com/help/on-topic) for SO, else I would recommend IDA Pro... – Jongware Nov 30 '14 at 11:33
  • 1
    ... I would think your best bet would be to write a bit of C code for *each* individual instruction, simulating the effect the instruction has on a simulated set of registers. This would get you a very close model, essentially being correct-by-construction. And this avoids the control flow graph check. You can simplify the resulting code after the fact by correctness-preserving transformations to get a prettier result. – Ira Baxter Nov 30 '14 at 11:33
  • @IraBaxter Thank you for your insight and suggestions. You're absolutely right that ideally the tool would tell me where the two graphs diverge, but even being able to produce yes/no answers would be better than nothing. Is the tool that you alluded to CloneDR? To your question: the code that I have reverse engineered is about 100 lines of C. – Nate Chandler Dec 01 '14 at 19:42
  • 1
    Our CloneDR operates by comparing syntax trees, not control flow graphs. For PDG-based clone detection, the key work is http://www.eecs.yorku.ca/course_archive/2004-05/F/6431/ResearchPapers/Krinke.pdf ; there's a bunch of follow-on work you can find at scholar.google.com. CloneDR is commercial; the PDG matchers are largely university works, which means unobtainable and unusable in practice. Even if you had one, it wouldn't help you: you want to compare the CFG from the assembly code with the CFG from the C code, and all these tools work only on a single language (either C or Java). – Ira Baxter Dec 01 '14 at 20:37
  • 1
    If you *really* wanted to follow through, you'd want a tool that could extract the assembler CFG and the C CFG in a comparable representation, so that you could compare them. I doubt such a tool exists anywhere. (Our DMS Software Reengineering Toolkit could be used to build such a tool; it has assembler and C front ends, and a uniform control flow graph model.) For 100 lines of C code, unless you can justify the expense, you don't want to build such a tool yourself. My suggestion about modelling machine instructions is probably your best bet to get a reliable equivalence. – Ira Baxter Dec 01 '14 at 20:39
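The instruction-by-instruction modelling suggested in the comments above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `Regs` struct, the `op_*` helpers, and the four-instruction routine being modelled (shown in the comment) are hypothetical and are not the asker's actual function. Only the flag rules for `cmp` and the `jl` condition (SF != OF) follow the x86-64 architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated state: just the registers and flags this example needs. */
typedef struct {
    uint64_t rax, rdi, rsi;
    bool zf, sf, of, cf;
} Regs;

/* mov dst, src */
static void op_mov(uint64_t *dst, uint64_t src) { *dst = src; }

/* cmp a, b: compute a - b, set the flags, discard the result. */
static void op_cmp(Regs *r, uint64_t a, uint64_t b) {
    uint64_t res = a - b;
    r->zf = (res == 0);
    r->sf = (res >> 63) & 1;
    r->cf = (a < b);                             /* unsigned borrow */
    r->of = (((a ^ b) & (a ^ res)) >> 63) & 1;   /* signed overflow */
}

/* Condition tested by jl: SF != OF. */
static bool cond_l(const Regs *r) { return r->sf != r->of; }

/* One C statement per instruction of an assumed routine:
       mov rax, rdi
       cmp rdi, rsi
       jl  done
       mov rax, rsi
   done:
       ret                                                   */
static int64_t model_min(int64_t a, int64_t b) {
    Regs r = {0};
    r.rdi = (uint64_t)a;
    r.rsi = (uint64_t)b;
    op_mov(&r.rax, r.rdi);
    op_cmp(&r, r.rdi, r.rsi);
    if (cond_l(&r)) goto done;
    op_mov(&r.rax, r.rsi);
done:
    return (int64_t)r.rax;
}

/* The simplified, hand-written version one would like to trust. */
static int64_t simplified_min(int64_t a, int64_t b) { return a < b ? a : b; }

/* Differential check over a few boundary values; true when both versions agree. */
static bool versions_agree(void) {
    const int64_t s[] = { INT64_MIN, -1, 0, 1, INT64_MAX };
    const int n = (int)(sizeof s / sizeof s[0]);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            if (model_min(s[i], s[j]) != simplified_min(s[i], s[j]))
                return false;
    return true;
}
```

The model is correct by construction only to the extent that each helper matches the real instruction, and the differential check samples inputs rather than proving equivalence, so it complements the simplification steps rather than replacing them.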

0 Answers