Every C program is converted to machine code if the binary is distributed. Since the instruction set of a computer is well known, is it possible to get back the original C program?
-
Is it possible to get back the original C program? – shreyasva Feb 16 '10 at 12:17
-
@Roger Pate, put a question mark at the end of "original program" and you have your question :| wow.... – Shawn Mclean Feb 16 '10 at 12:18
-
You know, I completely read that as "it is". Even so, show that you've put some thought into it. What do you mean by original? Do you realize some of the basics, like comments are removed? – Feb 16 '10 at 12:20
-
Yeah, I do. At least the basic skeleton of the program, which mainly includes the logic. – shreyasva Feb 16 '10 at 14:32
-
Name recovery depends on compiler options. gcc's -g switch can preserve some. – joveha Feb 16 '10 at 16:14
-
I just made a fish soup. Can I get the original fish back? – Stefano Borini Feb 16 '10 at 17:46
-
Two major obstacles: optimization and dropped name information. In the code that runs, compilers prefer to drop function and variable names because they aren't needed. Compilers can also optimize code so that it doesn't *look* like the original but functions the same. – Thomas Matthews Feb 16 '10 at 18:23
-
@stefano: *if* the question is about the textual reconstruction and not the logic, the usual line goes: "You can make hamburgers from cows, but not cows from hamburgers" :) – joveha Feb 16 '10 at 18:30
-
@stefano that is sooo not the analogy – shreyasva Feb 18 '10 at 10:46
-
user265260: at least it's tasty. – Stefano Borini Feb 18 '10 at 12:04
-
It can be decompiled to source code (theoretically). For example, each instruction, like the ASM code [ADD R0, R1, R2], can be converted to the C code [R0 = R1 + R2;]. Heh, a static code emulator ;). But you can't obtain the real original source code. – Egg Head Oct 19 '13 at 23:02
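A minimal sketch of the instruction-by-instruction idea in the comment above. The three-instruction listing, the register names, and both function names are made up for illustration; no real instruction set or tool output is implied.

```c
#include <stdint.h>

/* Hypothetical assembly listing (not from any real binary):
 *     ADD R0, R1, R2
 *     ADD R0, R0, R0
 *     SUB R0, R0, R1
 * Each instruction becomes one C statement on variables that stand in
 * for the registers. */
uint32_t translated_listing(uint32_t r1_in, uint32_t r2_in)
{
    uint32_t R0, R1 = r1_in, R2 = r2_in;

    R0 = R1 + R2;   /* ADD R0, R1, R2 */
    R0 = R0 + R0;   /* ADD R0, R0, R0 */
    R0 = R0 - R1;   /* SUB R0, R0, R1 */
    return R0;
}

/* What a human (or a smarter tool) might reduce it to: 2*(a + b) - a
 * simplifies to a + 2*b. Same logic, but not the original text. */
uint32_t cleaned_up(uint32_t a, uint32_t b)
{
    return a + 2u * b;
}
```

Both functions return the same value for every input (unsigned arithmetic wraps the same way in each), for example 11 for the arguments 3 and 4. Neither version tells you what the original author's source looked like.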
9 Answers
You can never get back to the exact same source, since no metadata about it is saved with the compiled code.
But you can re-create code from the assembly code.
Check out this book if you are interested in these things: Reversing: Secrets of Reverse Engineering.
Edit
Some compilers 101 here: if you had to describe a compiler with another word, one less technical than "compiler", what would it be?
Answer: Translator
A compiler translates the syntax / phrases you have written into another language. A C compiler translates to assembly or even machine code. C# code is translated to IL, and so forth.
The executable you have is just a translation of your original text / syntax, and if you want to "reverse it", that is, "translate it back", you will most likely not get the same structure you had at the start.
A more real-life example: if you translate from English to German and then from German back to English, the sentence structure will most likely be different and other words might be used, but the meaning, the context, will most likely not have changed.
The same goes for a compiler / translator. If you go from C to ASM, the logic is the same; it's just a different way of reading it (and of course it's optimized).
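To make the "same meaning, different structure" point concrete, here is a small sketch. The first function is a hypothetical original; the second is a hand-written guess at what translated-back code could look like if an optimizer had replaced the loop with a closed form. Both function names and the `a1`/`v1` variable names are invented, and whether a particular compiler performs this rewrite is an assumption, not something shown here.

```c
#include <stdint.h>

/* Hypothetical original source: sum the integers 1..n with a loop. */
uint32_t sum_to_n_original(uint32_t n)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < n; i++) {
        total += i + 1;   /* adds 1, 2, ..., n */
    }
    return total;
}

/* Hand-written guess at "translated back" code after optimization:
 * the loop is gone and only the formula n*(n+1)/2 remains. The 64-bit
 * intermediate keeps the product from overflowing before the divide. */
uint32_t sum_to_n_recovered(uint32_t a1)
{
    uint64_t v1 = (uint64_t)a1 * ((uint64_t)a1 + 1);
    return (uint32_t)(v1 / 2);
}
```

Calling either function with 10 gives 55. The logic survived the round trip; the loop, the names, and the comment did not.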

It depends on what you mean by the original C program. Things like local variable names and comments are not included in the binary, so there's no way to get exactly the same source code as the one used to produce the binary. Tools such as IDA Pro might help you disassemble a binary.
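As a rough illustration of what gets lost, compare a hypothetical original with a functionally equivalent version written in the style that disassemblers and decompilers commonly produce. The second function is hand-written for this example and is not real tool output; `sub_401000`, `a1`, `v3` and similar names only mimic typical auto-generated placeholders.

```c
#include <stddef.h>

/* Hypothetical original: mean of an array of sensor readings. */
int average_reading(const int *readings, size_t count)
{
    if (count == 0) {
        return 0;               /* avoid dividing by zero */
    }
    long long sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += readings[i];
    }
    return (int)(sum / (long long)count);
}

/* Decompiler-style equivalent (hand-written, names are placeholders):
 * no comments, no meaningful names, and the for loop has turned into
 * a guarded do-while. */
int sub_401000(const int *a1, size_t a2)
{
    long long v3;
    size_t v4;

    if (!a2)
        return 0;
    v3 = 0;
    v4 = 0;
    do {
        v3 += a1[v4];
        ++v4;
    } while (v4 != a2);
    return (int)(v3 / (long long)a2);
}
```

For the array `{10, 20, 30, 41}` both functions return 25. The logic is recoverable, but the reader has to work out on their own that `a1` holds readings and that the result is an average.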

-
No, not to the level of variable names, but at least the logic of the program. – shreyasva Feb 16 '10 at 11:50
-
I would guesstimate the conversion rate of a really skilled hacker at about 1 kilobyte of machine code per day. At common Western salaries, that puts the price of, say, a 100 KB executable at about $25,000. After spending that much money, all that's gained is a chunk of C code that does exactly what yours does, minus the benefit of comments and whatnot. It is in no way competitive with your version: you'll be able to deliver updates and improvements much more quickly. Reverse engineering those updates is a nontrivial effort as well.
If that price tag doesn't impress you, you can arbitrarily raise the conversion cost by adding more code. Just keep in mind that skilled hackers that can tackle large programs like this have something much better to do. They write their own code.

-
A quick analysis of a few files reveals that 1 KB is only a few hundred lines of machine code, and it tends to be less the larger the program. Analyzing that much would be relatively easy even for a not-so-skilled hacker. However, I doubt somebody would be willing to pay for raw C code that would need further reverse engineering, so the programmer would need to rewrite or otherwise document the code in a usable way. If you just want to steal an algorithm, or interface with a specific secret API, it could be cost effective, but otherwise it sounds too expensive to bother. – Feb 16 '10 at 14:22
One of the best works on this topic that I know about is:
Pigs from sausages? Reengineering from assembler to C via FermaT.
The claim is you get back a reasonable C program, even if the original asm code was not written in C! Lots of caveats apply.

-
Nice. I have one of the caveats. The result of running one of my programs through it produces a C program but the resulting C program doesn't work due to the assembly code assuming my nonstandard stack frame. – Joshua May 01 '10 at 18:45
-
You actually used the FermaT converter? ... I'd expect that stack frame layout wouldn't make a bit of difference; a stack frame is just a kind of "struct" and if you can propose reasonable structs for heap values, you should be able to do it for stack frames. Can you give a specific example? – Ira Baxter May 01 '10 at 21:03
Working on tools that do this is a research activity. That is, it is possible to get something in the easy cases (you won't recover local variable names unless debug symbols are present, for instance). It's nearly impossible in practice for large programs, or if the programmer has decided to make it difficult.

The Hex-Rays decompiler (an extension to IDA Pro) can do exactly that. It's still fairly new but is showing great promise. It takes a little getting used to, but it can potentially speed up the reversing process. It's not a "silver bullet" (no C decompiler is), but it's a great asset.

-
How does the decompiler differentiate between `for`, `while` and `do-while` loops when they can be coded differently in the source? Also, how does it get the names of the original functions? How does it "un-optimize" the code to match the original? How does it determine the variable names? Perform a search of *news:comp.lang.c* for *hamburger* and *cow*. – Thomas Matthews Feb 16 '10 at 18:21
-
Does it matter? Either will do if I care about the logic and not the form. – joveha Feb 16 '10 at 18:32
The common name for this procedure is "turning hamburgers back into cows." It's possible to reverse engineer binary code into a functionally equivalent C program, but whether that C code bears a close resemblance to the original is an open question.

There is not a 1:1 mapping between a C program and the ASM/machine code it will produce. One C program can compile to a different result on different compilers or with different settings, and sometimes two different bits of C could produce the same machine code.
You definitely can generate C code from a compiled EXE. You just can't know how similar in structure it will be to the original code - apart from variable/function names being lost, I assume it won't know the original way the code was split amongst many files.
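A small sketch of the "two different bits of C" case: three ways to write the same string-length loop. The function names are invented for this example, and the claim that a compiler may emit identical or nearly identical machine code for all three is an assumption about typical optimizing compilers, not something verified here.

```c
#include <stddef.h>

/* Variant 1: for loop. */
size_t count_for(const char *s)
{
    size_t n;
    for (n = 0; s[n] != '\0'; n++) {
        /* empty body: the loop header does all the work */
    }
    return n;
}

/* Variant 2: while loop. */
size_t count_while(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* Variant 3: guard plus do-while, a shape compilers often lower
 * loops into. */
size_t count_do_while(const char *s)
{
    size_t n = 0;
    if (s[0] == '\0') {
        return 0;
    }
    do {
        n++;
    } while (s[n] != '\0');
    return n;
}
```

All three return 5 for `"hello"` and 0 for the empty string. If they compile to the same instructions, a decompiler has no basis for choosing which loop form the author originally wrote, so it simply picks one.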

You can try hex-rays.com, it has a really nice decompiler which can decompile assembly code into C with 99% accuracy.
