32

I googled and I see a surprising number of flippant responses basically laughing at the asker for asking such a question.

Microchip provides some source code for free (I don't want to post it here in case that's a no-no. Basically, google AN937, click the first link and there's a link for "source code" and it's a zipped file). It's in ASM and when I look at it I start to go cross-eyed. I'd like to convert it to something resembling a C-type language so that I can follow along, because lines such as:

GLOBAL  _24_bit_sub
movf    BARGB2,w
subwf   AARGB2,f

are probably very simple but they mean nothing to me.

There may be some automated ASM to C translator out there but all I can find are people saying it's impossible. Frankly, it's impossible for it to be impossible. Both languages have structure and that structure surely can be translated.

NoDataDumpNoContribution
Steven
  • 4
    Frankly, give a reason why it must be possible. – Gerhard Sep 08 '09 at 05:46
  • 4
    If you want to understand the code, the only way is to learn this assembly language. Automated translation will give you something unreadable, maybe even more unreadable than the source. And a decompiler (if one exists) can't help you if this code is hand-written (not compiled first). But if you just want to run/port the code, look at my post. – zxcat Sep 24 '09 at 07:51
  • @Gerhard There's already an assembly-language-to-C compiler called Boomerang. Some assembly language programs can be automatically decompiled to C, like [this one here](http://boomerang.sourceforge.net/cando.php?hidemenu). – Anderson Green Apr 23 '13 at 07:23
  • @Anderson Green. Boomerang looks interesting. It does choke a bit on more complex code and is not going to help with Microchip assembly. Boomerang assumes that the assembler is the output of a compiler and not a human-written program. Boomerang gives you something to work with, but it is not an end-to-end solution. For usable results you are still going to be editing it by hand. – Gerhard Apr 24 '13 at 06:56
  • Both C and assembly are Turing-complete. Therefore, they can express the same things. Q.E.D. – Alexander Oct 05 '21 at 10:01

13 Answers

49

You can absolutely make a C program from assembler. The problem is it may not look like what you are thinking, or maybe it will. My PIC is rusty, but using another assembler, say you had

add r1,r2

In C, let's say that becomes

r1 = r1 + r2;

Possibly more readable. You may lose any sense of variable names, since values jump from memory to registers and back and the registers get reused. If you are talking about the older PICs that had, what, two registers, an accumulator and another, it might actually be easier, because variables lived in memory for the most part. You look at the address, something like

q = mem[0x12];
e = q;
q = mem[0x13];
e = e + q;
mem[0x12] = e;

Long and drawn out, but it is clear that this amounts to mem[0x12] = mem[0x12] + mem[0x13];

These memory locations are likely variables that will not jump around the way they do in compiled C code for a processor with a bunch of registers. The PIC might make it easier to figure out the variables and then do a search and replace to name them across the file.
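To make the search-and-replace step concrete, here is a minimal sketch using the same addresses as the snippet above. The names `total` and `delta` and the size of `mem` are made up for illustration; they are not taken from the app note.

```c
#include <stdint.h>

static uint8_t mem[0x80];     /* data memory modelled as a flat array (size is an assumption) */

/* Literal translation: one C statement per instruction, with q and e
   standing in for the machine registers. */
static void add_raw(void)
{
    uint8_t q, e;
    q = mem[0x12];
    e = q;
    q = mem[0x13];
    e = (uint8_t)(e + q);     /* 8-bit add, wraps like the hardware would */
    mem[0x12] = e;
}

/* The same routine after naming the two addresses (hypothetical names).
   The register shuffling collapses into a single statement. */
#define total mem[0x12]
#define delta mem[0x13]

static void add_named(void)
{
    total = (uint8_t)(total + delta);
}
```

Both functions do the same thing to `mem`; the second is just what you get once the addresses have names.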

What you are looking for is called static binary translation, not necessarily a translation from one binary to another (one processor to another) but in this case a translation from a PIC binary to C. Ideally you would take the assembler given in the app note, assemble it to a binary using the Microchip tools, and then do the translation. You can do dynamic binary translation as well, but you are even less likely to find one of those, and it doesn't normally result in C but goes from one binary to another. Ever wonder how those $15 joysticks at Wal-Mart with Pac-Man and Galaga work? The ROM from the arcade was converted using static binary translation, optimized and cleaned up, and the C (or whatever intermediate language) compiled for the new target processor in the handheld box. I imagine not all of them were done this way, but I am pretty sure some were.

The million dollar question: can you find a static binary translator for a PIC? Who knows; you probably have to write one yourself. And guess what that means: you write a disassembler, and instead of disassembling to an instruction in the native assembler syntax like add r0,r1 you have your disassembler print out r0=r0+r1; By the time you finish this disassembler, though, you will know the PIC assembly language so well that you won't need the asm to C translator. You have a chicken and egg problem.
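A rough sketch of such a "disassembler that prints C", assuming the 14-bit mid-range PIC encoding (opcode in the top six bits, the destination bit d in bit 7, the file register address in the low seven bits). Only three instructions are handled, status flags are ignored entirely, and `emit_c` and the `mem[]`/`w` naming are my own inventions for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Write a C statement for one 14-bit mid-range PIC opcode into out.
   Returns 1 if the opcode was recognised, 0 otherwise.
   Flags (Z, C, DC) are not modelled in this sketch. */
static int emit_c(uint16_t op, char *out, size_t n)
{
    unsigned f    = op & 0x7Fu;         /* file register address */
    unsigned to_f = (op >> 7) & 1u;     /* d bit: 1 = result to f, 0 = result to W */
    char dst[16];

    if (to_f)
        snprintf(dst, sizeof dst, "mem[0x%02X]", f);
    else
        snprintf(dst, sizeof dst, "w");

    switch ((op >> 8) & 0x3Fu) {
    case 0x08:  /* movf  f,d */
        snprintf(out, n, "%s = mem[0x%02X];", dst, f);
        return 1;
    case 0x07:  /* addwf f,d */
        snprintf(out, n, "%s = mem[0x%02X] + w;", dst, f);
        return 1;
    case 0x02:  /* subwf f,d : f minus W */
        snprintf(out, n, "%s = mem[0x%02X] - w;", dst, f);
        return 1;
    default:
        snprintf(out, n, "/* unhandled opcode 0x%04X */", (unsigned)op);
        return 0;
    }
}
```

Feeding it the encoding of `movf 0x13,w` gives `w = mem[0x13];`. A real tool would need the full instruction set, the flags and the banked addressing, which is where the "you now know PIC assembly" part comes in.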

Oded
old_timer
  • 10
    Good points; +1 for your explanations. However, I don't believe that it can be done 100% automatically, because in assembly the code can use every bit of the CPU. For instance, side effects such as the setting of the zero, carry and sign bits can be used at any place in the code. These are not normally visible in C, so such a translator would need to separate the intended changes from the unintended ones. Virtualisation is easier, as it does not need an understanding of the code; it just models the CPU. – Adriaan Sep 07 '09 at 13:46
  • 3
    It can be done 100% automatically and preserving semantics. Coding style and other conventions on the other hand, will be completely lost (because they're implicit). – Michael Foukarakis Sep 15 '09 at 08:27
24

Getting the exact same source code back from a compiled program is basically impossible. But decompilers have been an area of research in computer science (e.g. the dcc decompiler, which was a PhD project).

There are various algorithms that can be used to do pattern matching on assembly code and generate equivalent C code, but it is very hard to do this in a general way that works well for all inputs.

You might want to check out Boomerang for a semi-recent open source effort at a generalized decompiler.

bobbymcr
  • 1
    I spoke too soon. First, when I wrote "worked" it was a typo - supposed to be "works". I read through their pages but hadn't actually tried it. Now that I have, it doesn't work at all! It is definitely for reverse engineering, as it will ONLY (stupidly) accept a binary file. Sigh. Maybe "decompiler" isn't what I'm looking for, but rather simply a "translator" – Steven Sep 04 '09 at 06:07
  • 2
    @Steven: - couldn't you compile the code and then decompile it using Boomerang? – a_m0d Sep 08 '09 at 07:19
  • Looks like Boomerang doesn't support this CPU: http://boomerang.sourceforge.net/FAQ.php#whatbinary , so compilation will not help – zxcat Sep 18 '09 at 18:23
  • You would not want to compile first, if you actually have the source code. It is generally believed, translating from code produces better output. – J. M. Becker Nov 15 '11 at 05:38
  • 1
    There are many more choices now than Boomerang (which "bobbymcr" mentioned in 2009). Check out the more recent [radare2](https://github.com/radareorg/radare2) (r2), which is a decompiler + disassembler + debugger. For more, go to the "[List of Transpilers](https://github.com/atErik/Transpiler.and.similar.List)"; further decompilers etc. are listed near the bottom. – atErik Oct 30 '20 at 01:46
9

I once worked on a project where a significant part of the intellectual property was some serious algorithms coded up in x86 assembly code. To port the code to an embedded system, the developer of that code (not me) used a tool from an outfit called MicroAPL (if I recall correctly).

I was very, very surprised at how well the tool did.

On the other hand, I think it's one of those "if you have to ask, you can't afford it" type of things (their price ranges for a one-off conversion of a project work out to around 4 lines of assembly processed for a dollar).

But, often the assembly routines you get from a vendor are packaged as functions that can be called from C - so as long as the routines do what you want (on the processor you want to use), you might just need to assemble them and more or less forget about them - they're just library functions you call from C.

Michael Burr
  • 1
    yes, it is meant to just be called. The thing is, we already have a home grown PID in place that's more-or-less working just fine. I want to see if their implementation offers any advantages. I don't know if you've ever tuned a PID control but it is very tedious and time consuming so it is very much worth convincing myself that there is a reason to do it rather than just using it and hoping for the best. on the other hand... its not worth paying anything for :) – Steven Sep 04 '09 at 05:51
9

You can't deterministically convert assembly code to C. Interrupts, self modifying code, and other low level things have no representation other than inline assembly in C. There is only some extent to which an assembly to C process can work. Not to mention the resultant C code will probably be harder to understand than actually reading the assembly code... unless you are using this as a basis to begin reimplementation of the assembly code in C, then it is somewhat useful. Check out the Hex-Rays plugin for IDA.

9

Yes, it's very possible to reverse-engineer assembler code to good quality C.

I work for MicroAPL, a company which produces a tool called Relogix to convert assembler code to C. It was mentioned in one of the other posts.

Please take a look at the examples on our web site:

http://www.microapl.co.uk/asm2c/index.html

Simon Marsden
7

There must be some automated ASM to C translator out there but all I can find are people saying its impossible. Frankly, its impossible for it to be impossible.

No, it's not. Compilation loses information: there is less information in the final object code than in the C source code. A decompiler cannot magically create that information from nothing, and so true decompilation is impossible.

kquinn
  • 2
    Unless I misunderstood something the question is not about decompiling, but rather converting the ASM source to C – UnkwnTech Sep 04 '09 at 01:56
  • 20
    He's not asking for it to be decompiled back to the *original* source (that's the hamburger-back-to-cow problem). He just wants a C *translation*, which is absolutely possible because C is Turing-complete. – caf Sep 04 '09 at 01:56
  • 2
    Ever used an assembly to C translator? They spit out assembly language but with C syntax: you get variables like `eax` and `ebx` (for IA-32). They're almost, but not quite, entirely useless. You may as well just learn to read the assembly language. – kquinn Sep 04 '09 at 01:59
  • 1
    Decompiling (converting machine code to C) is essentially the same problem as translation (converting assembly, a.k.a. symbolic machine code, to C), just with different semantics. – Chris Lutz Sep 04 '09 at 02:02
  • 4
    kquinn: Useless or not, that's exactly what he's asked for. Maybe it'd help him to learn the assembly syntax if he could see the C equivalent side-by-side? – caf Sep 04 '09 at 02:14
  • Thank you kquinn. You're exactly right. I just want to see what's going on in the moderately-documented program so that I can properly use it. – Steven Sep 04 '09 at 02:24
  • 2
    @Steven - You might find a program to do the job, but it's not going to do it very well. You won't get the best C code money can buy, and in some cases you might not even get code that compiles. You're probably going to have to guess at some things and try your best to rewrite some of it to be more readable. – Chris Lutz Sep 04 '09 at 02:38
  • That's fine. I don't care if it compiles. And I'm sure it won't be the most readable code in the world but it has got to be better than the mess it is now! I can't even follow the simple addition math its apparently doing. – Steven Sep 04 '09 at 05:53
  • 1
    "but it has got to be better than the mess it is now!" No. No, it usually will *not* be, because the assembly will have been written by a human (who could actually read it). The C will have been written by a machine which cares not for the human mind. Decompilers typically make things *worse*, not better. – kquinn Sep 04 '09 at 17:36
5

It isn't impossible, just very hard. A skilled assembly and C programmer could probably do it, or you could look at using a decompiler. Some of these do quite a good job of converting the asm to C, although you will probably have to rename some variables and methods.

Check out this site for a list of decompilers available for the x86 architecture.

a_m0d
4

Check out this: decompiler

A decompiler is the name given to a computer program that performs the reverse operation to that of a compiler. That is, it translates a file containing information at a relatively low level of abstraction (usually designed to be computer readable rather than human readable) into a form having a higher level of abstraction (usually designed to be human readable).

Nick Dandoulakis
4

Not easily possible.

One of the great advantages of C over ASM, apart from readability, was that it prevented "clever" programming tricks.

There are numerous things you can do in assembler that have no direct C equivalent, or involve tortuous syntax in C.

The other problem is datatypes: most assemblers essentially have only two interchangeable datatypes, bytes and words. There may be some language constructs to define ints, floats, etc., but there is no attempt to check that the memory is used as defined. So it's very difficult to map ASM storage to C data types.

In addition, all assembler storage is essentially a "struct": storage is laid out in the order it is defined (unlike C, where the placement of separate variables is at the whim of the compiler and linker). Many ASM programs depend on the exact storage layout; to achieve the same effect in C you would need to define all storage as part of a single struct.
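As a small sketch of the single-struct idea: C keeps struct members in declaration order, so a block of adjacent byte registers can be pinned down this way. The member names below are modelled on the AARGB2/BARGB2 names from the question; the other members, their order, and the assumption that byte 0 is the most significant are my guesses, not something read out of the app note.

```c
#include <stddef.h>
#include <stdint.h>

/* All storage in one struct, so the layout is fixed by declaration order. */
struct arg_block {
    uint8_t aargb0;   /* assumed most significant byte of argument A */
    uint8_t aargb1;
    uint8_t aargb2;   /* assumed least significant byte of argument A */
    uint8_t bargb0;   /* argument B assumed to follow A directly */
    uint8_t bargb1;
    uint8_t bargb2;
};

/* Compile-time checks (C11) that the layout is what the ASM would rely on. */
_Static_assert(offsetof(struct arg_block, bargb0) == 3, "B must directly follow A");
_Static_assert(sizeof(struct arg_block) == 6, "no padding expected");

/* Read three adjacent bytes as one 24-bit value, the way ASM code that
   walks memory by address would. */
static uint32_t load24(const struct arg_block *blk, size_t offset)
{
    const unsigned char *p = (const unsigned char *)blk + offset;
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | (uint32_t)p[2];
}
```

With separate global variables instead of the struct, nothing would guarantee that the three bytes of each argument sit next to each other.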

Also, there are a lot of abused instructions (on old-world IBM mainframes the LA, load address, instruction was regularly used to perform simple arithmetic, as it was faster and didn't need an overflow register).

While it may be technically possible to translate to C, the resulting C code would be less readable than the ASM code that was translated.

James Anderson
4

I can say with 99% certainty that there is no ready-made converter for this assembly language, so you need to write one. You can implement it simply by replacing each ASM command with a C function:

movf    BARGB2,w -> c_movf(BARGB2,w);
subwf   AARGB2,f -> c_subwf(AARGB2,f);

This part is easy :) Then you need to implement each function. You can declare the registers as globals to make things easy. You can also use #defines instead of functions, calling functions where needed. This will help with processing arguments and results.

#define c_subwf(x,y) // I don't know this ASM, but some kind of subtraction must go here
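For what it's worth, here is one way those two macros could be filled in, as a sketch rather than a tested port. It assumes the usual mid-range PIC behaviour: `movf` copies a file register to W (or onto itself) and sets Z, and `subwf` computes f minus W with the carry flag set when there is no borrow. The digit-carry flag is left out, and the globals `W`, `Z`, `C` and the `w`/`f` selector enum are names I picked so the translated lines compile unchanged.

```c
#include <stdint.h>

enum dest { w = 0, f = 1 };      /* destination operand, spelled as in the asm */

static uint8_t W;                /* working register */
static uint8_t Z, C;             /* zero and carry flags (DC not modelled) */
static uint8_t AARGB2, BARGB2;   /* file registers declared as plain globals */

/* movf reg,d : copy reg to W (d == w) or back onto itself (d == f); sets Z */
#define c_movf(reg, d) do {                         \
        uint8_t v_ = (reg);                         \
        Z = (uint8_t)(v_ == 0);                     \
        if ((d) == w) W = v_; else (reg) = v_;      \
    } while (0)

/* subwf reg,d : result = reg - W; C is 1 when no borrow occurred */
#define c_subwf(reg, d) do {                        \
        uint8_t a_ = (reg);                         \
        uint8_t r_ = (uint8_t)(a_ - W);             \
        C = (uint8_t)(a_ >= W);                     \
        Z = (uint8_t)(r_ == 0);                     \
        if ((d) == w) W = r_; else (reg) = r_;      \
    } while (0)

/* The two lines from the question, translated one for one. */
static void sub_low_byte(void)
{
    c_movf(BARGB2, w);
    c_subwf(AARGB2, f);
}
```

Keeping C and Z as globals is what lets a following borrow-handling instruction in a multi-byte subtraction be translated the same mechanical way.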

ASM directives and labels are a special case; I think they can be converted with #defines only.

The fun starts when you reach some CPU-specific features. These can be simple function calls with stack operations, or some specific IO/memory operations. More fun are operations on the Program Counter register used for calculations, or code that uses/counts ticks/latencies.

But there is another way if things get that hardcore. It's hardcore too :) There is a technique called dynamic recompilation. It's used in many emulators.

You don't need to recompile your ASM, but the idea is almost the same. You can use all your #defines from the first step, but add support for the needed functionality to them (incrementing PC/ticks). You also need to add some virtual environment for your code, such as memory/IO managers, etc.
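A bare-bones sketch of that virtual environment, to show the shape of it. This is plain interpretation-style bookkeeping, not real dynamic recompilation, and every name here (`struct vm`, `vm_read`, `vm_tick` and so on) is hypothetical. It assumes one tick per instruction, which holds for ordinary mid-range PIC instructions but not for branches, and it leaves the status flags out.

```c
#include <stdint.h>

/* The virtual CPU state the translated code runs against. */
struct vm {
    uint8_t  w;            /* working register */
    uint8_t  ram[0x80];    /* data memory (size is an assumption) */
    uint16_t pc;           /* program counter, counted in instructions */
    uint32_t ticks;        /* instruction cycles elapsed */
};

/* Memory manager: every access goes through these two functions, so
   reads or writes of I/O registers could be intercepted here. */
static uint8_t vm_read(struct vm *m, uint8_t addr)
{
    return m->ram[addr & 0x7F];
}

static void vm_write(struct vm *m, uint8_t addr, uint8_t value)
{
    m->ram[addr & 0x7F] = value;
}

/* Bookkeeping shared by every single-cycle instruction. */
static void vm_tick(struct vm *m)
{
    m->pc++;
    m->ticks++;
}

/* movf addr,w */
static void vm_movf_w(struct vm *m, uint8_t addr)
{
    m->w = vm_read(m, addr);
    vm_tick(m);
}

/* subwf addr,f */
static void vm_subwf_f(struct vm *m, uint8_t addr)
{
    vm_write(m, addr, (uint8_t)(vm_read(m, addr) - m->w));
    vm_tick(m);
}
```

Code that depends on timing or on the PC value can then read `ticks` and `pc` from the struct instead of from the real hardware.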

Good luck :)

zxcat
2

I think it is easier to pick up a book on PIC assembly and learn to read it. Assembler is generally quite simple to learn, as it is so low level.

jakobengblom2
1

It is difficult to convert a function from asm to C, but doable by hand. Converting an entire program with a decompiler will give you code that can be impossible to understand, since too much of the structure was lost during compilation. Without meaningful variable and function names the resultant C code is still very difficult to understand.

The output of a C compiler (especially unoptimised output) for a basic program could be translatable to C because of repeated patterns and structures.

Gerhard
1

Check out asm2c

Swift tool to transform DOS/PMODEW 386 TASM Assembly code to C code

franck