
Suppose I were to take a C++ program and compile it into an assembly (.S) file.

If I then take that assembly file and "disassemble" it into C, would the resulting code be recompilable on a different platform?

The reason I ask is that the platform I am trying to develop on does not have a C++ compiler, but it does have a C compiler.

DarthRubik
  • If your disassembler gives valid C code, why not? But first you need to verify that it really does give valid C code. – SHR Jun 04 '16 at 21:39
  • @SHR What I am worried about is it turning `uint16_t`s into `uint8_t`s and stuff like that – DarthRubik Jun 04 '16 at 21:47
  • @DarthRubik check this out, it may be a better solution: http://llvm.org/releases/3.1/docs/FAQ.html#translatecxx – SHR Jun 04 '16 at 21:51
  • @SHR They removed that in the next release of LLVM because it did not really work that well... but yes, that was my first stop – DarthRubik Jun 04 '16 at 21:54
  • @DarthRubik Any disassembler will have the same problems. I think the safest way is to convert the code manually. That means generating a file for each class, using structs instead of classes, pointers to functions for virtuals, static for privates, etc. – SHR Jun 04 '16 at 22:12
  • There is no well-defined mapping from assembly to C. You have to be specific about what assembly gets "disassembled" by what tool. – Baum mit Augen Jun 04 '16 at 23:27
  • And in general: Just don't use programming languages that are not implemented on your target platform. – Baum mit Augen Jun 04 '16 at 23:28
  • The problem is not so much the code (you could simply use the assembly directly without converting it to C); it's the fact that the `C++` code will expect the `C++` runtime system and standard library to link against. – Galik Jun 04 '16 at 23:59
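
The manual conversion suggested in the comments above (structs instead of classes, function pointers for virtuals, static for privates) could look roughly like the sketch below. All names here (`Shape`, `Rect`, `shape_area`, and so on) are hypothetical and only illustrate the pattern; this is not what any particular tool or compiler would generate.

```c
/* Hypothetical C++ original being converted by hand:
 *   class Shape { public: virtual int area() const = 0; };
 *   class Rect : public Shape { int w, h; public: int area() const; };
 */

struct Shape;

/* Table of function pointers standing in for the virtual functions. */
struct ShapeVtable {
    int (*area)(const struct Shape *self);
};

/* "Base class": just a pointer to the table of virtuals. */
struct Shape {
    const struct ShapeVtable *vtable;
};

/* "Derived class": the base must be the first member so that a pointer
 * to Rect can be converted to a pointer to Shape and back. */
struct Rect {
    struct Shape base;
    int width;
    int height;
};

/* static gives internal linkage, the closest C gets to "private". */
static int rect_area(const struct Shape *self)
{
    const struct Rect *rect = (const struct Rect *)self;
    return rect->width * rect->height;
}

static const struct ShapeVtable rect_vtable = { rect_area };

/* Stands in for the constructor. */
void rect_init(struct Rect *rect, int width, int height)
{
    rect->base.vtable = &rect_vtable;
    rect->width = width;
    rect->height = height;
}

/* Virtual dispatch: look the function up through the object's table. */
int shape_area(const struct Shape *shape)
{
    return shape->vtable->area(shape);
}
```

A caller would initialize a `struct Rect` with `rect_init` and then pass `&rect.base` to `shape_area`, which dispatches through the table much like a virtual call would.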

1 Answer


Yes, what you describe is indeed possible. No, the result won't be portable to any combination of CPU architecture, OS, and compiler other than yours.

Let's see why. Take some basic C++ code...

#include <iostream>

int main()
{
    std::cout << "Hello, world!\n";
    return 0;
}

Let's turn this into assembly using g++ on an x86-64 Linux box (I turned optimizations on and left out debug symbols on purpose)...

$ g++ -o test.s -O3 -S test.cpp

And the result is...

    .file   "test.cpp"
    .section    .rodata.str1.1,"aMS",@progbits,1
.LC0:
    .string "Hello, world!\n"
    .section    .text.unlikely,"ax",@progbits
.LCOLDB1:
    .section    .text.startup,"ax",@progbits
.LHOTB1:
    .p2align 4,,15
    .globl  main
    .type   main, @function
main:
.LFB1027:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    movl    $.LC0, %esi
    movl    $_ZSt4cout, %edi
    call    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
    xorl    %eax, %eax
    addq    $8, %rsp
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc
.LFE1027:
    .size   main, .-main
    .section    .text.unlikely
.LCOLDE1:
    .section    .text.startup
.LHOTE1:
    .section    .text.unlikely
.LCOLDB2:
    .section    .text.startup
.LHOTB2:
    .p2align 4,,15
    .type   _GLOBAL__sub_I_main, @function
_GLOBAL__sub_I_main:
.LFB1032:
    .cfi_startproc
    subq    $8, %rsp
    .cfi_def_cfa_offset 16
    movl    $_ZStL8__ioinit, %edi
    call    _ZNSt8ios_base4InitC1Ev
    movl    $__dso_handle, %edx
    movl    $_ZStL8__ioinit, %esi
    movl    $_ZNSt8ios_base4InitD1Ev, %edi
    addq    $8, %rsp
    .cfi_def_cfa_offset 8
    jmp __cxa_atexit
    .cfi_endproc
.LFE1032:
    .size   _GLOBAL__sub_I_main, .-_GLOBAL__sub_I_main
    .section    .text.unlikely
.LCOLDE2:
    .section    .text.startup
.LHOTE2:
    .section    .init_array,"aw"
    .align 8
    .quad   _GLOBAL__sub_I_main
    .local  _ZStL8__ioinit
    .comm   _ZStL8__ioinit,1,1
    .hidden __dso_handle
    .ident  "GCC: (GNU) 5.3.1 20151207 (Red Hat 5.3.1-2)"
    .section    .note.GNU-stack,"",@progbits

That mess is the price we pay for exception handling, templates, and namespaces. Let's disassemble this into C by hand, discarding the exception handling tables for a cleaner view...

/* std::ostream; the real layout is implementation-specific */
typedef struct
{
    /* ... */
} _ZSo;

/* extern std::ostream& operator<<(std::ostream&, const char*); */
extern _ZSo *_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc(_ZSo *, const char *);

/* namespace std { extern ostream cout; } */
extern _ZSo _ZSt4cout;

/* Our string, of course! */
static const char *LC0 = "Hello, world!\n";

int main(void)
{
    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc(&_ZSt4cout, LC0);
    return 0;
}

It looks unreadable yet portable, right? As far as it goes, it is. But this code won't work! You still need to define _ZSo (std::ostream) as a C struct, not to mention bringing in all the exception handling machinery.

It becomes nearly impossible to stay portable once you start using try/throw/catch. There is absolutely no way that __cxa_allocate_exception or __cxa_throw will ever be portable! Your only way around this is to rewrite your whole program (and that includes every library you use, even the standard library!) to use the slower setjmp/longjmp approach to exceptions instead of zero-cost exception handling.
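
To give an idea of what the setjmp/longjmp approach means in plain C, here is a minimal sketch. The names (`current_handler`, `throw_error`, `checked_divide`, `try_divide`) are made up for illustration, and it assumes a single thread and that a handler is always installed before anything "throws"; a real replacement for C++ exceptions would also have to run destructors during unwinding, which this does not attempt.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical: the innermost active "try" frame (single-threaded only). */
static jmp_buf *current_handler = NULL;

enum { ERR_DIVIDE_BY_ZERO = 1 };

/* "throw": jump back to the innermost handler. code must be nonzero,
 * and a handler must already be installed. */
static void throw_error(int code)
{
    longjmp(*current_handler, code);
}

/* A function that "throws" instead of returning an error value. */
static int checked_divide(int a, int b)
{
    if (b == 0)
        throw_error(ERR_DIVIDE_BY_ZERO);
    return a / b;
}

/* "try/catch": returns 0 and stores the quotient in *out on success,
 * or returns the error code that was thrown. */
int try_divide(int a, int b, int *out)
{
    jmp_buf handler;
    jmp_buf *previous = current_handler;

    current_handler = &handler;
    switch (setjmp(handler)) {
    case 0:                      /* the "try" body */
        *out = checked_divide(a, b);
        current_handler = previous;
        return 0;
    case ERR_DIVIDE_BY_ZERO:     /* the "catch" clause */
        current_handler = previous;
        return ERR_DIVIDE_BY_ZERO;
    default:                     /* any other thrown code */
        current_handler = previous;
        return -1;
    }
}
```

Every "try" pays for a `setjmp` call even when nothing is thrown, which is why this scheme is slower than table-based zero-cost exception handling on the non-throwing path.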

Last but not least, an automated disassembler will most likely fail to do this properly, even on the simplest of inputs. Remember that at the lowest levels (machine code and assembly language) there is no concept of types. You can never know whether a register holds a signed integer, an unsigned integer, or anything else, for instance. The compiler can also manipulate the stack pointer at will, which makes the job even harder.
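
As a small illustration of the missing type information, consider the sketch below (the function names are hypothetical). The signed and the unsigned addition produce exactly the same bit pattern, and on x86-64 a compiler will typically emit the same add instruction for both, so nothing in the machine code says which type the original source used.

```c
#include <stdint.h>
#include <string.h>

/* Adds two 32-bit values as unsigned integers. */
uint32_t add_unsigned(uint32_t a, uint32_t b)
{
    return a + b;
}

/* Adds the same bit patterns as signed integers. The arithmetic goes
 * through uint32_t so that overflow stays well defined in C. */
int32_t add_signed(int32_t a, int32_t b)
{
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    int32_t result;
    memcpy(&result, &sum, sizeof result);
    return result;
}

/* Returns the raw bits of a signed value, which is all that a
 * disassembler ever gets to see in a register. */
uint32_t bits_of(int32_t value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

For any pair of inputs, `bits_of(add_signed(a, b))` equals `add_unsigned` applied to the same bit patterns, so a decompiler has to guess the signedness (and often the width) from how the value is used later.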

Once compilation is over, the compiler, not expecting any future disassembly, has discarded all this precious information, because you normally don't need it at run time. In most cases, disassembling is really not worth it, if it is possible at all, and you're most likely looking for a different solution. Translating from a mid-to-high-level language (C++) down to a low-level one (assembly) and back up to a mid-to-low-level one (C) takes this to an extreme, and approaches the limits of what can be translated into what.

3442
  • What if I don't use any libraries and write everything myself (no printfs or couts)? – DarthRubik Jun 05 '16 at 00:25
  • @DarthRubik: The implementation of the standard library is, by definition, non-portable. How would you implement `printf` itself in a completely portable manner? There's no way. That's why the standard libraries exist in the first place; they provide a common, portable interface across different platforms. Anyway, you'd still have to deal with exception handling *et al.* – 3442 Jun 05 '16 at 00:30
  • This is for a micro controller (so I would not have any printfs at all) – DarthRubik Jun 05 '16 at 00:30
  • And being on a micro controller I do not throw exceptions at all ever – DarthRubik Jun 05 '16 at 00:31
  • @DarthRubik: Oh, great. So, if you're in an embedded environment, you shouldn't care that much about being portable. In that case, just rewrite the code by hand. You probably don't even have a C standard library in there if there's no C++ compiler, do you? – 3442 Jun 05 '16 at 00:35
  • But I am lazy and want a computer to do it for me (rats) – DarthRubik Jun 05 '16 at 00:35
  • @DarthRubik: Wait some decades for proper AI to develop, or rewrite the code yourself. No other way is feasible. – 3442 Jun 05 '16 at 00:36