Yes, it's indeed possible in the way you describe. No, it won't be portable to any combination of CPU architecture, OS, and compiler other than yours.
Let's see why. Take some basic C++ code...
#include <iostream>

int main()
{
    std::cout << "Hello, world!\n";
    return 0;
}
Let's turn this into assembly using g++ on an x86-64 Linux box (I turned optimizations on and left out debug symbols, on purpose)...
$ g++ -o test.s -O3 -S test.cpp
And the result is...
	.file "test.cpp"
	.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
	.string "Hello, world!\n"
	.section .text.unlikely,"ax",@progbits
.LCOLDB1:
	.section .text.startup,"ax",@progbits
.LHOTB1:
	.p2align 4,,15
	.globl main
	.type main, @function
main:
.LFB1027:
	.cfi_startproc
	subq $8, %rsp
	.cfi_def_cfa_offset 16
	movl $.LC0, %esi
	movl $_ZSt4cout, %edi
	call _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
	xorl %eax, %eax
	addq $8, %rsp
	.cfi_def_cfa_offset 8
	ret
	.cfi_endproc
.LFE1027:
	.size main, .-main
	.section .text.unlikely
.LCOLDE1:
	.section .text.startup
.LHOTE1:
	.section .text.unlikely
.LCOLDB2:
	.section .text.startup
.LHOTB2:
	.p2align 4,,15
	.type _GLOBAL__sub_I_main, @function
_GLOBAL__sub_I_main:
.LFB1032:
	.cfi_startproc
	subq $8, %rsp
	.cfi_def_cfa_offset 16
	movl $_ZStL8__ioinit, %edi
	call _ZNSt8ios_base4InitC1Ev
	movl $__dso_handle, %edx
	movl $_ZStL8__ioinit, %esi
	movl $_ZNSt8ios_base4InitD1Ev, %edi
	addq $8, %rsp
	.cfi_def_cfa_offset 8
	jmp __cxa_atexit
	.cfi_endproc
.LFE1032:
	.size _GLOBAL__sub_I_main, .-_GLOBAL__sub_I_main
	.section .text.unlikely
.LCOLDE2:
	.section .text.startup
.LHOTE2:
	.section .init_array,"aw"
	.align 8
	.quad _GLOBAL__sub_I_main
	.local _ZStL8__ioinit
	.comm _ZStL8__ioinit,1,1
	.hidden __dso_handle
	.ident "GCC: (GNU) 5.3.1 20151207 (Red Hat 5.3.1-2)"
	.section .note.GNU-stack,"",@progbits
That mess is the price we pay for exception handling, templates, and namespaces. Let's translate this back into C by hand, discarding the exception handling tables for a cleaner view...
/* std::ostream */
typedef struct
{
    /* ... */
} _ZSo;

/* extern std::ostream& operator<<(std::ostream&, const char*);
   At the ABI level, the reference is passed and returned as a pointer. */
extern _ZSo *_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc(_ZSo *, const char *);

/* namespace std { extern ostream cout; } */
extern _ZSo _ZSt4cout;

/* Our string, of course! */
static const char *LC0 = "Hello, world!\n";

int main()
{
    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc(&_ZSt4cout, LC0);
    return 0;
}
Seems unreadable, yet portable, right? For now, it is. But this code won't work! You still need to define _ZSo (std::ostream) in terms of a C struct, not to mention including all the exception handling stuff.
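To give an idea of what "in terms of a C struct" means, here's a minimal sketch for a made-up class, not for std::ostream itself. The names Counter, CounterVtable, counter_init, and counter_next are hypothetical and only exist for illustration. The layout (vtable pointer first, then the data members) is a simplification of what g++ does under the Itanium C++ ABI on x86-64 Linux. The real std::ostream is far messier, since it virtually inherits from std::ios.

```c
/* Hypothetical C++ original (not taken from the program above):
 *
 *     struct Counter {
 *         virtual int next();
 *         int value;
 *     };
 */
struct Counter;

/* Table of function pointers standing in for the C++ vtable.
 * Simplified: a real vtable also carries RTTI and offset entries. */
struct CounterVtable {
    int (*next)(struct Counter *self);
};

/* Assumed layout: hidden vtable pointer first, then the members.
 * This ordering is ABI-specific, which is the whole portability problem. */
struct Counter {
    const struct CounterVtable *vptr;
    int value;
};

/* Body of Counter::next(): increment and return the new value. */
static int counter_next_impl(struct Counter *self)
{
    return ++self->value;
}

static const struct CounterVtable counter_vtable = { counter_next_impl };

/* Stand-in for the constructor: it has to install the vtable pointer. */
void counter_init(struct Counter *c, int start)
{
    c->vptr = &counter_vtable;
    c->value = start;
}

/* Stand-in for a virtual call such as c->next(). */
int counter_next(struct Counter *c)
{
    return c->vptr->next(c);
}
```

After counter_init(&c, 41), a call to counter_next(&c) returns 42. Every offset in that struct ends up baked into the compiled code, so a C translation like this only matches one particular ABI.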
Things get nearly impossible to make portable once you start using try/throw/catch. I mean, there's absolutely no way that __cxa_allocate_exception or __cxa_throw will ever be portable! Your only way around this is to rewrite your whole program (and that includes all libraries you use, even the standard library!) to use the slower setjmp/longjmp approach to exceptions instead of zero-cost exception handling.
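For reference, the setjmp/longjmp approach looks roughly like the sketch below. This is a minimal illustration under my own assumptions, not what any particular compiler emits: current_handler, throw_error, checked_divide, and divide_or are made-up names, there's only a single global handler chain (so no threads), and nothing here runs destructors during unwinding.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: the innermost active "try" block, or NULL if none. */
static jmp_buf *current_handler = NULL;

/* Stand-in for `throw code;`. The code must be nonzero. */
static void throw_error(int code)
{
    if (current_handler != NULL)
        longjmp(*current_handler, code);
    abort(); /* no handler installed, similar in spirit to std::terminate */
}

/* A function that may "throw". */
static int checked_divide(int a, int b)
{
    if (b == 0)
        throw_error(1);
    return a / b;
}

/* Rough equivalent of:
 *     try { return checked_divide(a, b); }
 *     catch (int) { return fallback; }
 */
int divide_or(int a, int b, int fallback)
{
    jmp_buf env;
    jmp_buf *saved = current_handler; /* remember the enclosing handler */
    int result;

    current_handler = &env;
    if (setjmp(env) == 0) {
        /* "try" body: runs when setjmp returns directly */
        result = checked_divide(a, b);
    } else {
        /* "catch" body: reached through longjmp */
        result = fallback;
    }
    current_handler = saved; /* leave the "try" block */
    return result;
}
```

Here divide_or(10, 2, -1) returns 5, while divide_or(10, 0, -1) takes the longjmp path and returns -1. Note that every entry into a "try" pays for a setjmp call even when nothing is thrown, which is where the slowdown compared to table-based handling comes from.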
Last but not least, an automated decompiler will most likely fail at doing this properly, even on the simplest of inputs. Remember that, at the lowest levels (a.k.a. machine code and assembly language), there's no concept of types. You can never know whether a register holds a signed integer, an unsigned integer, or basically anything else, for instance. The compiler can also play with the stack pointer at will, making the job even harder.
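As a small illustration of types not existing at the machine level, the sketch below takes one 32-bit pattern and reads it back as different C types. The helper names as_signed and as_float are made up for this example, and the float case assumes a 32-bit IEEE 754 float, as on x86-64.

```c
#include <stdint.h>
#include <string.h>

/* Read a raw 32-bit pattern as a signed integer.
 * The bits are copied unchanged; only the interpretation differs. */
int32_t as_signed(uint32_t bits)
{
    int32_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Read the same kind of pattern as a float.
 * Assumption: float is 32 bits wide and uses IEEE 754 encoding. */
float as_float(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

The pattern 0xFFFFFFFF is 4294967295 as a uint32_t but -1 through as_signed, and 0x3F800000 is 1065353216 as an integer but 1.0f through as_float. A tool looking at a register that holds either pattern has to guess which one was meant.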
Once compilation is over, the compiler, not expecting anyone to decompile its output, wipes out all this precious type information, because you normally don't need it at run time. In most cases, decompiling is really not worth it, if it's even possible, and most likely you're looking for a different solution. Translating from a higher-middle-level language (C++) to a lower-level language (assembly) and then to a lower-middle-level language (C) takes this to an extreme, and approaches the limits of what can be translated into what.