I want to observe the difference in the machine code (the opcode bytes) produced by compiling two versions of a very basic C++ program, for example one that computes 2 + 2 and calls no libraries. Being new to compiled programs, I expected the output to be a tiny file of opcodes with a few small headers, but the headers are large.

simple.cpp

int main()
{
    unsigned int a = 2;
    unsigned int b = 2;
    unsigned int c = a + b;
}

compiler:

g++ -std=c++0x simple.cpp -o simple

Is there a format that I can export to that doesn't contain headers, just op code binary that we instruct the machine to execute? If not, what bytes or location in the resulting file can I look for to isolate the relevant logic from the program?

I need the machine code, not assembly, since my project is the analysis of differently obfuscated versions of a source file to attempt recognizing one based on the other. A complex subject with questionable feasibility, but nevertheless that's why I'm asking to isolate the machine code and not just the assembly - to test analysis against the true machine code outputs.

I tried googling the header structure but can't seem to find much info.

J.Todd
  • One quick way may be to use an online compiler and just look in the assembly window: https://godbolt.org/z/Wqvrch – Ted Lyngmo Oct 13 '20 at 10:31
  • @TedLyngmo Off the table unfortunately, the code is being written on an air-gapped system :3 Doing a side-project on the job and my machine with internet access isn't set up for coding. – J.Todd Oct 13 '20 at 10:37
  • There is a switch to make the compiler output assembler code. Is it `-S`? – Galik Oct 13 '20 at 10:41
  • You can always compile to object code using `-c` and `-o name.o` to write object code without linking. Of course, that won't stop link-time code generation optimization from further manhandling your object code whilst linking your final build from all your object code modules. You could also compile to asm if you're really into that, I suppose. – WhozCraig Oct 13 '20 at 10:41
  • `g++ -masm=intel -S -std=c++0x simple.cpp -o-` should show something very similar to what you'll see in godbolt. – Ted Lyngmo Oct 13 '20 at 10:43
  • `g++ -std=c++0x simple.cpp -nostdlib -Wl,--oformat -Wl,binary -o simple` will have it output what is close to what you want, but there are some extra outputted data after the code of the function. – MikeCAT Oct 13 '20 at 10:48
  • @Galik that sounds useful, but my project is based on something that would actually require comparing the binary output of the opcodes. Assembly would be close, and probably helpful for trying to understand what's going on, but I do at some point need to isolate the instructions from the headers. – J.Todd Oct 13 '20 at 10:49
  • Note that compilers are not stupid 1:1 translating machines. Once you turn on optimizations, your whole `main` will be transformed to a no-op, because there is no observable behavior. Either you turn on optimizations, and then you won't see what you expect, or you don't, and then what you can conclude from your findings is of limited use. – 463035818_is_not_an_ai Oct 13 '20 at 10:51
  • @MikeCAT is there some documentation or anything you can point me to so I can understand what the extra is so I can identify / remove it? – J.Todd Oct 13 '20 at 10:51
  • Does this answer your question? [Using GCC to produce readable assembly?](https://stackoverflow.com/questions/1289881/using-gcc-to-produce-readable-assembly) – Botje Oct 13 '20 at 10:52
  • @idclev463035818 thanks for the heads up. But since I'm looking at analyzing the results of obfuscation techniques, I think the attackers doing the obfuscating are going to usually be turning off those optimizations to allow better code obfuscation. – J.Todd Oct 13 '20 at 10:52
  • See [gcc(1): GNU project C/C++ compiler - Linux man page](https://linux.die.net/man/1/gcc) for `-nostdlib` and `-Wl`. See [ld(1): GNU linker - Linux man page](https://linux.die.net/man/1/ld) for `--oformat binary`. – MikeCAT Oct 13 '20 at 10:53
  • @Botje I'm trying to do automated processing on obfuscated binaries and attempt to identify an obfuscated binary from a differently obfuscated version. Obviously a complex topic and we'll avoid going into whether or not it's feasible, but that should explain why I need to be able to isolate the actual machine code for analysis / comparison. – J.Todd Oct 13 '20 at 10:56
  • @J.Todd: Other way around. Obfuscators are going to turn the optimization up to 11 precisely for the reasons mentioned by idclev. For starters, any code that's been optimized out does not need obfuscation at all. – MSalters Oct 13 '20 at 11:03
  • @MSalters Obfuscators are going to want to throw in, among other things, dead code to hide their signature, and likely excessive `goto`s, both of which seem like things a compiler using optimization would be quick to remove. I don't understand why you'd say code that's been optimized doesn't need obfuscation. If an attacker hits you with a payload and it's optimized out, without obfuscation you'll easily recognize that payload in your IDS next time they attempt an attack. Optimization is the opposite of obfuscation: it makes the resulting binary more predictable and harder to meaningfully change. – J.Todd Oct 13 '20 at 11:09
  • @MikeCAT if you submit that comment as an answer I'll accept. The assembly outputs (thanks to Ted Lyngmo) helped me understand what the extra at the end was that you're referring to as well. – J.Todd Oct 13 '20 at 11:35

1 Answer

The ld(1) man page (GNU linker) documents the --oformat=output-format option, which lets you specify the output format.

binary is a format that doesn't have headers.

The gcc(1) man page (GNU project C/C++ compiler) documents the -Wl option, which passes options through to the linker. The -nostdlib option is also useful, because it keeps the standard startup files and libraries from being linked in.

Combining these, you can try this command:

g++ -std=c++0x simple.cpp -nostdlib -Wl,--oformat=binary -o simple
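
If you would rather keep the normal ELF output (an object file from `g++ -c`, or a linked executable) and find where the code sits inside it, the section header table tells you. The following is only a minimal sketch of that idea, not part of the command above. It assumes Linux with glibc's `<elf.h>`, a 64-bit ELF file whose byte order matches the host, and a file that still has its section headers. `SectionSpan` and `find_section` are names made up for illustration.

```cpp
#include <elf.h>      // Elf64_Ehdr, Elf64_Shdr (Linux/glibc header; an assumption)
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// File offset and length of one section (hypothetical helper type).
struct SectionSpan {
    std::uint64_t offset;
    std::uint64_t size;
};

// Hypothetical helper: look up a section by name in an ELF64 image that has
// already been read into memory. Returns false if the image is not ELF64,
// is truncated, or has no section with that name.
bool find_section(const std::vector<char>& image, const std::string& name,
                  SectionSpan& out)
{
    if (image.size() < sizeof(Elf64_Ehdr)) return false;

    Elf64_Ehdr eh;
    std::memcpy(&eh, image.data(), sizeof eh);
    if (std::memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return false;

    // The section header table must lie completely inside the file.
    if (eh.e_shentsize != sizeof(Elf64_Shdr) || eh.e_shstrndx >= eh.e_shnum)
        return false;
    const std::uint64_t table_bytes =
        std::uint64_t{eh.e_shnum} * sizeof(Elf64_Shdr);
    if (eh.e_shoff > image.size() || table_bytes > image.size() - eh.e_shoff)
        return false;

    std::vector<Elf64_Shdr> sections(eh.e_shnum);
    std::memcpy(sections.data(), image.data() + eh.e_shoff, table_bytes);

    // Section names are stored in the string table section e_shstrndx.
    assert(eh.e_shstrndx < sections.size());
    const Elf64_Shdr& names = sections[eh.e_shstrndx];
    if (names.sh_offset > image.size() ||
        names.sh_size > image.size() - names.sh_offset)
        return false;

    for (const Elf64_Shdr& sh : sections) {
        if (sh.sh_name >= names.sh_size) continue;
        const char* first = image.data() + names.sh_offset + sh.sh_name;
        const char* last = image.data() + names.sh_offset + names.sh_size;
        if (std::string(first, std::find(first, last, '\0')) == name) {
            out = SectionSpan{sh.sh_offset, sh.sh_size};
            return true;
        }
    }
    return false;
}
```

To use it, read the whole file into a `std::vector<char>`, call `find_section(image, ".text", span)`, and copy `span.size` bytes starting at `span.offset`. In an object file built with `-c`, `.text` holds only the functions from that translation unit; in a normally linked executable it also contains startup code.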
MikeCAT