In one of my CS classes, my teacher called me into a Zoom meeting and accused me of decompiling my C code into AT&T x64 assembly language. I did not decompile my code, so I am very confused.

Is there a way to distinguish between hand-written code and decompiled code? I would like to know so that I don't accidentally code something in a way that appears to be decompiled...

Peter Cordes
  • 7
    Decompiling converts assembly to C, but you seem to be talking about the reverse, which is simply "compiling". – Nate Eldredge Nov 05 '20 at 04:40
  • @NateEldredge: He could have run it through the compiler, and then through a disassembler. – Joshua Nov 05 '20 at 04:44
  • 1
    @Joshua: That still wouldn't involve any *decompiling*. You could start with C source, compile to machine code or asm, decompile to C or C++, then compile that again... Or you could start with hand-written asm, decompile to C or C++, then use an optimizing compiler in an attempt to optimize your asm. – Peter Cordes Nov 05 '20 at 04:50
  • 4
    I'm curious to see your hand-written asm, and what signs your professor pointed to as indicators it was compiler output (which seems to be what you're saying). If it's hand-written, chances are I can point to multiple signs that GCC or clang would never have emitted it. The vast majority of student-written asm is full of easy but minor missed optimizations like using 64-bit operand size when only 32-bit is needed. (Although to be fair, GCC often fails to do value-range analysis when compiling source that unnecessarily uses `unsigned long` or whatever.) – Peter Cordes Nov 05 '20 at 04:57
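To make the operand-size point from the comment above concrete, here is a minimal C sketch. The function names `sum16_wide` and `sum16_narrow` are hypothetical, and it assumes an LP64 target such as x86-64 Linux, where `unsigned long` is 64 bits. Both functions add up 16 bytes, so every intermediate value fits easily in 32 bits (the maximum is 16 * 255 = 4080), but the first one asks for 64-bit arithmetic in the source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: wider-than-needed types.
 * Assumes LP64, where unsigned long is 64 bits wide. */
unsigned int sum16_wide(const unsigned char *p)
{
    assert(p != NULL);
    unsigned long total = 0;              /* 64-bit accumulator, not needed */
    for (unsigned long i = 0; i < 16; i++)
        total += p[i];
    return (unsigned int)total;           /* result never exceeds 4080 */
}

/* Same computation with 32-bit types throughout. */
unsigned int sum16_narrow(const unsigned char *p)
{
    assert(p != NULL);
    unsigned int total = 0;
    for (unsigned int i = 0; i < 16; i++)
        total += p[i];
    return total;
}
```

No claim is made here about what any particular compiler emits for either version. One way to compare them is to compile with `gcc -O2 -S` and read the generated assembly for both functions.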

1 Answer

Yes, we can tell. When it doesn't matter, humans will organize stuff into logical chunks, so the logic happens in order, register assignment happens in order, etc. The compiler, on the other hand, assigns registers haphazardly, and when instruction order doesn't matter it emits instructions in pseudo-random order.

We can also tell, at a glance, the difference between assembly hand-written by somebody who is learning the language and decompiled code. The compiler will use advanced tricks that are just not taught in first-level courses. Basically, if we see something too advanced too early from somebody who isn't blowing away the homework and the exams, it was decompiled.

Historically, this was reversed: as late as 15 years ago, we were mocking the compiler's assembly as though it had been written by a novice. No more. Now the compiler is the expert.

Joshua
  • 3
    You're describing *optimized* asm; yeah agreed. Especially for 3-operand ISAs, compilers will also rarely keep a variable in a consistent register. The signs of compiling with optimization disabled are even clearer: independent blocks of instructions to implement each C statement, never keeping anything in registers across such blocks. (Unless you use the `register` keyword with gcc or clang). [Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](https://stackoverflow.com/q/53366394) – Peter Cordes Nov 05 '20 at 04:52
  • 3
    Other things too: label names that are generic (`L12345`) instead of meaningful, directives like `.size`, code alignment, security features like stack canaries and `endbr`, etc. – Nate Eldredge Nov 05 '20 at 04:55
  • 1
    @NateEldredge: Naming, formatting, and directives can get rewritten by people that aren't totally naive, still keeping the actual instructions, so the actual instruction choices are more of a fingerprint vs. "clothing". But yes, a totally naive use of compiler output will have stuff like that and `.cfi` directives for every change to ESP / RSP unless they used [How to remove "noise" from GCC/clang assembly output?](//stackoverflow.com/q/38552116). To be fair, I have seen some hand-written code that didn't bother to invent meaningful label names, but usually not GCC or clang's *exact* style. – Peter Cordes Nov 05 '20 at 06:32
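The surface-level signs listed in the comments above (generic labels, `.size` and `.cfi` directives, `endbr`, stack canaries) can be checked mechanically. Below is a rough C sketch of such a check for one line of AT&T-syntax assembly text. The names `classify_asm_line` and `enum asm_sign` are hypothetical, and the label pattern only covers the `.L<digits>:` and `L<digits>:` styles, which is an assumption about what counts as "generic". As the last comment points out, this only inspects the "clothing": all of it can be stripped or renamed by hand, and it says nothing about instruction choices.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Bit flags for surface signs of unedited compiler output.
 * Heuristic only: every one of these can be removed by hand. */
enum asm_sign {
    SIGN_CFI_DIRECTIVE  = 1 << 0, /* .cfi_startproc, .cfi_def_cfa_offset, ... */
    SIGN_SIZE_DIRECTIVE = 1 << 1, /* .size sym, .-sym */
    SIGN_GENERIC_LABEL  = 1 << 2, /* .L3: or L12345: */
    SIGN_ENDBR          = 1 << 3, /* endbr64 / endbr32 */
    SIGN_STACK_CANARY   = 1 << 4  /* mentions __stack_chk_fail */
};

/* Skip leading spaces and tabs. */
static const char *skip_space(const char *s)
{
    while (*s == ' ' || *s == '\t')
        s++;
    return s;
}

/* Nonzero if s starts with an optional '.', then 'L', digits, and ':'. */
static int is_generic_label(const char *s)
{
    if (*s == '.')
        s++;
    if (*s != 'L')
        return 0;
    s++;
    if (!isdigit((unsigned char)*s))
        return 0;
    while (isdigit((unsigned char)*s))
        s++;
    return *s == ':';
}

/* Return the OR of all signs found on one line of assembly text. */
unsigned classify_asm_line(const char *line)
{
    assert(line != NULL);
    const char *s = skip_space(line);
    unsigned signs = 0;

    if (strncmp(s, ".cfi_", 5) == 0)
        signs |= SIGN_CFI_DIRECTIVE;
    if (strncmp(s, ".size", 5) == 0 && (s[5] == ' ' || s[5] == '\t'))
        signs |= SIGN_SIZE_DIRECTIVE;
    if (is_generic_label(s))
        signs |= SIGN_GENERIC_LABEL;
    if (strncmp(s, "endbr64", 7) == 0 || strncmp(s, "endbr32", 7) == 0)
        signs |= SIGN_ENDBR;
    if (strstr(s, "__stack_chk_fail") != NULL)
        signs |= SIGN_STACK_CANARY;
    return signs;
}
```

Calling `classify_asm_line` on each line of a `.s` file and OR-ing the results gives a quick summary of which signs appear. A result of zero does not show that the code was hand-written, only that these particular markers are absent.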