I was wondering if it's possible to run direct machine code using inline assembly.

#include <iostream>
void code() {
    __asm (
        ".byte 0xB8, 0x01, 0x00, 0x00, 0x00"        // mov    eax,0x1
    );
}

int main() {
    code();
    return 0;
}

This code works fine, but I need to supply the machine code as a string, like this, and that doesn't work:

std::string code = ".byte 0xB8, 0x0C, 0x00, 0x00, 0x00";
__asm (
   code
);

What am I doing wrong? Is this even possible? Thanks in advance.

BadUsernameIdea
  • Since assembly and machine code are processor specific, which processor are you targeting? – Thomas Matthews Jul 15 '22 at 23:26
  • Why not use the assembly instruction text? – Thomas Matthews Jul 15 '22 at 23:28
  • 2
    I think `__asm` is for embedding machine code from assembly *at compile time*. To convert strings to machine code at run time, you will need to pass them to an assembler (your own or an external one). – MikeCAT Jul 15 '22 at 23:28
  • What does the allegedly working code actually print, and what error message are you getting in the failing code? – Schol-R-LEA Jul 15 '22 at 23:29
  • I agree with @MikeCAT regarding this - in both 32-bit protected mode and 64-bit long mode, Linux treats code pages as execute-only, and trying to write to an executable page will usually cause a protection fault. – Schol-R-LEA Jul 15 '22 at 23:29
  • @BadUsernameIdea - what is the end-game for this code? It looks as if your use case is to apply self-modifying code, which as I said earlier isn't going to be readily possible under Linux (or Windows, or almost any other current OS). To give such a result would require you to essentially write a custom linkage editor and loader, and even then, the generated code would need to run in a separate process. This is something which would most often be seen in a compile-and-go interpreter, but even those generally don't generate machine code, usually sticking to an internal p-code instead. – Schol-R-LEA Jul 15 '22 at 23:35
  • 3
    You can't execute a `string` as if it were machine code. You would have to *parse* the `string` to *extract* the byte values into a block of memory that has execution rights, and then you can call that memory block as if it were a function like any other. – Remy Lebeau Jul 15 '22 at 23:47
  • 1
    @Schol-R-LEA: Using the return value of a `void` function won't compile. If it was declared as returning `int`, it would happen to work if you compiled without optimization (for x86-64), printing `1`. But it's UB to write a register in an `asm` statement without telling the compiler about it (without e.g. an `"eax"` clobber), and also UB in C++ to fall off the end of a non-`void` function, so at least with optimization enabled, I'd expect GCC or clang to include an illegal instruction on purpose so the program would fault instead of printing some garbage or crashing some other way. – Peter Cordes Jul 15 '22 at 23:49
  • @PeterCordes I was wondering about that part as well; that's why I asked the OP what the actual output was. Thank you for expressing the issue better than I could. – Schol-R-LEA Jul 15 '22 at 23:52
  • 1
    @Schol-R-LEA: Just for fun, I checked the asm: https://godbolt.org/z/YvWa6ExG6 - with optimization disabled, GCC and clang just warn about the fall-off-end UB. But instead of emitting a `ud2` like I was guessing, instead they just don't emit any instructions for that code path, not even the `ret` at the bottom of the function. i.e. they treat it like `__builtin_unreachable`, so execution would just fall into whatever's next if it leaves the `asm` statement! (Of course that was introduced by me changing the return type from `void`; with that there's no overload for `cout< – Peter Cordes Jul 15 '22 at 23:55
  • I suspect that this is exploiting a quality of the [_cdecl_ calling convention](https://en.wikipedia.org/wiki/X86_calling_conventions), namely that return values are passed through the `eax` register; the `ret` just pops the return address off the stack and jumps to that location, and has nothing to do with passing the return value. If the compiler doesn't treat reading a `void` function's value as an error, then the `eax` value might indeed be treated as the return value. Since the x86-64 System V AMD64 ABI uses `rax` for the same purpose, it would still behave more or less the same. – Schol-R-LEA Jul 16 '22 at 00:14
  • It does leave the question of what type `cout` would be dispatched on, since as you said, there's no overload for `void`. – Schol-R-LEA Jul 16 '22 at 00:19
  • @Schol-R-LEA: Yes, leaving a value in EAX behind the compiler's back is exactly why it happens to work, if you can trick the compiler into not omitting the actual return instruction but still treating the return value as `int`. e.g. by disabling optimization and using `int code()`. 32-bit `int` *is* returned in EAX, so the fact that writing a 32-bit register like `mov eax, 1` zero-extends implicitly into the full RAX is irrelevant; x86-64 calling conventions require the caller to ignore high garbage in registers outside the actual type-width of the return value. – Peter Cordes Jul 16 '22 at 21:28
  • @Schol-R-LEA: I highly suspect the author of this question didn't actually test the thing they claim works fine; they just have some previous experience messing around with inline asm and know that string literals work in `asm()` statements (and in fact are required). But as I said before, that's not even a valid way to use inline asm, and it would break with optimization; you need input/output constraints, or to put it in a `__attribute__((naked))` function with your own `ret` if you want to use Basic asm (no constraints). https://gcc.gnu.org/wiki/ConvertBasicAsmToExtended – Peter Cordes Jul 16 '22 at 21:30
  • @PeterCordes I just checked back on the code and I made a major mistake. I corrected it in the update. Thanks for helping me spot the mistake. – BadUsernameIdea Jul 16 '22 at 21:47
  • @ThomasMatthews mostly AMD or x86_64 – BadUsernameIdea Jul 16 '22 at 21:48
  • 1
    Ok, that will at least compile, but without `asm("code here" ::: "eax")`, it's still undefined behaviour to actually run it; you're destroying the value in RAX without telling the compiler about it. Of course in practice with that caller, it won't actually break even after inlining, since `main` won't have anything valuable in any registers except RSP and maybe RBP at that point. – Peter Cordes Jul 16 '22 at 21:51
  • Thanks @Schol-R-LEA and @PeterCordes, this was the answer I needed. So to be clear, this sort of machine code manipulation isn't possible using string variables, right? Also, can I ask what the `:::` is supposed to do? – BadUsernameIdea Jul 16 '22 at 21:54
