Inline 64bit Assembly in 32bit GCC C Program

Question

I'm compiling a 32 bit binary but want to embed some 64 bit assembly in it.

void method() {
   asm("...64 bit assembly...");
}

Of course when I compile I get errors about referring to bad registers because the registers are 64 bit.

evil.c:92: Error: bad register name `%rax'

Is it possible to add some annotations so gcc will process the asm sections using the 64-bit assembler instead? I have a workaround, which is to compile the code separately, map in a page with PROT_EXEC|PROT_WRITE and copy my code into it, but this is very awkward.

Well, the answer below is correct, but you can work around this. If you assemble the 64 bit instructions separately and extract the bytes from the object file, you can embed them directly within the code as bytes. You simply need to allocate memory, mark it executable and put the bytes there with a function pointer aiming at it. What I do not know is if you can feasibly get that code to execute if you are on an x86_64 architecture OS. :) — David Hoelzer, Dec 26 '15 at 23:54
Related: [Is it possible to use both 64 bit and 32 bit instructions in the same executable in 64 bit Linux?](https://stackoverflow.com/questions/48854564/is-it-possible-to-use-both-64-bit-and-32-bit-instructions-in-the-same-executable). Yes, you can far jump to a new `cs`, but it's not well supported, and hard to imagine any use-case. — Peter Cordes, Feb 19 '18 at 16:25

score 4 · Accepted Answer · answered Dec 25 '15 at 22:52

No, this isn't possible. You can't run 64-bit assembly from a 32-bit binary, as the processor will not be in long mode while running your program.

Copying 64-bit code to an executable page will result in that code being interpreted incorrectly as 32-bit code, which will have unpredictable and undesirable results.

the process is in long mode when it runs this code because it was switched to run in long mode. but i think that is a useful answer for me because it gives a good explanation about why the compiler would not support this – benmmurphy Dec 25 '15 at 22:54
If your program is a 32-bit binary, it is not running in long mode. Simply being on a 64-bit processor or operating system isn't sufficient; the whole process has to be 64-bit to use x86-64 instructions. – Dec 25 '15 at 23:00
@benmmurphy You can switch to long mode with a far jmp or call to a 64-bit code segment, but if that's what you're doing it makes no sense to use inline assembly to write the 64-bit code. You should write in a separate assembly file, which you can assemble in 64-bit mode to produce 64-bit machine code which you can then include (somehow) as a binary blob in your 32-bit code. Mind you doing this isn't supported as far as I know on any 64-bit OS, so it could crash if a hardware interrupt, page fault or other exception happens while executing 64-bit code in the context of a 32-bit process. – Ross Ridge Dec 26 '15 at 00:53
@RossRidge: Can you really do that in an unprivileged process (on Linux, for example)? And I assume it would only be possible with a 64bit OS, with the CPU in "compatibility mode", not "legacy mode". I really only know about user-space, with a pretty vague understanding of exactly how a kernel does the mode switching. The crash would happen after the kernel returned to the 64bit code with the CPU in 32bit mode, right? Since it still probably checks a flag in the `task_struct` (which wouldn't be affected by a user-space-only switch to long mode), rather than saving/restoring the mode. – Peter Cordes Dec 26 '15 at 09:17
So maybe it would be possible to write an OS where processes could switch between 32 and 64bit mode without any syscall or privileged operations. I understand your point that normal OSes like Linux and Windows don't work that way, I'm just curious if AMD's design left that possibility open. @benmmurphy: it wouldn't be very useful though. If you want the advantages of long mode, but with the memory usage benefits of having only 32bit pointers, use the x32 ABI (64bit mode using only the low 4GiB of virtual address space. Linux supports system calls taking 32bit pointers and so on). – Peter Cordes Dec 26 '15 at 09:22
@PeterCordes Long mode has two sub modes, compatibility mode (which provides compatibility for 16-bit and 32-bit protected mode programs) and 64-bit mode. The sub-mode is selected by a bit (L) in the current code segment descriptor, just like how another bit (D/B) selects the default operand and address size in 16/32-bit protected (and compatibility) mode. Since 64-bit OSes all have a ring 3 64-bit code selector in the GDT for 64-bit processes to use there's nothing stopping a process in compatibility mode using a far jump to this segment to switch to 64-bit mode. Or vice versa. – Ross Ridge Dec 26 '15 at 16:17
@PeterCordes I don't know for sure this won't work under Linux or Windows. I don't think either officially documents which selectors are the 64-bit and 32-bit ring 3 code selectors, so it doesn't appear to be officially supported. Under Windows, WOW64 is implemented as 64-bit code that runs in the context of 32-bit processes, but I don't know whether this means it will work for other code. On ARM platforms Microsoft only allows Thumb code by forcing task switches to resume in Thumb mode. – Ross Ridge Dec 26 '15 at 16:39
@RossRidge: Thanks, that's the kind of explanation I was hoping for. I didn't realize there'd already be existing descriptors that you could just use without having to write them (impossible for unprivileged processes). Even if the right number to actually use isn't documented or anything, that still makes it theoretically possible, which is all I was interested in. It's probably not practically useful for anything in Linux / Unix / OSX / Windows, even if it was supported, since if you want to run any 64bit code, you might as well run *all* 64bit code. – Peter Cordes Dec 26 '15 at 18:33
@PeterCordes Both Windows and Linux have APIs for user-mode programs to create LDT entries, but they wouldn't be useful here. The Windows one (NtSetLdtEntries) is undocumented and doesn't work on 64-bit Windows. The Linux one (modify_ldt) is documented, works on both 32-bit and 64-bit, but doesn't support setting the L bit. – Ross Ridge Dec 26 '15 at 19:21
oh some context for anyone that was interested. this was for a kernel exploit that i originally thought could only be triggered in 32 bit mode (OS would usually be 64 bit). and i was looking for a nice way to embed shellcode in my binary. it turns out the vulnerability could be triggered from a 64 bit process as well. – benmmurphy Jul 21 '16 at 17:25

Peter Cordes · Answer 2 · 2017-09-07T09:16:42.123

Don't try to put 64-bit machine-code inside a compiler-generated function. It might work since the encoding for function prologue/epilogue is the same in 32 and 64-bit, but it would be cleaner to just have a separate block of 64-bit code.

The easiest thing is probably to assemble that block in a separate file, using GAS .code64 or NASM BITS 64 to get 64-bit code in an object file you can link into a 32-bit executable.

You said in a comment you're thinking of using this for a kernel exploit against a 64-bit kernel from a 32-bit user-space process, so you just need some code bytes in an executable part of your process's memory and a way to get a pointer to that block. This is certainly plausible; if you can gain control of the kernel's RIP from a 32-bit process, this is what you want, because kernel code will always be running in long mode.

If you were doing something with 64-bit userspace code in a process that started in 32-bit mode, you could maybe far jmp to the block of 64-bit code (as @RossRidge suggests), using a known value for the kernel's __USER_CS 64-bit code segment descriptor. syscall from 64-bit code should return in 64-bit mode, but if not, try the int 0x80 ABI. It always returns to the mode you were in, saving/restoring cs and ss along with rip and rflags. (What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?)

.rodata is part of the text segment of your executable, so just get the compiler to put bytes in a const array. Fun fact: const int main = 195; compiles to a program that exits without segfaulting, because 195 = 0xc3 = the x86 encoding for ret (and x86 is little-endian). For an arbitrary-length machine-code sequence, const char funcname[] = { 0x90, 0x90, ..., 0xc3 } will work. The const is necessary, otherwise it will go in .data (read/write/noexec) instead of .rodata.

You could use const char funcname[] __attribute__((section(".text"))) = { ... }; to control what section it goes in (e.g. .text along with compiler-generated functions), or even a linker script to get more control.
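
As a rough sketch of the const-array approach (the names code64_blob, code64_blob_start and code64_blob_size are made up for illustration), the bytes below are the 41 ff c2 encoding of inc %r10d that appears in the test output later in this answer, followed by c3 for ret. Nothing here executes the bytes; it only exposes a pointer and a length. Whether .rodata actually ends up in an executable mapping depends on the toolchain (newer linkers may place it in a separate non-executable segment), so it is worth checking with readelf -l before relying on it.

```c
#include <stddef.h>

/* Hypothetical names. Bytes: inc %r10d (41 ff c2), then ret (c3). */
static const unsigned char code64_blob[] = { 0x41, 0xff, 0xc2, 0xc3 };

/* Pointer to the first machine-code byte. */
const unsigned char *code64_blob_start(void)
{
    return code64_blob;
}

/* sizeof on a real array is a compile-time constant,
 * which a difference of two label addresses is not. */
size_t code64_blob_size(void)
{
    return sizeof code64_blob;
}
```

A caller would hand code64_blob_start() and code64_blob_size() to whatever needs the address and length of the payload.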

If you really want to do it all in one .c file, instead of using the easier solution of a separately-assembled pure asm source:

To assemble some 64-bit code along with compiler-generated 32-bit code, use the .code64 GAS directive in an asm statement outside of any function. IDK if there's any guarantee on which section will be active when gcc emits your asm, or on how gcc will mix that asm with its own, but it won't put it in the middle of a function.

asm(".pushsection .text \n\t"   // AFAIK, there's no guarantee how this will mix with compiler asm output
    ".code64            \n\t"
    ".p2align 4         \n\t"
    ".globl my_codebytes  \n\t" // optional
    "my_codebytes:      \n\t"
    "inc %r10d          \n\t"
    "my_codebytes_end:  \n\t"
    //"my_codebytes_len: .long  . - my_codebytes\n\t"  // store the length in memory.  Optional
    ".popsection        \n\t"
#ifdef __i386
    ".code32"      // back to 32-bit interpretation for gcc's code
    // "\n\t inc %r10"  // uncomment to check that it *doesn't* assemble
#endif
    );

#ifdef __cplusplus
extern "C" {
#endif
   // put C names on the labels.
   // They are *not* pointers, their addresses are link-time constants
    extern char my_codebytes[], my_codebytes_end[];
    //extern const unsigned my_codebytes_len;
#ifdef __cplusplus
}
#endif
// This expression for the length isn't a compile-time constant, so this isn't legal C
//static const unsigned len = &my_codebytes_end - &my_codebytes;

#include <stddef.h>
#include <unistd.h>

int main(void) {
    size_t len = my_codebytes_end - my_codebytes;
    const char* bytes = my_codebytes;

    // do whatever you want.  Writing it to stdout is one option!
    write(1, bytes, len);
}

This compiles and assembles with gcc and clang (compiler explorer).

I tried it on my desktop to double check:

peter@volta$ gcc -m32 -Wall -O3 /tmp/foo.c
peter@volta$ ./a.out  | hd
00000000  41 ff c2                                          |A..|
00000003

This is the correct encoding for inc %r10d :)

The program also works when compiled without -m32, because I used #ifdef to decide whether to use .code32 at the end or not. (There's no push/pop mode directive like there is for sections.)

Of course, disassembling the binary will show you:

00000580 <my_codebytes>:
 580:   41                      inc    ecx
 581:   ff c2                   inc    edx

because the disassembler doesn't know to switch to 64-bit disassembly for that block. (I wonder if ELF has attributes for that... I didn't use any assembler directives or linker scripts to generate such attributes, if such a thing exists.)
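
To make the difference concrete, here is a minimal sketch of how the same leading bytes are read in each mode. The helper names (byte4x_meaning, decode_4x, modrm_rm) are invented for this example, and it handles only the 0x40..0x4f byte range and register-direct ModRM, not general decoding. In 32-bit mode 0x40+r is inc r32 and 0x48+r is dec r32; in 64-bit mode the same bytes are REX prefixes with the bit layout 0100WRXB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: what one byte in 0x40..0x4f means in each mode. */
typedef struct {
    bool is_rex;     /* 64-bit mode: the byte is a REX prefix */
    bool w, r, x, b; /* REX bits, valid only when is_rex is true */
    bool is_dec;     /* 32-bit mode: false = inc, true = dec */
    int  reg;        /* 32-bit mode: register number 0..7 (1 = ecx) */
} byte4x_meaning;

/* Returns false if the byte is outside 0x40..0x4f. */
bool decode_4x(uint8_t byte, bool long_mode, byte4x_meaning *out)
{
    assert(out != NULL);
    if ((byte & 0xf0) != 0x40)
        return false;
    *out = (byte4x_meaning){0};
    if (long_mode) {
        out->is_rex = true;
        out->w = (byte >> 3) & 1;
        out->r = (byte >> 2) & 1;
        out->x = (byte >> 1) & 1;
        out->b = byte & 1;
    } else {
        out->is_dec = (byte & 0x08) != 0;
        out->reg = byte & 0x07;
    }
    return true;
}

/* Register number named by ModRM.rm, extended by REX.B when present. */
int modrm_rm(uint8_t modrm, bool rex_b)
{
    return ((rex_b ? 1 : 0) << 3) | (modrm & 0x07);
}
```

For 41 ff c2, the 32-bit reading is inc on register 1 (ecx) followed by ff c2 with rm = 2 (edx), matching the disassembly above. The 64-bit reading is a REX prefix with B set, so modrm_rm(0xc2, true) gives 10, i.e. r10d.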

score 2 · Answer 3 · answered Feb 18 '18 at 19:51

Switching between long mode and compatibility mode is done by changing CS. User mode code cannot modify the descriptor table, but it can perform a far jump or far call to a code segment that is already present in the descriptor table. In Linux the required descriptor is present (in my experience; this may not be true for all installations).
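
The mode is a property of the descriptor that CS selects, as Ross Ridge's comment on the accepted answer describes: the L bit picks 64-bit mode, and with L clear the D/B bit picks 32-bit or 16-bit defaults. A minimal sketch of reading those bits from an 8-byte descriptor follows. The function names are hypothetical, and the sample values mentioned in the note afterwards are assumed typical flat ring-3 code descriptors, not values read from any real GDT.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { SEG_CODE16, SEG_CODE32, SEG_CODE64, SEG_RESERVED } code_seg_kind;

/* Assumes desc is a code-segment descriptor; the S/type bits are not checked. */
code_seg_kind classify_code_descriptor(uint64_t desc)
{
    bool l  = (desc >> 53) & 1;  /* L: 64-bit code segment */
    bool db = (desc >> 54) & 1;  /* D/B: default operand/address size */

    if (l)
        return db ? SEG_RESERVED : SEG_CODE64;  /* L=1 with D=1 is reserved */
    return db ? SEG_CODE32 : SEG_CODE16;
}

/* Descriptor privilege level, bits 46..45. */
int descriptor_dpl(uint64_t desc)
{
    return (int)((desc >> 45) & 3);
}
```

With the assumed value 0x00affb000000ffff this reports a 64-bit code segment at DPL 3, and with 0x00cffb000000ffff a 32-bit one, so the two differ only in the L and D/B bits.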

Here is sample code for 64-bit Linux (Ubuntu) that starts in 32-bit mode, switches to 64-bit mode, runs a function, and then switches back to 32-bit mode. Build with gcc -m32.

#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>

extern bool switch_cs(int cs, bool (*f)());
extern bool check_mode();

int main(int argc, char **argv)
{
    int cs = 0x33;
    if (argc > 1)
        cs = strtoull(argv[1], 0, 16);
    printf("switch to CS=%02x\n", cs);

    bool r = switch_cs(cs, check_mode);

    if (r)
        printf("cs=%02x: 64-bit mode\n", cs);
    else
        printf("cs=%02x: 32-bit mode\n", cs);

    return 0;
}


        .intel_syntax noprefix
        .text

        .code32
        .globl  switch_cs
switch_cs:
        mov     eax, [esp+4]
        mov     edx, [esp+8]
        push    0
        push    edx
        push    eax
        push    offset .L2
        lea     eax, [esp+8]
        lcall   [esp]
        add     esp, 16
        ret

.L2:
        call    [eax]
        lret


        .code64
        .globl check_mode
check_mode:
        xor     eax, eax
        // In 32-bit mode, this instruction is executed as
        // inc eax; test eax, eax
        test    rax, rax
        setz    al
        ret
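
The default value 0x33 passed to switch_cs is a segment selector, and its fields can be pulled apart as below. This is only a sketch: selector_fields, decode_selector and make_selector are hypothetical names, and the layout used is the standard x86 one (index in bits 15..3, table indicator in bit 2, requested privilege level in bits 1..0).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: the three fields packed into a segment selector. */
typedef struct {
    unsigned index; /* entry number in the descriptor table */
    unsigned ti;    /* 0 = GDT, 1 = LDT */
    unsigned rpl;   /* requested privilege level, 0..3 */
} selector_fields;

selector_fields decode_selector(uint16_t sel)
{
    selector_fields f;
    f.index = sel >> 3;
    f.ti = (sel >> 2) & 1;
    f.rpl = sel & 3;
    return f;
}

/* Inverse of decode_selector, for trying other table entries. */
uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    assert(index < 8192 && ti <= 1 && rpl <= 3);
    return (uint16_t)((index << 3) | (ti << 2) | rpl);
}
```

decode_selector(0x33) yields index 6, TI 0 (the GDT) and RPL 3, which is consistent with a ring 3 code segment that the kernel has already placed in the GDT.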

This answer, https://stackoverflow.com/a/48855022/8422330, shows switching in the opposite direction, from 64-bit mode to 32-bit mode. — prl, Feb 18 '18 at 19:52
Emm how did you find out it's 0x33? Is there a dynamic way of finding out perhaps? — Pyjong, Jan 13 '20 at 12:58
@Pyjong, if I remember correctly, I ran Linux as a guest and dumped the GDT from the hypervisor. Another way would be to look at the Linux source code to see how it fills in the GDT. — prl, Jan 14 '20 at 05:31
@Pyjong, you might be able to use `sgdt` and then read `/dev/kmem`. — prl, Jan 14 '20 at 05:32
