
I'm writing an OS, and I use the asm! macro to change the code segment. (The 2020/06/08 changes to asm! are described in Rust RFC 2873.)

use core::arch::asm;

pub unsafe fn set_code_segment(offset_of_cs: u16) {
    asm!("push {0:r}       // 64-bit version of the register
    lea rax, 1f            // or more efficiently, [rip + 1f]
    push rax
    retfq
    1:",
        in(reg) offset_of_cs,
        out("rax") _, // rax is overwritten by the lea, so declare it clobbered
    );
}
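A far return only makes sense in ring 0 with a valid GDT selector, so it can't be tried in an ordinary process. The same "push an address, then return to it" idea can be sketched in user space with a near `ret`, which pops only RIP (8 bytes) instead of RIP plus CS. This is a minimal sketch, assuming an x86-64 target, current stable Rust with `std::arch::asm`, and no hardware shadow stack (CET would fault on a mismatched push/ret). The function name is hypothetical, and the label is `2:` rather than `1:` because, as the answer explains, labels made only of 0 and 1 digits can be misparsed by LLVM.

```rust
use std::arch::asm;

/// Hypothetical demo: push the address of local label `2:` and jump to it
/// with a near `ret`. Returns 42 if the instruction between `ret` and the
/// label was skipped.
fn push_label_address_and_ret() -> u64 {
    let marker: u64;
    unsafe {
        asm!(
            "mov {m}, 1",
            "lea {tmp}, [rip + 2f]", // RIP-relative address of label 2
            "push {tmp}",            // push the address itself, not memory at it
            "ret",                   // near return: pops that address into RIP
            "mov {m}, 99",           // skipped: control lands on 2: directly
            "2:",
            "add {m}, 41",
            m = out(reg) marker,
            tmp = out(reg) _,
        );
    }
    marker
}

fn main() {
    println!("{}", push_label_address_and_ret());
}
```

The push and the `ret` leave RSP where it started, so the block is stack-balanced; the kernel version differs only in pushing CS first and using `retfq` to pop both.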

This works. However, if I use `push 1f`, the address of label `1:` is not pushed. Instead, `1f` becomes a memory source operand, loading from `[1f]`.

So the following code

pub unsafe fn set_code_segment(offset_of_cs: u16) {
    asm!("push {0:r}
    push 1f           // loads from 1f, how to push the address instead?
    retfq
    1:", in(reg) offset_of_cs);
}

does not work. The disassembly (from ndisasm) is:

0000B9EC  57                push rdi
0000B9ED  FF3425F6B90080    push qword [0xffffffff8000b9f6]
0000B9F4  48CB              retfq

The desired code, written in NASM syntax, is:

    [bits 64]

    global set_code_segment          ; export the symbol so the Rust side can link to it

set_code_segment:
    push rdi
    push change_code_segment         ; absolute address as a 32-bit immediate
    retfq
change_code_segment:
    ret

When linked with the kernel (and declared in Rust as `extern "C" { pub fn set_code_segment(offset_of_cs: u16); }`), this code works: the address of change_code_segment is pushed successfully.

So my question is: why does `push 1f` in asm! push the contents of memory at label `1:` instead of the address of `1:`?

Peter Cordes
toku-sa-n
  • AFAIK, Rust's inline assembly draws some inspiration from GCC. So perhaps you need to use the `$` prefix to specify that you mean an immediate operand: `push $1f` – Michael Jul 20 '20 at 06:36
  • @Michael Thanks for your comment, but `push $1f` didn't change the situation... The disassembled code was the same. – toku-sa-n Jul 20 '20 at 06:46
  • Separate possible problem: You take the new CS as a `u16`; make sure you still use a qword push, not a push of a 16-bit register! `retf` still does two 64-bit pops. – Peter Cordes Jul 20 '20 at 11:30
  • For your actual problem, LEA + PUSH is what you want to do, so you can use a RIP-relative addressing mode for the LEA. In GNU `.intel_syntax`, it would be `lea rax, [RIP + 1f]`, with square brackets. IDK if that's the syntax Rust asm uses or not. – Peter Cordes Jul 20 '20 at 11:30
  • @PeterCordes Yes, actually I push `u16` into `rdi` and this works. Surprisingly, using a 16-bit register doesn't work. – toku-sa-n Jul 20 '20 at 12:05
  • `push di` only changes RSP by 2, not 8. IDK what part of that you find surprising, but the possible operand-sizes for `push` in 64-bit mode are word and qword. (16-bit push is almost never what you want.) I was worried that Rust *would* pick a 16-bit register since the Rust value was a `u16`; that's what GNU C inline asm would do for a `short` or `uint16_t`. But yes, fortunately it picks the full-width register for `{0:r}` – Peter Cordes Jul 20 '20 at 12:19
  • @PeterCordes `r` of `{0:r}` specifies the size of the register as 64-bit. What surprised me is that using a 16-bit register causes a general protection fault. – toku-sa-n Jul 20 '20 at 12:25
  • re: `push di`: like I explained, `retf` pops 16 bytes (and ignores the high 6 bytes of the qword containing the segment selector). https://www.felixcloutier.com/x86/ret That will misalign the stack and break the rest of your code if you only pushed 8 + 2 instead of 8 + 8 bytes. Or are you saying retf *itself* causes a GPF? That seems unlikely. If you're curious, use a debugger to see which instruction actually GPFs after you mess up the stack. – Peter Cordes Jul 20 '20 at 12:30
  • In GAS `.intel_syntax`, the syntax for `push` with the address as an absolute 32-bit-sign-extended immediate would be `push OFFSET 1f`, but that doesn't seem to work in Rust https://godbolt.org/z/hznEbs :/ Neither does AT&T syntax `push $1f`, unsurprising because this isn't AT&T syntax. BTW, if you are going to LEA, use `[rip + 1f]` instead of absolute `1f`; it's smaller code size. – Peter Cordes Jul 20 '20 at 12:36
  • Have you tried AT&T syntax mode? https://github.com/Amanieu/rfcs/blob/inline-asm/text/0000-inline-asm.md says one exists, which might make `pushq $1f` work. Obviously it would be nicer to still use Intel syntax, although it seems to be a MASM-like flavour, like GAS / LLVM `.intel_syntax`, not NASM-like. – Peter Cordes Jul 20 '20 at 12:46
  • The docs I found explicitly say it's supposed to be using GAS / LLVM style `.intel_syntax noprefix`, so I think it's a bug that `push OFFSET 1f` doesn't work. I'm not sure the best place to report that. – Peter Cordes Jul 20 '20 at 12:55
  • @PeterCordes Thanks for lots of useful comments. I sent [a bug report](https://github.com/rust-lang/rust/issues/74558). – toku-sa-n Jul 20 '20 at 13:31
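The operand-size point from the comments can be observed directly: in 64-bit mode, `push` with a 64-bit register moves RSP by 8, while a 16-bit register (like `di`) moves it by only 2, so a `retfq` that pops 16 bytes would read a misaligned frame. A minimal user-space sketch, assuming x86-64 and stable Rust `std::arch::asm`; the function name and the selector value `0x10` are hypothetical. Each block undoes its own push with `add rsp, …` so the stack is balanced on exit.

```rust
use std::arch::asm;

/// Hypothetical helper: returns how far RSP moves for `push` of a 64-bit
/// register and of a 16-bit register, in that order.
fn push_widths() -> (u64, u64) {
    let selector: u16 = 0x10; // stand-in for a code-segment selector
    let (before64, after64): (u64, u64);
    let (before16, after16): (u64, u64);
    unsafe {
        asm!(
            "mov {b}, rsp",
            "push {v:r}",   // qword push, e.g. `push rdi`: what retfq expects
            "mov {a}, rsp",
            "add rsp, 8",   // undo the push
            v = in(reg) selector,
            b = out(reg) before64,
            a = out(reg) after64,
        );
        asm!(
            "mov {b}, rsp",
            "push {v:x}",   // word push, e.g. `push di`
            "mov {a}, rsp",
            "add rsp, 2",   // undo the push
            v = in(reg) selector,
            b = out(reg) before16,
            a = out(reg) after16,
        );
    }
    (before64 - after64, before16 - after16)
}

fn main() {
    println!("{:?}", push_widths());
}
```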

1 Answer


Rust's asm! macro is built on LLVM.

There is a specific bug in LLVM that makes it interpret labels composed only of the digits 0 and 1, such as 0, 11, or 101010, as binary values. That's what is happening here: the label is read as a binary value, which is then used as an address in memory.

Also, the Rust asm! documentation has since been updated and now includes a section on labels.
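If you'd rather keep a label like `1:`, one possible workaround is switching the block to AT&T syntax with `options(att_syntax)`, where `1f` is a plain forward label reference and the 0/1-only ambiguity should not arise. A minimal sketch under stated assumptions (x86-64, stable Rust, no hardware shadow stack; the function name is hypothetical), using a near `ret` so it runs in user space. It still uses LEA + PUSH rather than `pushq $1f`, because an absolute 32-bit immediate may fail to link in a position-independent user-space binary.

```rust
use std::arch::asm;

/// Hypothetical demo of the push/ret trick in AT&T syntax with label `1:`.
/// Returns 42 if control jumped straight to the label.
fn push_label_address_att() -> u64 {
    let marker: u64;
    unsafe {
        asm!(
            "movq $1, {m}",         // marker = 1
            "leaq 1f(%rip), {tmp}", // RIP-relative address of label 1
            "pushq {tmp}",          // push the address itself
            "ret",                  // near return: pop it into RIP
            "movq $99, {m}",        // skipped
            "1:",
            "addq $41, {m}",        // marker = 42
            m = out(reg) marker,
            tmp = out(reg) _,
            options(att_syntax),
        );
    }
    marker
}

fn main() {
    println!("{}", push_label_address_att());
}
```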

  • A comment on that github issue suggested that using `2:` could be a workaround. But I tested (in a stand-alone `.s` with clang), and [that doesn't work either](https://github.com/rust-lang/rust/issues/74558#issuecomment-834382267). So I guess use `.Lfoobar:` and hope your asm doesn't expand twice in the same file, or look for Rust inline asm equivalent to GCC inline asm's `%=` auto-numbered thing for uniquifying things in asm template strings. [Inline assembly label already defined error](https://stackoverflow.com/q/31529224) – Peter Cordes May 07 '21 at 13:26