3

First: I know this is pointless. I'm going to write more complex stuff later (it's an assignment); this is just a starting point.

Here is the code I wrote:

#![feature(asm)]
fn my_add(a: f32, b: f32) -> f32 {
    let x: f32;
    unsafe {
        asm!(
        "addss {0}, {1}",
        inlateout(xmm_reg) a => x,
        in(xmm_reg) b,
        );
    };
    x
}

This compiles to:

example::my_add:
        sub     rsp, 4

        addss   xmm0, xmm1

        movss   dword ptr [rsp], xmm0
        movss   xmm0, dword ptr [rsp]
        add     rsp, 4
        ret

Where do these movss instructions come from? Why is it writing to memory and then reading the value back again?

Moreover, this only works for f32 and not for f64: with f64, the return value was just the value of the first argument. (I had failed to notice that I was using addss instead of addsd for f64.)

I built this with rustc 1.57.0-nightly (25ec82738 2021-10-05).
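
For reference, here is a minimal sketch of the f64 variant with `addsd` in place of `addss`. It assumes the stable `std::arch::asm` macro (Rust 1.59 and later) rather than the nightly `#![feature(asm)]` gate used above. The name `my_add_f64` and the non-x86_64 fallback are illustrative additions, not part of the original code.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Adds two f64 values with the scalar double-precision SSE2 instruction.
/// `addss` would only operate on the low 32 bits of each register,
/// which is why the f32 version cannot simply be reused for f64.
#[cfg(target_arch = "x86_64")]
pub fn my_add_f64(a: f64, b: f64) -> f64 {
    let x: f64;
    unsafe {
        asm!(
            "addsd {0}, {1}",          // {0} = {0} + {1}
            inlateout(xmm_reg) a => x, // first operand is read, then overwritten
            in(xmm_reg) b,
        );
    }
    x
}

/// Hypothetical fallback so the sketch still builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn my_add_f64(a: f64, b: f64) -> f64 {
    a + b
}
```

Calling `my_add_f64(1.5, 2.25)` should then give the same result as `1.5 + 2.25`.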

Peter Cordes
DontBreakAlex
  • 1
    In [the playground](https://play.rust-lang.org/?version=nightly&mode=release&edition=2018&gist=32a45c87f2e7244d95908bc7956dfdf4), in release mode, your `my_add` compiles to `push rax; addss xmm0, xmm1; pop rax; ret`. The moves are presumably because you're writing to `x` on the stack, which is not being optimised out in debug mode? – eggyal Oct 07 '21 at 12:27
  • 3
    "Moreover, this does not work, and I can't see why: the return value is the value of the first argument." is it not? In AMD64 the floating-point inputs are passed through the XMM registers, and returned via the same. If you look at the disassembly for `a + b` it's also just `addss xmm0, xmm1; ret` (except without thinking the asm! call is clobbering rax) – Masklinn Oct 07 '21 at 12:40
  • @Masklinn Thanks, my question seems silly now: I confused addss with addsd, and I thought I had a wrong result because of the clobbering. Anyway, is it possible to avoid the clobbering in debug mode? (It obviously goes away in release.) Can I avoid using a temporary variable like x? – DontBreakAlex Oct 07 '21 at 13:24
  • @DontBreakAlex no idea, even compiling in release has `push rax` and `pop rax`. It's probably an issue with `asm!` itself: even removing everything else and leaving only `unsafe { asm!("nop") }`, `rax` gets pushed and popped. – Masklinn Oct 07 '21 at 13:32
  • 3
    Add `options(pure, nomem, nostack)` to get rid of `push/pop` in release. Details: https://doc.rust-lang.org/unstable-book/library-features/asm.html#options – stepan Oct 07 '21 at 14:59
  • 2
    @stepan: Interesting. I guess that if it treats the `asm` like a function call, then it thinks it needs to align the stack before that call. So the `push` and `pop` are just to align the stack, and nothing to do with `rax` per se. – Nate Eldredge Oct 07 '21 at 16:04
  • Seems basically a duplicate of [Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?](https://stackoverflow.com/q/53366394) - un-optimized code always stores/reloads between statements, e.g. after this asm statement, before returning the value `x`. The other interesting part, push/pop to align the stack before the asm statement or to reserve >= 4 bytes of space instead of using the red-zone, was only discussed in comments, and the `addsd` bug has been removed from the question. – Peter Cordes Oct 07 '21 at 21:27
  • @stepan: Any explanation for the code in the question where the `asm!` runs with RSP misaligned? Note the `sub rsp, 4` that only reserves space for one `f32` float, instead of using the red-zone. https://godbolt.org/z/n79GEx58E shows push/pop even for the un-optimized case with f32, with the same version the question mentions: rustc 1.57.0-nightly (25ec82738 2021-10-05) which is current nightly on Godbolt. (I had to use `pub fn` so it's not optimized away, otherwise just copy/pasted the question's Rust source.) So maybe that quoted asm was from a different compiler version. – Peter Cordes Oct 07 '21 at 21:38
  • 1
    @PeterCordes yeah my guess is that the author actually had `options(...)` added at one point, because that's the code you get for debug version: https://godbolt.org/z/94Y6GPfxa – stepan Oct 07 '21 at 22:25
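
The `options(pure, nomem, nostack)` suggestion from the comments could look roughly like this. This is a sketch only, assuming the stable `std::arch::asm` macro (Rust 1.59 and later) instead of the nightly feature gate in the question. The name `my_add_pure` and the non-x86_64 fallback are hypothetical. The generated assembly depends on compiler version and optimization level, so none is shown here.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Same addition as in the question, with options that tell the compiler
/// more about what the asm block does:
/// - `pure`: no side effects, the output depends only on the inputs
/// - `nomem`: the asm does not read or write memory
/// - `nostack`: the asm does not touch the stack, so the compiler does not
///   need to set up or align the stack around it
#[cfg(target_arch = "x86_64")]
pub fn my_add_pure(a: f32, b: f32) -> f32 {
    let x: f32;
    unsafe {
        asm!(
            "addss {0}, {1}",
            inlateout(xmm_reg) a => x,
            in(xmm_reg) b,
            options(pure, nomem, nostack),
        );
    }
    x
}

/// Hypothetical fallback so the sketch still builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn my_add_pure(a: f32, b: f32) -> f32 {
    a + b
}
```

Note that `pure` has to be combined with `nomem` or `readonly` and needs at least one output operand, which this block has. The store and reload in unoptimized builds is a separate matter: as noted above, debug builds keep `x` in memory between statements regardless of these options.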

0 Answers