
Consider the following (on Rust >= 1.54):

pub fn assign_refs(i: &mut u32, j: &mut u32) -> u32 {
    *i = 42;
    *j = 7;
    *i
}

Because mutable references are guaranteed not to alias, this compiles to:

        mov     dword ptr [rdi], 42
        mov     dword ptr [rsi], 7
        mov     eax, 42
        ret
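As a rough usage sketch (the helper `call_assign_refs` is a hypothetical name, not part of the question), safe code can only call assign_refs with two distinct places, which is why returning the constant 42 is always correct here:

```rust
/// Same function as in the question, repeated so the sketch is self-contained.
pub fn assign_refs(i: &mut u32, j: &mut u32) -> u32 {
    *i = 42;
    *j = 7;
    *i
}

/// Hypothetical helper: calls `assign_refs` with two distinct locals and
/// returns (return value, final `a`, final `b`).
pub fn call_assign_refs() -> (u32, u32, u32) {
    let mut a = 0u32;
    let mut b = 0u32;
    let ret = assign_refs(&mut a, &mut b);

    // The aliasing call is rejected by the borrow checker:
    // assign_refs(&mut a, &mut a);
    // error[E0499]: cannot borrow `a` as mutable more than once at a time

    (ret, a, b)
}
```

Since the aliasing call never compiles, the optimizer does not have to account for it.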

Now consider:

pub fn assign_ptrs(i: *mut u32, j: *mut u32) -> u32 {
    *unsafe {&mut *i} = 42;
    *unsafe {&mut *j} = 7;
    unsafe {*i}
}

(Note that only one mutable ref exists at a time, so this is not undefined behavior even if i == j.)

Since pointers may alias, the last expression must reload:

        mov     dword ptr [rdi], 42
        mov     dword ptr [rsi], 7
        mov     eax, dword ptr [rdi]
        ret
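To make the need for the reload concrete, here is a small sketch that calls assign_ptrs once with the same pointer twice and once with distinct pointers. The two helper functions are hypothetical names introduced only for illustration:

```rust
/// Same function as in the question, repeated so the sketch is self-contained.
pub fn assign_ptrs(i: *mut u32, j: *mut u32) -> u32 {
    *unsafe { &mut *i } = 42;
    *unsafe { &mut *j } = 7;
    unsafe { *i }
}

/// Hypothetical helper: `i` and `j` are the same pointer, so the second
/// store overwrites the first and the final read observes 7.
pub fn call_with_same_pointer() -> u32 {
    let mut x = 0u32;
    let p: *mut u32 = &mut x;
    assign_ptrs(p, p)
}

/// Hypothetical helper: `i` and `j` point at different locals, so the
/// final read observes the 42 written through `i`.
pub fn call_with_distinct_pointers() -> u32 {
    let mut x = 0u32;
    let mut y = 0u32;
    assign_ptrs(&mut x, &mut y)
}
```

The aliased call returns 7 and the distinct call returns 42, so the compiler cannot fold the return value into a constant.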

This next example is undefined behavior if j points at i:

pub fn assign_undefined_behavior_if_same(i: &mut u32, j: *mut u32) -> u32 {
    *i = 42;
    *unsafe {&mut *j} = 7;  // UB if j points to i, second mut ref.
    *i
}

For that reason, it compiles to the same code as assign_refs, returning the "wrong" value.


My question is regarding:

pub fn assign_mixed(i: &mut u32, j: *mut u32) -> u32 {
    *i = 42;

    let i_ptr = i as *mut u32;
    std::convert::identity(i);  // *Not* a reborrow, a move and drop.

    // i no longer exists.
    // *i = 42; // use of moved value: `i`

    // At this point, why not the same as assign_ptrs?
    *unsafe {&mut *j} = 7;

    // Assumes that i_ptr is not aliased just because it came from a &mut?
    unsafe {*i_ptr}
}

This compiles to the same thing as assign_refs, and I find that surprising.

The unique reference i ends halfway through the function. At that point, why are i_ptr and j not treated exactly as they would be in assign_ptrs? Pointers are allowed to alias, so j could point at the same location as i_ptr, and i no longer exists.

For reference, one can call this like:

fn test() {
    let mut i = 0;

    let mut i_ref = &mut i;
    let i_ptr = i_ref as *mut u32;
    assign_mixed(i_ref, i_ptr);
}

Is this an over-aggressive noalias propagation?

Stargateur
GManNickG
  • The mere existence of two mut refs to the same object is UB, if I recall. – Stargateur Sep 24 '21 at 03:57
  • `*unsafe {&mut *j} = 7;` I believe this creates a mutable reference. – Stargateur Sep 24 '21 at 04:00
  • `std::convert::identity(i); // *Not* a reborrow, a move and drop.` I believe this does nothing, not even a drop, and I don't know whether a drop would allow what you want. – Stargateur Sep 24 '21 at 04:01
  • Since Rust hasn't clearly defined its aliasing model, I advise avoiding any situation of aliasing where a reference exists. – Stargateur Sep 24 '21 at 04:04
  • @Stargateur Yes, two mut refs is undefined behavior. I am asking formally if this is expected to behave with two mut refs, and/or how to drop `i`. – GManNickG Sep 24 '21 at 04:04
  • @Stargateur I agree Rust is not as formally defined as it ought to be long term, but this is a reduced example from a very tight loop. If I can avoid a branch checking for self-assignment, that's a non-trivial gain. Rust is, ultimately, a systems language - I look forward to some more formal explanations. – GManNickG Sep 24 '21 at 04:05
  • I would advise adding your true loop here; I can probably find you a good solution. To find the start of an answer, I don't see anything better than asking https://github.com/RalfJung, for example on a Rust GitHub issue. This is a very, very complex question. – Stargateur Sep 24 '21 at 04:10
  • But even if you `std::convert::identity(i)` or `drop(i)` inside `assign_mixed`, the `i_ref` from `test` still exists, thus assigning through the raw pointer is still UB. – rodrigo Sep 24 '21 at 07:11
  • @rodrigo The `i_ref` in test is reborrowed. In Rust, there can be many outstanding mut references to the same thing, but only one active at a time. If what you are saying is true, any nested `self.foo()` would be undefined behavior. See also: https://stackoverflow.com/a/45104627/87234 – GManNickG Sep 24 '21 at 07:38
  • @GManNickG: My argument is that the OP wants to create a gap between dropping the reborrow and reactivating the original reference. I don't think that is possible: either you cannot drop the reborrow or the original reference is immediately active, but I don't think you can be in a state where a value is borrowed, but all the borrows are inactive. – rodrigo Sep 24 '21 at 07:48
  • @rodrigo I agree, that seems to be the model as well. That is, when lowering to LLVM rust implements a `&mut` as a `noalias` pointer directly at the parameter list. Rather than, e.g., passing an (aliasing) pointer in the parameter list, then actually constructing the `&mut` from it - which would scope the noalias to the lifetime of the ref. For ~almost all cases that would match existing behavior, but would allow a mechanism to drop the &mut ref "down" to its underlying pointer (which would be safe, given the unique temporal ownership and that the caller won't activate until the call is done). – GManNickG Sep 24 '21 at 07:51
  • If the problem is that the pointer is treated differently because Rust and LLVM track the fact that it's coming from reference, then you need to find a way to launder the pointer, e.g. by sending its `usize` value to a black box that will return it unchanged. The best way to do that is open to research until Rust implements a stable black-box primitive, but for a quick test, [this seems to work](https://godbolt.org/z/5G14fE1sW). (The laundering technique is stolen from [`black_box`](https://docs.rs/bencher/0.1.5/src/bencher/lib.rs.html#590-596).) – user4815162342 Sep 24 '21 at 09:41

1 Answer


Rust follows LLVM's noalias model for &mut T (see Behavior considered undefined). LLVM describes noalias as follows:

This indicates that memory locations accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value. This guarantee only holds for memory locations that are modified, by any means, during the execution of the function. The attribute on a return value also has additional semantics described below. The caller shares the responsibility with the callee for ensuring that these requirements are met.

So, if I understand correctly, simply having i: &mut u32 in the argument list means it is expected to have no alias for the whole execution of the function. Even using an auxiliary function like:

pub unsafe fn assign_mixed(i: &mut u32, j: *mut u32) -> u32 {
    aux(i, j)
}

pub unsafe fn aux(i: *mut u32, j: *mut u32) -> u32 {
    *i = 42;
    *j = 7;
    *i
}

would not work.

I think the only way to get something similar is to use UnsafeCell, like:

use std::cell::UnsafeCell;

pub unsafe fn assign_mixed(i: &UnsafeCell<u32>, j: *mut u32) -> u32 {
    *i.get() = 42;

    *j = 7;
    *i.get()
}

This produces the desired assembly code. Be sure not to use &mut UnsafeCell for this.
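For completeness, a minimal sketch of how a caller might use the UnsafeCell version with an aliasing pointer. The helper `call_aliased` is a hypothetical name, and the explicit unsafe block inside assign_mixed is only added so the sketch also compiles cleanly on newer editions:

```rust
use std::cell::UnsafeCell;

/// The UnsafeCell variant from above, repeated so the sketch is self-contained.
///
/// Safety: `j` must be valid for writes of a `u32`.
pub unsafe fn assign_mixed(i: &UnsafeCell<u32>, j: *mut u32) -> u32 {
    unsafe {
        *i.get() = 42;
        *j = 7;
        // `&UnsafeCell<u32>` makes no no-alias promise about the contents,
        // so this read has to observe the store through `j`.
        *i.get()
    }
}

/// Hypothetical helper: `j` deliberately points into the same cell as `i`.
pub fn call_aliased() -> u32 {
    let cell = UnsafeCell::new(0u32);
    let j: *mut u32 = cell.get();
    unsafe { assign_mixed(&cell, j) }
}
```

Here the aliased call returns 7, which is the result the question was hoping for from the &mut version.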

Stargateur
  • This is a good explanation for why this has the behavior, good find! I suppose now it boils down to Rust lacking a mechanism to say "I am the only one with this &mut T right now, and I am done with it - give me an aliasing pointer for it as I relinquish it." – GManNickG Sep 24 '21 at 05:28
  • @GManNickG You're *not* the only one with a `&mut T` because *you got it from your caller*. You don't get to tell the caller that they're not allowed to reborrow. – trent Sep 24 '21 at 12:05
  • @trentcl: I didn't say they don't get to reborrow. I'm saying _my_ borrow is no longer going to be used. Like `ir_inner1` in: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=e2d34a0d355197cef3020b8fc0d8c851 – GManNickG Sep 24 '21 at 16:54
  • FWIW, I found https://github.com/rust-lang/unsafe-code-guidelines/issues/86 which shows almost an identical case of Rust intending to allow one thing, but noalias preventing it. Seems this is still an open problem. – GManNickG Sep 24 '21 at 17:04
  • @GManNickG You can relinquish "your" borrow, but that still doesn't give you permission to dereference `j` because `i` is derived (in the stacked borrows model) from a borrow taken in the caller, which *also* claims exclusive access over `*i` for its entire lifetime, and whose lifetime extends during the entire call to `assign_mixed`. In other words: if the caller does `let p = &mut x as *mut _; let r = &mut x; assign_mixed(r, p);` then "relinquishing" `i` only gives exclusive access back to `r`: it does not make it OK to dereference `p` because `r` is still exclusive. – trent Sep 24 '21 at 17:11
  • I don't believe the GH link is an example of the same issue: in that case the soundness depends on `raw_ptr` being accessed only as a shared reference, which is not the case for `j` in `assign_mixed`. Although I will admit, at first blush I thought "there's no way that should work" so perhaps I do have some reading up to do. [Stacked Borrows](https://plv.mpi-sws.org/rustbelt/stacked-borrows/paper.pdf) – trent Sep 24 '21 at 17:18
  • @trentcl I think I agree with that analysis. In the same way the shared ref causes the stacked borrows to permit additional new shared refs to be created, I am effectively asking for an operation to pop the stack and replace that with another &mut I got somewhere else. (`unsafe` of course, since I am promising it is the "same".) I admit this is niche and in the meantime have re-written my hot loop to just use pointers since it sidesteps everything. – GManNickG Sep 24 '21 at 17:23