4

Both the Rust reference and the Rustonomicon clearly state that "producing" a dangling reference is Undefined Behaviour.

Take this code snippet for example:

fn main() {
    let p: std::ptr::NonNull<u8> = std::ptr::NonNull::dangling();
    #[allow(unused_variables)]
    let r: &u8 = unsafe { std::mem::transmute::<_, _>(p) };
}

Running it through Miri in the playground produces:

error: Undefined Behavior: constructing invalid value: encountered a dangling reference (address 0x1 is unallocated)
 --> src/main.rs:4:27
  |
4 |     let r: &u8 = unsafe { std::mem::transmute::<_, _>(p) };
  |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ constructing invalid value: encountered a dangling reference (address 0x1 is unallocated)
  |
  = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
  = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
  = note: BACKTRACE:
  = note: inside `main` at src/main.rs:4:27

Of course, the Rust language is free to define as much stuff as UB as the creators want. What I am wondering is what the advantage of this Undefined Behaviour rule is (commonly, UB is introduced in a language to allow optimizations or simplify things).

Yet, according to the reference linked above, the Undefined Behaviour may already be triggered by just writing a dangling reference to a local, and I fail to come up with a use for that.

So:

  • How can producing (not dereferencing) a dangling reference in Rust cause UB?
  • And can that UB be observed, without dereferencing the reference?
Jonas Schäfer
  • "And can it be observed?" While in practice it may be possible, the very definition of UB means it can't. It's like Schrödinger's cat. – Stargateur Sep 29 '22 at 21:04
  • Fair enough, yet oftentimes you can make compilers make UB observable, even if potentially only on assembly output and potentially only on specific compiler versions or architectures. – Jonas Schäfer Oct 01 '22 at 09:55

3 Answers

13

The fact that just creating an invalid reference is UB means that an invalid reference cannot exist. This is important for optimizations like loop invariant code motion.

Take the following code as an example:

fn foo(arr: &[u32], cond: &bool) -> u32 {
    let mut result = 0;
    for &v in arr {
        if *cond {
            result += v;
        }
    }
    result
}

A compiler can optimize this by hoisting the condition to the beginning, allowing further optimizations such as auto-vectorization:

fn foo(arr: &[u32], cond: &bool) -> u32 {
    let mut result = 0;
    if *cond {
        for &v in arr {
            result += v;
        }
    }
    result
}

But if *cond can be dangling, this optimization is invalid: *cond can trigger a segfault for example, and if the array is empty, this wouldn't happen with the original version.
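To make the difference concrete, here is a minimal sketch that puts both forms side by side. The names `foo_original` and `foo_hoisted` are only labels for this illustration. With valid references the two agree on every input, including the empty slice; the only difference is that the hoisted form reads `*cond` once even when the slice is empty, which is exactly the read that would be a problem if `cond` were allowed to dangle.

```rust
// The loop as written: `cond` is read once per element,
// so it is never read when `arr` is empty.
fn foo_original(arr: &[u32], cond: &bool) -> u32 {
    let mut result = 0;
    for &v in arr {
        if *cond {
            result += v;
        }
    }
    result
}

// The hoisted form: `cond` is read exactly once,
// even when `arr` is empty.
fn foo_hoisted(arr: &[u32], cond: &bool) -> u32 {
    let mut result = 0;
    if *cond {
        for &v in arr {
            result += v;
        }
    }
    result
}

fn main() {
    let data = [1u32, 2, 3];
    let empty: [u32; 0] = [];
    for cond in [true, false] {
        // With valid references, both forms compute the same result.
        assert_eq!(foo_original(&data, &cond), foo_hoisted(&data, &cond));
        assert_eq!(foo_original(&empty, &cond), foo_hoisted(&empty, &cond));
    }
    println!("both versions agree for valid references");
}
```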

Chayim Friedman
  • That's an interesting example, but wouldn't it be enough to say that *dereferencing* a dangling reference is UB to make this optimization work? If I compare to C, where signed integer overflow is UB, compilers also just go ahead and assume that it does not happen and optimize based on that. So the example above should be allowed even if only dereferencing (as opposed to constructing) a dangling ref was UB. – Jonas Schäfer Oct 01 '22 at 09:42
  • 1
    @JonasSchäfer If the array is empty, you don't dereference the reference at all. – Chayim Friedman Oct 01 '22 at 22:25
  • For what it's worth, Rust does not actually pull out the loop invariant in this case, and the emitted code is very similar for the equivalent C code: https://godbolt.org/z/oqh3jqcvK – Sven Marnach Oct 13 '22 at 12:35
  • @SvenMarnach it does in higher optimization levels by the way. Try -C opt-level=3. -O is [level 2](https://doc.rust-lang.org/rustc/command-line-arguments.html#-o-optimize-your-code). Clang also does it on -O3. It is rather strange that it isn't enabled by default though, I can't see any drawbacks to enabling it. – Ekrem Dinçel Aug 03 '23 at 13:52
6

References must always be valid.

This is a much easier rule to reason about and work with than "references may or may not be valid, but it is undefined behavior to dereference an invalid reference." There is also no benefit to allowing such behavior; if you want to refer to a value that may or may not be valid, there are raw pointers or MaybeUninit for that.
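As a rough sketch of the raw-pointer alternative: a raw pointer may dangle for as long as you like, and the obligation to be valid only arises at the point where it is actually read. The helper `read_if_valid` and its `valid` flag are made up for this illustration and are not part of any library.

```rust
use std::ptr::NonNull;

/// Hypothetical helper: reads through `p` only when `valid` is true.
///
/// # Safety
/// If `valid` is true, `p` must point to a live, initialized `u8`.
unsafe fn read_if_valid(p: *const u8, valid: bool) -> Option<u8> {
    if valid {
        // SAFETY: the caller guarantees `p` is readable when `valid` is true.
        Some(unsafe { *p })
    } else {
        // The pointer is never dereferenced on this path.
        None
    }
}

fn main() {
    // Holding a dangling *raw pointer* is fine; no reference is ever created from it.
    let dangling: *const u8 = NonNull::<u8>::dangling().as_ptr();
    let x = 7u8;
    let live: *const u8 = &x;

    assert_eq!(unsafe { read_if_valid(dangling, false) }, None);
    assert_eq!(unsafe { read_if_valid(live, true) }, Some(7));
}
```

The function is marked `unsafe` because its correctness depends on the caller's promise about `p`, which is the kind of contract a plain `&u8` parameter never needs.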

There is a principle that "safe Rust cannot cause undefined behavior" (or, conversely, "if a set of parameters could cause undefined behavior, then the function must be marked unsafe"). Hopefully you can understand why people want that to be true and will strive to guarantee it. If references were allowed to be invalid, then this function could not guarantee that it doesn't cause undefined behavior:

fn get(r: &u8) -> u8 {
    // `ref` is a reserved keyword, so the parameter is named `r`.
    *r
}

Undefined behavior and unsafe have a very messy set of rules to follow to ensure you get it right. But the rules and consequences should not leak into non-unsafe code in Rust.

In practice, I can't think of a case where merely creating a reference would make a sane compiler misbehave, since a reference that is created but never used would be removed by dead-code elimination (but then again, compilers aren't sane). So I don't think the rule exists for a technical reason, but for a pragmatic one.

kmdreko
  • 1
    It isn't always as easy as that. For example, [`Pin::new_unchecked`](https://doc.rust-lang.org/std/pin/struct.Pin.html#method.new_unchecked) states: "If the constructed Pin<P> does not guarantee that the data P points to is pinned, that is a violation of the API contract and may lead to undefined behavior in later (safe) operations." – Niklas Mohrin Sep 30 '22 at 10:50
  • I do like the simplicity argument. I had not considered that this might be a pragmatic reason more than a technical one. Do you happen to have anything citable for that? – Jonas Schäfer Oct 01 '22 at 09:52
0

I am not an expert on Rust's internals, so I don't know how to observe the UB, but the reason most of these situations are considered UB is that the compiler might perform optimizations that rely on them never happening. Nothing goes wrong if the generated code closely mirrors what you wrote, but the compiler may transform it heavily for various reasons. In particular, the compiler may decide to read the value behind the reference at any point for its own reasons; with a dangling reference that read is invalid and might produce anything.

jthulhu