Given a struct like so:

pub struct MyStruct<'a> {
    id: u8,
    other: &'a OtherStruct,
}

I want to partially initialize it with the id field, then assign the other reference field afterwards. Note: in the simplified example shown in this question this seems extremely unnecessary, but it is necessary in the actual implementation.

The Rust documentation talks about initializing a struct field-by-field, which would be done like so:

use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

fn get_struct<'a>(other: &'a OtherStruct) -> MyStruct<'a> {
    let mut uninit: MaybeUninit<MyStruct<'a>> = MaybeUninit::uninit();
    let ptr = uninit.as_mut_ptr();

    unsafe {
        addr_of_mut!((*ptr).id).write(8);
        addr_of_mut!((*ptr).other).write(other);
        uninit.assume_init()
    }
}

OK, so that's a possibility and it works, but is it necessary? Is it safe to instead do the following, which also seems to work?

fn get_struct2<'a>(other: &'a OtherStruct) -> MyStruct<'a> {
    let mut my_struct = MyStruct {
        id: 8,
        other: unsafe { MaybeUninit::uninit().assume_init() },
    };

    my_struct.other = other;
    my_struct
}

Note the first way causes no warnings and the second one gives the following warning...

other: unsafe { MaybeUninit::uninit().assume_init() },
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                |
                this code causes undefined behavior when executed
                help: use `MaybeUninit<T>` instead, and only call `assume_init` after initialization is done

...which makes sense because if the other field were accessed that could cause problems.
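Following the compiler's hint, a minimal sketch of keeping the field wrapped in MaybeUninit until it has been assigned might look like the following. The names PartialMyStruct and get_struct3, and the value field on OtherStruct, are hypothetical and only there for illustration; this is one possible shape under those assumptions, not the only way to do it.

```rust
use std::mem::MaybeUninit;

// Hypothetical stand-in for the real OtherStruct.
pub struct OtherStruct {
    pub value: u32,
}

pub struct MyStruct<'a> {
    pub id: u8,
    pub other: &'a OtherStruct,
}

// Hypothetical intermediate type: `other` is allowed to be uninitialized
// here because it is wrapped in MaybeUninit, so no invalid reference exists.
struct PartialMyStruct<'a> {
    id: u8,
    other: MaybeUninit<&'a OtherStruct>,
}

pub fn get_struct3<'a>(other: &'a OtherStruct) -> MyStruct<'a> {
    let mut partial = PartialMyStruct {
        id: 8,
        other: MaybeUninit::uninit(),
    };

    // Assign the reference later, as in the question.
    partial.other.write(other);

    MyStruct {
        id: partial.id,
        // SAFETY: `other` was initialized by the `write` call above.
        other: unsafe { partial.other.assume_init() },
    }
}
```

The difference from get_struct2 is that assume_init is only called after a valid reference has been written, so a value of type &OtherStruct never exists in an uninitialized state.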

With almost no understanding of this, my guess is that the second way initially defines a struct whose other reference points at some arbitrary location in memory, and that once a valid reference is assigned everything should be fine. Is that correct? I can imagine it mattering for something like a struct or enum left uninitialized, where the compiler might optimize based on the value and wrapping it in MaybeUninit would prevent those optimizations, but is it OK for a reference? I never access the reference until after it has been assigned.

Edit: Also, I know this could also be solved by using an Option or some other container for initialization in the private API of the struct, but let's skip over that.

David Sherret
  • The second option you gave is undefined behavior. It is always UB to have an invalid reference, even if it's just temporarily or you don't touch it. – ddulaney May 08 '21 at 17:10
  • @ddulaney thanks! I guess I'm trying to figure out what could go wrong here with this undefined behaviour. – David Sherret May 08 '21 at 17:23
  • Explaining why you need a struct with a reference to another struct that can't be initialized properly would be much better than asking whether something is UB when it's obviously UB. Your question reads like "please someone tell me it's OK because I'm already doing this" :p. I would bet you have an XY problem. – Stargateur May 08 '21 at 17:47
  • @Stargateur it's a separate topic from the question—the code is creating a graph with circular references and I am aware there are other ways of achieving this. I’m just curious why. – David Sherret May 08 '21 at 17:49
  • That's why circular lists are impossible in OCaml or Haskell. Use raw pointers, I guess. – Stargateur May 08 '21 at 18:02

1 Answer

It's undefined behavior (see "What Every C Programmer Should Know About Undefined Behavior", which applies equally to Rust code that uses unsafe):

Behavior considered undefined

  • A reference or Box that is dangling, unaligned, or points to an invalid value.

Note:

Undefined behavior affects the entire program. For example, calling a function in C that exhibits undefined behavior of C means your entire program contains undefined behaviour that can also affect the Rust code. And vice versa, undefined behavior in Rust can cause adverse effects on code executed by any FFI calls to other languages.

Dangling pointers

A reference/pointer is "dangling" if it is null or not all of the bytes it points to are part of the same allocation (so in particular they all have to be part of some allocation). The span of bytes it points to is determined by the pointer value and the size of the pointee type (using size_of_val). As a consequence, if the span is empty, "dangling" is the same as "null". Note that slices and strings point to their entire range, so it is important that the length metadata is never too large. In particular, allocations and therefore slices and strings cannot be bigger than isize::MAX bytes.
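One concrete way to see that the compiler relies on these rules, rather than treating them as a formality: because a reference can never be null, Option<&T> reuses the null bit pattern to represent None and takes no extra space. The small sketch below illustrates that; OtherStruct's value field and the reference_sizes helper are hypothetical names introduced only for this example.

```rust
use std::mem::size_of;

// Hypothetical stand-in for the real OtherStruct.
pub struct OtherStruct {
    pub value: u32,
}

/// Returns (size of `&OtherStruct`, size of `Option<&OtherStruct>`).
/// The two are equal because `None` is stored as the null pointer,
/// which a valid reference is never allowed to be.
pub fn reference_sizes() -> (usize, usize) {
    (
        size_of::<&OtherStruct>(),
        size_of::<Option<&OtherStruct>>(),
    )
}
```

An uninitialized reference may hold any bit pattern, including null, so code the compiler generates under the assumption "this is never null" can misbehave even if the reference is never dereferenced.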

The reference book

Stargateur
  • I'm more asking what could go wrong here with this undefined behaviour in this scenario. – David Sherret May 08 '21 at 18:50
  • literally anything could happen, when you enter in UB state, your entire program is in undefined behavior. It's pointless to speculate on the "effect" an undefined behavior could have. For example, a common joke is to say, that once you do a UB the program erase all data on the computer. Please read https://stackoverflow.com/a/4105123/7076153 I linked. – Stargateur May 08 '21 at 19:09
  • I get that undefined behaviour can cause anything to happen and I get how all the other scenarios you've linked to can lead to undefined behaviour, but I guess more my question is what's the danger here and what's going on in this scenario? So a reference is made to somewhere unknown in memory and then what's possibly going on behind the scenes to potentially cause issues? – David Sherret May 08 '21 at 23:52
  • @DavidSherret so you didn't get it ^^ UB is probably what people who never did language like C understand the less, UB is UB. – Stargateur May 09 '21 at 09:08
  • @DavidSherret if it would be _defined_ (i.e. we would know what must happen) then by definition it would not be _undefined_ behaviour, would it? UB literally means, anything could happen: your computer can grow legs and walk out of your apartment. Not sure which part you don't get: the only way to prevent it happening is to avoid your program having UB in the first place. You should likely read more about compiler designs, portability, standards, and platform specifications and evolution, etc. in case you wish to understand what _could_ happen on each platform. – Peter Varo May 09 '21 at 09:28