3

I'm writing some GPU code for macOS using the metal crate. In doing so, I allocate a Buffer object by calling:

let buffer = device.new_buffer(num_bytes, MTLResourceOptions::StorageModeShared)

This FFIs to Apple's Metal API, which allocates a region of memory that both the CPU and GPU can access and the Rust wrapper returns a Buffer object. I can then get a pointer to this region of memory by doing:

let data = buffer.contents() as *mut u32

In the colloquial sense, this region of memory is uninitialized. However, is this region of memory "uninitialized" in the Rust sense?

Is this sound?

let num_bytes = num_u32 * std::mem::size_of::<u32>();
let buffer = device.new_buffer(num_bytes, MTLResourceOptions::StorageModeShared);
let data = buffer.contents() as *mut u32;

let as_slice = unsafe { slice::from_raw_parts_mut(data, num_u32) };

for i in as_slice {
  *i = 42u32;
}

Here I'm writing u32s to a region of memory returned to me by FFI. From the nomicon:

...The subtle aspect of this is that usually, when we use = to assign to a value that the Rust type checker considers to already be initialized (like x[i]), the old value stored on the left-hand side gets dropped. This would be a disaster. However, in this case, the type of the left-hand side is MaybeUninit<Box>, and dropping that does not do anything! See below for some more discussion of this drop issue.

None of the from_raw_parts rules are violated and u32 doesn't have a drop method.

  • Nonetheless, is this sound?
  • Would reading from the region (as u32s) before writing to it be sound (nonsense values aside)? The region of memory is valid and u32 is defined for all bit patterns.

Best practices

Now consider a type T that does have a drop method (and you've done all the bindgen and #[repr(C)] nonsense so that it can go across FFI boundaries).

In this situation, should one:

  • Initialize the buffer in Rust by scanning the region with pointers and calling .write()?
  • Do:
let as_slice = unsafe { slice::from_raw_parts_mut(data as *mut MaybeUninit<T>, num_t) };

for i in as_slice {
  *i = unsafe { MaybeUninit::new(T::new()).assume_init() };
}

Furthermore, after initializing the region, how does the Rust compiler remember this region is initialized on subsequent calls to .contents() later in the program?

Thought experiment

In some cases, the buffer is the output of a GPU kernel and I want to read the results. All the writes occurred in code outside of Rust's control and when I call .contents(), the pointer at the region of memory contains the correct uint32_t values. This thought experiment should relay my concern with this.

Suppose I call C's malloc, which returns an allocated buffer of uninitialized data. Does reading u32 values from this buffer (pointers are properly aligned and in bounds) as any type should fall squarely into undefined behavior.

However, suppose I instead call calloc, which zeros the buffer before returning it. If you don't like calloc, then suppose I have an FFI function that calls malloc, explicitly writes 0 uint32_t types in C, then returns this buffer to Rust. This buffer is initialized with valid u32 bit patterns.

  • From Rust's perspective, does malloc return "uninitialized" data while calloc returns initialized data?
  • If the cases are different, how would the Rust compiler know the difference between the two with respect to soundness?
Rick Weber
  • 53
  • 6
  • [This thread on the Rust forum](https://users.rust-lang.org/t/uninitialized-memory-and-ffi-again/53693) seems relevant; FFI calls are outside the compiler's line of sight and thus this requires `unsafe` which puts the responsibility on *you* to do the correct thing. If you know the data is uninitialized and you say it is anyway then its UB, but one of the consequences of UB could be that it does what you expect since the compiler will assume that you've not introduced UB. – kmdreko Feb 16 '23 at 03:30
  • So, as I understand the discussion, the compiler is going to assume the data in the region created by `slice::from_raw_parts_mut()` is initialized. If the FFI code has actually initialized the data according to the type's semantics, then dropping and everything should work fine. However, if it hasn't, then I need to iterate through the region with a pointer and initialize it with [write](https://doc.rust-lang.org/std/ptr/fn.write.html). Then the region is initialized and I can call `slice::from_raw_parts`. – Rick Weber Feb 16 '23 at 05:06
  • For those of us who do not know Metal's API, it would be nice to know the signature of the mysterious `contents` method... – Matthieu M. Feb 16 '23 at 08:00
  • The rules of `slice::from_raw_parts()` _are_ violated; creating any reference requires the content of the memory to be initialized. – Chayim Friedman Feb 16 '23 at 10:53
  • @ChayimFriedman I see this now "data must point to len consecutive properly initialized values of type T." This would mean I should use MaybeUninit or raw pointer traversal before I create a slice. The [nomicon](https://doc.rust-lang.org/nomicon/uninitialized.html) states that reading this data as *any* type (which includes u32) is UB. But I how does the Rust compiler know for the purposes of soundness whether the FFI call has properly initialized the region or not? There has to be some maxim like "Rust assume pointers returned by FFI are [or are not] initialized." – Rick Weber Feb 16 '23 at 15:24
  • Unless there is something like LTO, it indeed cannot know and has to assume it was initialized. But I would not suggest you to rely on that: this is still initialized, and undefined behavior is undefined. – Chayim Friedman Feb 16 '23 at 15:25
  • I'm more interested in what the Rust language wants me to do more than what will work :) I don't think it was clear in my post, but this code compiles, runs correctly, and all tests pass. I just want to make sure it does for future compiler revisions and different architectures. – Rick Weber Feb 16 '23 at 15:29
  • @RickWeber The issue is that modern compiler optimizers take into account how the programmer uses the memory semantically. If you use `slice::from_raw_parts_mut`, the compiler is allowed to assume the memory is initialized, and use that information when it optimizes. – Colonel Thirty Two Feb 20 '23 at 04:00

2 Answers2

1

There are multiple parameters to consider when you have an area of memory:

  • The size of it is the most obvious.
  • Its alignment is still somewhat obvious.
  • Whether or not it's initialized -- and notably, for types like bool whether it's initialized with valid values as not all bit-patterns are valid.
  • Whether it's concurrently read/written.

Focusing on the trickier aspects, the recommendation is:

  • If the memory is potentially uninitialized, use MaybeUninit.
  • If the memory is potentially concurrently read/written, use a synchronization method -- be it a Mutex or AtomicXXX or ....

And that's it. Doing so will always be sound, no need to look for "excuses" or "exceptions".

Hence, in your case:

let num_bytes = num_u32 * std::mem::size_of::<u32>();
assert!(num_bytes <= isize::MAX as usize);

let buffer = device.new_buffer(num_bytes, MTLResourceOptions::StorageModeShared);

let data = buffer.contents() as *mut MaybeUninit<u32>;

//  Safety:
//  - `data` is valid for reads and writes.
//  - `data` points to `num_u32` elements.
//  - Access to `data` is exclusive for the duration.
//  - `num_u32 * size_of::<u32>() <= isize::MAX`.
let as_slice = unsafe { slice::from_raw_parts_mut(data, num_u32) };

for i in as_slice {
    i.write(42);  //  Yes you can write `*i = MaybeUninit::new(42);` too,
                  //  but why would you?
}

// OR with nightly:

as_slice.write_slice(some_slice_of_u32s);
Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
  • From the [nomicon](https://doc.rust-lang.org/nomicon/uninitialized.html), interpreting this data as *any* type (including u32) is UB. It happens to work, but the behavior is still undefined, which is what I'm trying to avoid. From [`raw_with_parts`](https://doc.rust-lang.org/std/slice/fn.from_raw_parts.html#safety), using the function this way in unsafe because the region is uninitialized. So this code is *not* sound. – Rick Weber Feb 16 '23 at 15:36
  • 1
    @RickWeber: `MaybeUninit` is specifically _the_ blessed mechanism to work with uninitialized memory that the nomicon is alluding to -- I wish it named it. With regard to `from_raw_parts`, I think again that it's a documentation issue -- a lack of special-casing for `MaybeUninit`. If you go to the `MaybeUninit` page, you'll see that a lot of the APIs with slices are still unstable. It is still, unfortunately, a work in progress, and I am afraid this is the reason for the lack of proper documentation. Still, per my (substantial) understanding of `unsafe`, this is the soundest you'll get. – Matthieu M. Feb 16 '23 at 16:11
  • 1
    @RickWeber: If you want to stick to the letter of the documentation, you'll need to use `ptr::write` instead. It's doable, certainly, but low-level, and not (in practice) any sounder. – Matthieu M. Feb 16 '23 at 16:14
  • sorry, I glanced at your code and didn't see you were using MaybeUninit. This looks about right like I had in my head for using this technique. – Rick Weber Feb 16 '23 at 16:55
  • @RickWeber: Ah! Indeed without `MaybeUninit` it would be unsound. This [article from Ralf Jung](https://www.ralfj.de/blog/2019/07/14/uninit.html) explains that from an optimizer point of view, it's not just about bit-patterns, and reading uninitialized memory may get you in trouble with the optimizer. – Matthieu M. Feb 16 '23 at 17:11
  • For a follow-up question, if I make an FFI call that fills a region of memory with data, is it sufficient that the FFI call ensures the data in this region is a valid `T` (e.g. alignment, field order, bit patterns etc all produce a valid T)? Or do I need to tell the Rust compiler this region of memory that came from FFI is properly initialized? – Rick Weber Feb 16 '23 at 17:15
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/251931/discussion-between-rick-weber-and-matthieu-m). – Rick Weber Feb 16 '23 at 19:32
0

This is very similar to this post on the users forum mentioned in the comment on your question. (here's some links from that post: 2 3)

The answers there aren't the most organized, but it seems like there's four main issues with uninitialized memory:

  1. Rust assumes it is initialized
  2. Rust assumes the memory is a valid bit pattern for the type
  3. The OS may overwrite it
  4. Security vulnerabilities from reading freed memory

For #1, this seems to me to not be an issue, since if there was another version of the FFI function that returned initialized memory instead of uninitialized memory, it would look identical to rust.

I think most people understand #2, and that's not an issue for u32.

#3 could be a problem, but since this is for a specific OS you may be able to ignore this if MacOS guarantees it does not do this.

#4 may or may not be undefined behavior, but it is highly undesirable. This is why you should treat it as uninitialized even if rust thinks it's a list of valid u32s. You don't want rust to think it's valid. Therefore, you should use MaybeUninit even for u32.

MaybeUninit

It's correct to cast the pointer to a slice of MaybeUninit. Your example isn't written correctly, though. assume_init returns T, and you can't assign that to an element from [MaybeUninit<T>]. Fixed:

let as_slice = unsafe { slice::from_raw_parts_mut(data as *mut MaybeUninit<T>, num_t) };

for i in as_slice {
  i.write(T::new());
}

Then, turning that slice of MaybeUninit into a slice of T:

let init_slice = unsafe { &mut *(as_slice as *mut [MaybeUninit<T>] as *mut [T]) };

Another issue is that &mut may not be correct to have at all here since you say it's shared between GPU and CPU. Rust depends on your rust code being the only thing that can access &mut data, so you need to ensure any &mut are gone while the GPU accesses the memory. If you want to interlace rust access and GPU access, you need to synchronize them somehow, and only store *mut while the GPU has access (or reacquire it from FFI).

Notes

The code is mainly taken from Initializing an array element-by-element in the MaybeUninit doc, plus the very useful Alternatives section from transmute. The conversion from &mut [MaybeUninit<T>] to &mut [T] is how slice_assume_init_mut is written as well. You don't need to transmute like in the other examples since it is behind a pointer. Another similar example is in the nomicon: Unchecked Uninitialized Memory. That one accesses the elements by index, but it seems like doing that, using * on each &mut MaybeUninit<T>, and calling write are all valid. I used write since it's shortest and is easy to understand. The nomicon also says that using ptr methods like write is also valid, which should be equivalent to using MaybeUninit::write.

There's some nightly [MaybeUninit] methods that will be helpful in the future, like slice_assume_init_mut

drewtato
  • 6,783
  • 1
  • 12
  • 17
  • The CPU and GPU aren't concurrently reading or writing to the same address ranges. Here I'm initializing some data and then will launch a kernel on the GPU that will read it. In other cases, I create an uninitialized region of memory, let a GPU shader write to it, then I read from it in Rust. – Rick Weber Feb 16 '23 at 17:05
  • That could be enough to synchronize them, but you still need to ensure the `&mut` does not exist during those times. – drewtato Feb 16 '23 at 18:29
  • It doesn't. This function has returned and the reference is gone by the time we launch the kernel. – Rick Weber Feb 16 '23 at 19:32