2

In Rust, I want to move all the elements out of a generic fixed-width array so I may then move them individually. The elements may but don't necessarily implement Copy. I've come up with the following solution:

struct Vec3<T> {
    underlying_array: [T; 3]
}

impl<T> Vec3<T> {
    fn into_tuple(self) -> (T, T, T) {
        let result = (
            unsafe { mem::transmute_copy(&self.underlying_array[0]) },
            unsafe { mem::transmute_copy(&self.underlying_array[1]) },
            unsafe { mem::transmute_copy(&self.underlying_array[2]) },
        );
        mem::forget(self);
        result
    }
}

It seems to work, but I want to know if it's safe in the general case. Coming from C++, it's not generally safe to move objects by copying their bit patterns and bypassing the source object's destructor, which I think is essentially what I'm doing here. But in Rust, every type is movable (I think) and there's no way to customize move semantics (I think), so I can't think of any other way Rust would implement moving objects in the un-optimized case than a bitwise copy.

Is moving elements out of an array like this safe? Is there a more idiomatic way to do it? Is the Rust compiler smart enough to elide the actual bit copying transmute_copy does when possible, and if not, is there a faster way to do this?

I think this is not a duplicate of How do I move values out of an array? because in that example, the array elements aren't generic and don't implement Drop. The library in the accepted answer moves individual values out of the array one at a time while keeping the rest of the array usable. Since I want to move all the values at once and don't care about keeping the array, I think this case is different.

jbatez
  • 1,772
  • 14
  • 26
  • Did you have a chance to read the answers to the question, which *do* address the concerns of `Drop`? On Stack Overflow, a duplicate doesn't mean that the question was identical, but that *the same answers apply*. For example, I could copy and paste the accepted answer here and it would seemingly fit perfectly to answer the question in your title. You have a lot of *other* questions in the body that are ancillary to the question in the title, though. – Shepmaster Mar 18 '18 at 15:31
  • It addresses `Drop` by saying the the first part of the answer doesn't handle it and then references a library that does. But I looked at the library and it moves individual values out of the array one at a time while keeping the rest of the array usable. Since I want to move all the values at once and don't care about keeping the array, I think this case is different. – jbatez Mar 18 '18 at 15:43
  • As for the difference between using `replace` + `unitialized` vs `transmute_copy`, I could make that a separate question if you'd like, but I don't think the rest of the questions are ancillary since i'm asking for the minimal overhead way of doing this that's also safe in the general case. – jbatez Mar 18 '18 at 15:48

1 Answers1

3

Rust 1.36 and up

This can now be done in completely safe code:

impl<T> Vec3<T> {
    fn into_tuple(self) -> (T, T, T) {
        let [a, b, c] = self.underlying_array;
        (a, b, c)
    }
}

Previous versions

As you've mentioned and as discussed in How do I move values out of an array?, treating a generic type as a bag of bits can be very dangerous. Once we have copied those bits and they are "live", implementations of traits such as Drop can access the value when we least expect it.

That being said, your current code appears to be safe, but has needless flexibility. Using transmute or transmute_copy is The Big Hammer and is actually rarely needed. You don't want the ability to change the type of the value. Instead, use ptr::read.

It's conventional to expand unsafe blocks to cover the range of code that makes something safe and then to include a comment explaining why the block is actually safe. In this case, I'd expand it to cover the mem::forget; the returned expression has to come along for the ride too.

You will need to ensure that it's impossible for a panic to occur between the time that you've moved the first value out and when you forget the array. Otherwise your array will be half-initialized and you will trigger uninitialized behavior. In this case, I like your structure of writing a single statement that creates the resulting tuple; it's harder to accidentally shoehorn in extra code in there. This is also worth adding comments for.

use std::{mem, ptr};

struct Vec3<T> {
    underlying_array: [T; 3],
}

impl<T> Vec3<T> {
    fn into_tuple(self) -> (T, T, T) {
        // This is not safe because I copied it directly from Stack Overflow
        // without reading the prose associated with it that said I should 
        // write my own rationale for why this is safe.
        unsafe {
            let result = (
                ptr::read(&self.underlying_array[0]),
                ptr::read(&self.underlying_array[1]),
                ptr::read(&self.underlying_array[2]),
            );
            mem::forget(self);
            result
        }
    }
}

fn main() {}

every type is movable

Yes, I believe this to be correct. You can make certain values immovable though (search for "one specific case".

there's no way to customize move semantics

Yes, I believe this to be correct.

Is the Rust compiler smart enough to elide the actual bit copying

In general, I would trust the compiler to do so, yes. However, optimization is a tricky thing. Ultimately, only looking at assembly and profiling in your real case will tell you the truth.

is there a faster way to do this?

Not that I'm aware of.


I'd normally write such code as:

extern crate arrayvec;
extern crate itertools;

use arrayvec::ArrayVec;
use itertools::Itertools;

struct Vec3<T> {
    underlying_array: [T; 3],
}

impl<T> Vec3<T> {
    fn into_tuple(self) -> (T, T, T) {
        ArrayVec::from(self.underlying_array).into_iter().next_tuple().unwrap()
    }
}

Investigating the assembly for both implementations, the first one takes 25 x86_64 instructions and the second takes 69. Again, I'd rely on profiling to know which was actually faster, since more instructions doesn't necessarily mean slower.

See also:

Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
  • 1
    I'd also note that the documentation for `ptr::read` states that it "semantically moves the value out of `src`..." which helps with my concern that a bitwise copy might not always be equivalent to a move. – jbatez Mar 18 '18 at 16:26
  • In this case it's quite clear that panics won't happen in the critical parts of the code. In cases where I'm less certain, I prefer using `std::mem::ManuallyDrop` over `mem::forget`, since it means that a panic will only leak memory and not cause undefined behaviour. – CodesInChaos Mar 19 '18 at 11:15
  • @Shepmaster I'm not sure if I understood correctly. In you first code snippet is the mem::forget(self) always necessary in this situation? – Ph. Weier May 10 '20 at 11:23
  • @Ph.Weier make sure you see my update on the better way to write this now. Yes, the `mem::forget` is necessary, otherwise you can end up with two owners of the data inside the array. If `T` is a `String`, that would lead to an attempt to free the string multiple times, causing memory unsafety. – Shepmaster May 11 '20 at 15:05
  • Thanks for the update! Oh that looks very nice with the new version. What if we need to move an arbitrarly sized array, is then the `ptr::read` way still the way to go? I heard of `std::array::IntoIter::new(array)` to get a consuming iterator but this appears to still be a non stable feature. – Ph. Weier May 12 '20 at 00:52
  • @Ph.Weier you [can't have an "arbitrarily sized array"](https://stackoverflow.com/q/28136739/155423), so I don't know what you mean. – Shepmaster May 12 '20 at 01:43
  • @Shepmaster Yeah that was not a very precise question indead. I'm refering to the GenericArray from the crate [Inline Link](https://docs.rs/generic-array/0.14.1/generic_array/). In which case I cannot work use something like `let [a,b,c,d] = generic_array` as the size will only be known at compilation. To circumvent this I'm using the `std::ptr::read(&generic_array[i])` approach to get ownership. Note that I always go through the whole array to construct a new one. Maybe this has not it's place in comments and I should just open a new thread for my specific issue. – Ph. Weier May 12 '20 at 03:40
  • 1
    @Ph.Weier that crate appears to have a [by-value iterator](https://docs.rs/generic-array/0.14.1/generic_array/iter/struct.GenericArrayIter.html#impl-Iterator). Applied to an [example](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=a37934f9e7759fbd5574908c053857ce). – Shepmaster May 12 '20 at 13:09
  • Thank you very much that was it! I somehow tried that at some point but had a problem using a Deref trait implementation on the newtype on the GenericArray. After accessing the inner data directly without using any deref it works as expected :) – Ph. Weier May 12 '20 at 21:59