Is there a better way to cast Vec<i8> to Vec<u8> in Rust other than these two?

  1. creating a copy by mapping and casting every entry
  2. using std::mem::transmute

Option (1) is slow, and option (2) is discouraged: according to the docs, "transmute should be the absolute last resort".
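
For reference, option (1) could look roughly like the sketch below. The name `copy_i8_to_u8` is hypothetical and used only for illustration. The `as` cast keeps each byte's bit pattern, so -1 becomes 255, but the whole buffer is allocated and copied again.

```rust
/// Hypothetical helper: builds a new Vec<u8> by casting every element.
/// This allocates and copies, which is the cost the question wants to avoid.
fn copy_i8_to_u8(v: &[i8]) -> Vec<u8> {
    v.iter().map(|&b| b as u8).collect()
}
```

For example, `copy_i8_to_u8(&[-1, 2, 3])` yields `[255, 2, 3]`.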

A bit of background maybe: I'm getting a Vec<i8> from the unsafe gl::GetShaderInfoLog() call and want to create a string from this vector of chars by using String::from_utf8().

Alexander
  • This seems relevant: https://github.com/nukep/rust-opengl-util/blob/fc30c6e386b0a4510564f242d995c845472207d3/shader.rs#L56 – sshashank124 Jan 12 '20 at 19:51
  • Is this code you'll call so often that it's worth messing around with unsafe mechanisms like transmute and from_raw_parts? Usually you don't (re)compile shaders all that often... – KillianDS Jan 13 '20 at 08:25

3 Answers

The other answers provide excellent solutions for the underlying problem of creating a string from Vec<i8>. To answer the question as posed, creating a Vec<u8> from data in a Vec<i8> can be done without copying or transmuting the vector. As pointed out by @trentcl, transmuting the vector directly constitutes undefined behavior because Vec is allowed to have a different layout for different element types.

The correct (though still requiring the use of unsafe) way to transfer a vector's data without copying it is:

  • obtain the *mut i8 pointer to the data in the vector, along with its length and capacity
  • leak the original vector to prevent it from freeing the data
  • use Vec::from_raw_parts to build a new vector, giving it the pointer cast to *mut u8 - this is the unsafe part, because we are vouching that the pointer points to valid and initialized data, that it is not in use by other objects, and so on.

This is not UB because the new Vec is given the pointer of the correct type from the start. Code:

fn vec_i8_into_u8(v: Vec<i8>) -> Vec<u8> {
    // ideally we'd use Vec::into_raw_parts, but it's unstable,
    // so we have to do it manually:

    // first, make sure v's destructor doesn't free the data
    // it thinks it owns when it goes out of scope
    let mut v = std::mem::ManuallyDrop::new(v);

    // then, pick apart the existing Vec
    let p = v.as_mut_ptr();
    let len = v.len();
    let cap = v.capacity();
    
    // finally, adopt the data into a new Vec
    unsafe { Vec::from_raw_parts(p as *mut u8, len, cap) }
}

fn main() {
    let v = vec![-1i8, 2, 3];
    assert!(vec_i8_into_u8(v) == vec![255u8, 2, 3]);
}
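
If only a borrowed view of the data is needed, the same pointer-cast idea can be applied to slices instead of owned vectors. This is a minimal sketch, and `as_u8_slice` is a hypothetical name that does not come from the standard library:

```rust
/// Hypothetical helper: reinterprets a borrowed i8 slice as u8 without copying.
fn as_u8_slice(s: &[i8]) -> &[u8] {
    // Sound because i8 and u8 have the same size and alignment, every bit
    // pattern is valid for both, and the returned slice borrows from `s`.
    unsafe { std::slice::from_raw_parts(s.as_ptr() as *const u8, s.len()) }
}
```

The original vector stays usable afterwards, since nothing is moved or leaked.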
user4815162342
  • Good answer, "slight" correction: pointer casts (+ `from_raw_parts`) are not just "slightly safer" than `transmute`, because (AIUI) `transmute` from `Vec<T>` to `Vec<U>` is always undefined behavior. Both use `unsafe`, but `from_raw_parts` is *correct*, while `transmute` is *incorrect*. – trent Jan 13 '20 at 14:50
  • @trentcl Thanks, I've updated the answer. (Here I meant to compare the cast to transmuting the _pointer_, an idea which I had in an early draft of the answer, and removed before posting it.) However, I'm curious exactly _why_ transmute from `Vec<T>` to `Vec<U>` would be UB, while creating a new `Vec` from the pointer obtained with a cast from `*mut i8` to `*mut u8` is kosher? Is it because the representation of `Vec<T>` might depend on the type `T` in the future, or is there something that makes it UB wrt current Rust's or LLVM's abstract machine models? – user4815162342 Jan 13 '20 at 17:50
  • Just because the representation of `Vec<T>` may depend on `T`. It's not likely, but it is allowed. I confess I find it hard to imagine a situation where this would be a useful thing to exploit. – trent Jan 13 '20 at 19:20
  • @trentcl Agreed, `Vec` is probably not the best example, but I can imagine a data structure storing information about a generic type, for example to prevent code explosion from monomorphization. Still, I wonder how far one could take the caution. For example, is it UB to pass `Vec::from_raw_parts` a `*mut u8` pointer whose data was originally allocated through a request for `*mut i8`? It would help if there were clearer documentation as to what constitutes UB in unsafe Rust. – user4815162342 Jan 13 '20 at 20:15
transmute on a Vec is always, 100% wrong, causing undefined behavior, because the layout of Vec is not specified. However, as the page you linked also mentions, you can use raw pointers and Vec::from_raw_parts to perform this correctly. user4815162342's answer shows how.

(std::mem::transmute is the only item in the Rust standard library whose documentation consists mostly of suggestions for how not to use it. Take that how you will.)

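However, in this case, from_raw_parts is also unnecessary. The best way to deal with C strings in Rust is with the wrappers in std::ffi, CStr and CString. There may be better ways to work this into your real code, but here's one way you could use CStr to borrow the log buffer as a &str. Since CStr::from_bytes_with_nul takes a &[u8], the buffer is a Vec<u8> and only its pointer is cast to *mut GLchar for the call:

use std::ffi::CStr;

const BUF_SIZE: usize = 1000;
let mut info_log: Vec<u8> = vec![0; BUF_SIZE];
// GL reports the number of characters written, excluding the nul terminator
let mut len: gl::types::GLsizei = 0;
unsafe {
    gl::GetShaderInfoLog(
        shader,
        BUF_SIZE as gl::types::GLsizei,
        &mut len,
        info_log.as_mut_ptr() as *mut gl::types::GLchar,
    );
}
// include one extra byte so the slice ends with the nul terminator
let log = CStr::from_bytes_with_nul(&info_log[..len as usize + 1])
    .expect("Slice must be nul terminated and contain no interior nul bytes")
    .to_str()
    .expect("Slice must be valid UTF-8 text");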

Notice there is no unsafe code except to call the FFI function. You could also use with_capacity + set_len (as in wasmup's answer) to skip initializing the Vec to 1000 zeros, and use from_bytes_with_nul_unchecked (which is itself an unsafe function) to skip checking the validity of the returned string.
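
To try the CStr approach without an OpenGL context, the FFI call can be replaced by a stand-in. In the sketch below, `fake_get_info_log` and `read_log` are hypothetical names, and the stand-in only imitates the general shape of a call like gl::GetShaderInfoLog (buffer size in, written length out, nul-terminated text in the buffer); it is not the real API.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical stand-in for an FFI function: writes a nul-terminated message
/// into `buf` and stores the number of characters written (excluding the nul)
/// in `*written`.
///
/// Safety: `buf` must be valid for `buf_size` writes and `written` for one write.
unsafe fn fake_get_info_log(buf_size: i32, written: *mut i32, buf: *mut c_char) {
    let msg = b"error: something went wrong";
    unsafe {
        if buf_size <= 0 {
            *written = 0;
            return;
        }
        // leave room for the nul terminator
        let n = msg.len().min(buf_size as usize - 1);
        for i in 0..n {
            *buf.add(i) = msg[i] as c_char;
        }
        *buf.add(n) = 0;
        *written = n as i32;
    }
}

/// Reads the message into a u8 buffer and converts it with only safe code
/// after the call itself.
fn read_log() -> String {
    const BUF_SIZE: usize = 64;
    let mut buf: Vec<u8> = vec![0; BUF_SIZE];
    let mut len: i32 = 0;
    unsafe {
        fake_get_info_log(BUF_SIZE as i32, &mut len, buf.as_mut_ptr() as *mut c_char);
    }
    // the slice includes exactly one trailing nul byte
    CStr::from_bytes_with_nul(&buf[..len as usize + 1])
        .expect("slice must be nul terminated with no interior nul bytes")
        .to_str()
        .expect("slice must be valid UTF-8")
        .to_owned()
}
```

Allocating the buffer as Vec<u8> and casting only the pointer means no Vec<i8> ever exists, so there is nothing to convert afterwards.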

trent
  1. See this:
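fn get_compilation_log(&self) -> String {
    // INFO_LOG_LENGTH is the log size in bytes, including the nul terminator
    let mut len = 0;
    unsafe { gl::GetShaderiv(self.id, gl::INFO_LOG_LENGTH, &mut len) };
    assert!(len > 0);

    let mut buf: Vec<u8> = Vec::with_capacity(len as usize);
    let buf_ptr = buf.as_mut_ptr() as *mut gl::types::GLchar;
    // number of characters actually written, excluding the nul terminator
    let mut written = 0;
    unsafe {
        gl::GetShaderInfoLog(self.id, len, &mut written, buf_ptr);
        // expose only the bytes GL wrote, keeping the trailing nul out of the String
        buf.set_len(written as usize);
    };

    match String::from_utf8(buf) {
        Ok(log) => log,
        Err(err) => panic!("Could not convert compilation log from buffer: {}", err),
    }
}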

  2. See std::ffi::CStr (from_ptr is unsafe: the pointer must be valid and nul-terminated):
let s = unsafe { CStr::from_ptr(strz_ptr) }.to_str().unwrap();
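
A self-contained sketch of the CStr::from_ptr route, assuming the pointer refers to a valid nul-terminated string. The names `c_str_to_string` and `demo` are hypothetical, and a CString stands in for memory that would normally be filled by C code:

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical helper: copies a nul-terminated C string into a String,
/// returning None if the bytes are not valid UTF-8.
///
/// Safety: `ptr` must be non-null and point to a valid nul-terminated string
/// that stays alive for the duration of the call.
unsafe fn c_str_to_string(ptr: *const c_char) -> Option<String> {
    let c_str = unsafe { CStr::from_ptr(ptr) };
    c_str.to_str().ok().map(str::to_owned)
}

/// Uses a CString as the source of a valid pointer.
fn demo() -> Option<String> {
    let owned = CString::new("shader log").expect("no interior nul bytes");
    // `owned` outlives the call, so the pointer stays valid
    unsafe { c_str_to_string(owned.as_ptr()) }
}
```

Returning an Option instead of calling unwrap leaves the decision about invalid UTF-8 to the caller.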

Ömer Erden
wasmup