14

I'm trying to convert a Vec of u32s to a Vec of u8s, preferably in-place and without too much overhead.

My current solution relies on unsafe code to re-construct the Vec. Is there a better way to do this, and what are the risks associated with my solution?

use std::mem;
use std::vec::Vec;

fn main() {
    let mut vec32 = vec![1u32, 2];
    let vec8;
    unsafe {
        let length = vec32.len() * 4; // size of u8 = 4 * size of u32
        let capacity = vec32.capacity() * 4; // ^
        let mutptr = vec32.as_mut_ptr() as *mut u8;
        mem::forget(vec32); // don't run the destructor for vec32

        // construct new vec
        vec8 = Vec::from_raw_parts(mutptr, length, capacity);
    }

    println!("{:?}", vec8)
}

Rust Playground link

Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
Thom Wiggers
  • 6,938
  • 1
  • 39
  • 65
  • 3
    This is platform dependent, since you don't take the [byteorder](https://en.wikipedia.org/wiki/Endianness) into account. – Tim Diekmann Apr 06 '18 at 10:24
  • 6
    https://docs.rs/byteorder/1.2.2/byteorder/trait.ByteOrder.html#tymethod.from_slice_u32 can be used to account for byte order, without copying and is a no-op when the source byte order is the same as the host byte order. Otherwise, I believe your code is safe. Note that the reverse isn't necessarily safe! (Because of alignment.) – BurntSushi5 Apr 06 '18 at 10:44
  • 1
    Related: https://codereview.stackexchange.com/questions/187013/base64-string-↔-float-array/187077 – Boiethios Apr 06 '18 at 11:26

5 Answers5

20
  1. Whenever writing an unsafe block, I strongly encourage people to include a comment on the block explaining why you think the code is actually safe. That type of information is useful for the people who read the code in the future.

  2. Instead of adding comments about the "magic number" 4, just use mem::size_of::<u32>. I'd even go so far as to use size_of for u8 and perform the division for maximum clarity.

  3. You can return the newly-created Vec from the unsafe block.

  4. As mentioned in the comments, "dumping" a block of data like this makes the data format platform dependent; you will get different answers on little endian and big endian systems. This can lead to massive debugging headaches in the future. File formats either encode the platform endianness into the file (making the reader's job harder) or only write a specific endinanness to the file (making the writer's job harder).

  5. I'd probably move the whole unsafe block to a function and give it a name, just for organization purposes.

  6. You don't need to import Vec, it's in the prelude.

use std::mem;

fn main() {
    let mut vec32 = vec![1u32, 2];

    // I copy-pasted this code from StackOverflow without reading the answer 
    // surrounding it that told me to write a comment explaining why this code 
    // is actually safe for my own use case.
    let vec8 = unsafe {
        let ratio = mem::size_of::<u32>() / mem::size_of::<u8>();

        let length = vec32.len() * ratio;
        let capacity = vec32.capacity() * ratio;
        let ptr = vec32.as_mut_ptr() as *mut u8;

        // Don't run the destructor for vec32
        mem::forget(vec32);

        // Construct new Vec
        Vec::from_raw_parts(ptr, length, capacity)
    };

    println!("{:?}", vec8)
}

Playground

My biggest unknown worry about this code lies in the alignment of the memory associated with the Vec.

Rust's underlying allocator allocates and deallocates memory with a specific Layout. Layout contains such information as the size and alignment of the pointer.

I'd assume that this code needs the Layout to match between paired calls to alloc and dealloc. If that's the case, dropping the Vec<u8> constructed from a Vec<u32> might tell the allocator the wrong alignment since that information is based on the element type.

Without better knowledge, the "best" thing to do would be to leave the Vec<u32> as-is and simply get a &[u8] to it. The slice has no interaction with the allocator, avoiding this problem.

Even without interacting with the allocator, you need to be careful about alignment!

See also:

Shepmaster
  • 388,571
  • 95
  • 1,107
  • 1,366
  • 1
    And I'm not sure that `size_of::() / size_of::()` is much better than `4`. You have to know the sizes of those types in order to know there won't be any rounding errors. It would be better to multiply by `size_of::() ` and then divide by `size_of::()`, which you'd have to do if you wrote this generically. – Peter Hall Apr 06 '18 at 14:30
  • 1
    @PeterHall You also have to know the sizes of the types to know that `4` is the correct result, this just moves comments to code. If constant evaluation was further along, I'd move the `let` to a `const` and add a [`compile_error!`](https://doc.rust-lang.org/std/macro.compile_error.html) that asserts the value is `4`. I'd strongly advocate for *not* writing this generically, so I wasn't as concerned about that case ;-) – Shepmaster Apr 06 '18 at 14:33
  • I think you can write this generically, as long as you assume the output is &[u8], in which case you can dispense with the division, and just multiply by the `size_of` the source type. – Peter Hall Apr 06 '18 at 14:36
  • @PeterHall well, depends on how generic you mean and what you expect the behavior to be. For example, if you allow an input `Vec`, you allow a `Vec<&[(String, BTreeMap, PhantomData)]>` as well, which I wouldn't think would be a great idea. – Shepmaster Apr 06 '18 at 14:40
  • @PeterHall if you want a more principled generic, something like the [safe-transmute crate](https://crates.io/crates/safe-transmute) might be more applicable. – Shepmaster Apr 06 '18 at 14:44
  • What's wrong with that? You know.. if you wanted to turn a bunch of pointers to bytes... :P But you can obviously protect from that by using marker traits to opt-in useful types. It would look something like: https://play.rust-lang.org/?gist=736bf9be1c798d568f22471be7f677ff&version=stable – Peter Hall Apr 06 '18 at 14:45
  • 1
    I didn't know about that crate! I'm sure there didn't used to be so many crates... ;) – Peter Hall Apr 06 '18 at 14:46
  • 2
    The issue with `Layout` is not something I'd have suspected. In low-level code I'm used to worrying about casting to a pointer with *greater* alignment, but casting to a pointer with *lower* alignment has always been safe. I wonder if it should be specified that allocators should handle this case graciously. – Matthieu M. Apr 06 '18 at 16:51
6

If in-place convert is not so mandatory, something like this manages bytes order control and avoids the unsafe block:

extern crate byteorder;

use byteorder::{WriteBytesExt, BigEndian};

fn main() {
    let vec32: Vec<u32> = vec![0xaabbccdd, 2];
    let mut vec8: Vec<u8> = vec![];

    for elem in vec32 {
        vec8.write_u32::<BigEndian>(elem).unwrap();
    }

    println!("{:?}", vec8);
}
attdona
  • 17,196
  • 7
  • 49
  • 60
1

Simply transmuting the Vec or using from_raw_parts is undefined behavior because the deallocation API requires that the passed Layout is the same as the allocation was allocated with. To do this kind of conversion soundly you need to go through the Vec's associated allocator and call shrink to convert the layout to the new alignment before calling from_raw_parts. This depends on the allocator being able to perform in-place reallocation.

If you don't need the resulting vector to be resizable then reinterpreting a &mut [u32] borrow of the vec to &mut [u8] would be a simpler option.

the8472
  • 40,999
  • 5
  • 70
  • 122
0

This is how I solved the problem using a bitshifting copy.

It works on my x64 machine, but I am unsure if I have made unsafe assumptions about little/big endianism.

Runtime performance would be faster if this cast could be done memory inplace without the need for a copy, but I have not figured out how to do this yet.

/// Cast Vec<u32> to Vec<u8> without modifying underlying byte data
/// ```
/// # use fractals::services::vectors::vec_u32_to_u8;
/// assert_eq!( vec_u32_to_u8(&vec![ 0x12345678 ]), vec![ 0x12u8, 0x34u8, 0x56u8, 0x78u8 ]);
/// ```
#[allow(clippy::identity_op)]
pub fn vec_u32_to_u8(data: &Vec<u32>) -> Vec<u8> {
    // TODO: https://stackoverflow.com/questions/72631065/how-to-convert-a-u32-array-to-a-u8-array-in-place
    // TODO: https://stackoverflow.com/questions/29037033/how-to-slice-a-large-veci32-as-u8
    let capacity = 32/8 * data.len() as usize;  // 32/8 == 4
    let mut output = Vec::<u8>::with_capacity(capacity);
    for &value in data {
        output.push((value >> 24) as u8);  // r
        output.push((value >> 16) as u8);  // g
        output.push((value >>  8) as u8);  // b
        output.push((value >>  0) as u8);  // a
    }
    output
}
James McGuigan
  • 7,542
  • 4
  • 26
  • 29
-1

Try this

let vec32: Vec<u32> = vec![1u32, 2u32];
let mut vec8: Vec<u8> = vec![];
for v in &vec32{
    for b in v.to_be_bytes(){
        vec8.push(b);
    }
}
println!("{:?}", vec32);
println!("{:?}", vec8);

Playgroud

surinder singh
  • 1,463
  • 13
  • 12