
I have a file containing raw binary data that I would like to load into an array of 4-byte u32 words.

I can include the contents of the file at compile time:

let bytes = include_bytes!("audio.raw");

Then I can convert it to an array of u32 words using transmute. The new array should obviously have 1/4 the length of the original byte array.

let audio = unsafe { std::mem::transmute::<[u8; 63504], [u32; 15876]>(*bytes) };
// works

As you can see above, I had to hardcode the lengths of the input and output arrays. However, when I try to avoid hardcoding these numbers, it doesn't work:

let audio = unsafe { std::mem::transmute::<[u8; bytes.len()], [u32; bytes.len()/4]>(*bytes) };
// error: non-constant value

It seems that .len() is evaluated at runtime, and since Rust arrays cannot have a length that is only known at runtime, this yields an error. In theory, however, there should be a way to calculate the necessary length at compile time, as the length of the bytes array is fixed. So my question is: is there a macro that returns the length of a static array?

(I'm perfectly aware that dynamic allocation is possible with vectors; my question is explicitly about fixed-size arrays.)

Sample code (include_bytes replaced with hard-coded array):

fn main() {
    // let bytes = include_bytes!("audio.raw");
    let bytes = [1, 0, 0, 0, 2, 0, 0, 0];

    // works:
    let audio = unsafe { std::mem::transmute::<[u8; 8], [u32; 2]>(bytes) };

    // error:
    let audio = unsafe { std::mem::transmute::<[u8; bytes.len()], [u32; bytes.len() / 4]>(bytes) };

    println!("{}", audio[1]);
}
Stargateur
balping
  • Is there any reason you need `audio` to be a fixed-size array rather than a `&[u32]` slice? The latter could be easily built from your byte array without any copying or dynamic allocation. – Sven Marnach Jul 27 '19 at 22:42
  • @SvenMarnach Slices should be fine, but how to do it? – balping Jul 27 '19 at 23:05
  • let slice = &bytes[..]; – Caio Jul 27 '19 at 23:10
  • How does `let slice = &bytes[..];` convert from `[u8]` to `[u32]`? – balping Jul 27 '19 at 23:42
  • By explicitly declaring them in the `transmute` parameters: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=21f624c30d8a46ed289a569b0a21046a – Caio Jul 27 '19 at 23:48
  • This sets the first two elements correctly, but doesn't reduce the size; the rest of the elements are random values from memory that shouldn't be accessed. Try `println!("{:?}", audio);` to see the whole array. – balping Jul 27 '19 at 23:56
  • I think transmuting `u8` to `u32` in this context is UB anyway. – Stargateur Jul 28 '19 at 00:43
  • Possible duplicate of [Temporarily transmute \[u8\] to \[u16\]](https://stackoverflow.com/questions/33968870/temporarily-transmute-u8-to-u16) – Stargateur Jul 28 '19 at 00:46
  • @balping I was wrong in my first comment – a byte array can't be converted to an `&[u32]` slice, since you don't have any guarantees about the alignment. See my answer for more details. – Sven Marnach Jul 28 '19 at 18:49

2 Answers

2

An array coerces to a slice ([T]), and calling len() on a local let binding is not a constant expression, so it cannot be used in an array type.

To get a constant value that represents the length of an array, a helper trait may fit your needs.

Stable:

trait ArrayLen {
    const SIZE: usize;
}

macro_rules! impl_array(
    ($($size:expr),+) => {
        $(
            impl<'a, T> ArrayLen for &'a [T; $size] {
                const SIZE: usize = $size;
            }
        )+
    }
);

impl_array!(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 20, 24, 32, 36,
            0x40, 0x80, 0x100, 0x200, 0x400, 0x800, 0x1000, 0x2000, 0x4000, 0x8000,
            0x10000, 0x20000, 0x40000, 0x80000, 0x100000);

fn print<T>(_: T)
where
    T: ArrayLen
{
    println!("{}", T::SIZE);
}

fn main() {
    let bytes = include_bytes!("four_bytes_file.something");
    // The length of `bytes` must match one of the implementations
    print(bytes);
}

Nightly:

#![feature(const_generics)]

trait ArrayLen {
    const SIZE: usize;
}

impl<'a, T, const N: usize> ArrayLen for &'a [T; N] {
    const SIZE: usize = N;
}

fn print<T>(_: T)
where
    T: ArrayLen
{
    println!("{}", T::SIZE);
}

fn main() {
    let bytes = include_bytes!("any_file.something");
    print(bytes);
}

Nevertheless, [T; CONST_PARAMETER / 4] is not possible at the moment, but things might change with https://github.com/rust-lang/rust/issues/60471.
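For later readers, here is a minimal sketch of one way to avoid hardcoding the lengths without any generic parameter. It assumes a reasonably recent stable toolchain on which loops in const fn and u32::from_le_bytes in const context are available. The key point is that the bytes are bound to a const rather than a let, so len() can be evaluated at compile time. The names BYTES, WORDS, to_words and AUDIO are illustrative, the byte literal stands in for include_bytes!("audio.raw"), and little-endian input is an assumption.

```rust
// Stand-in for `include_bytes!("audio.raw")`. It is a `const`, not a `let`,
// so its length can be used in array types.
pub const BYTES: &[u8] = &[1, 0, 0, 0, 2, 0, 0, 0];

// Number of u32 words, computed at compile time.
pub const WORDS: usize = BYTES.len() / 4;

// Copies the bytes into u32 words, assuming the input is little-endian.
// Trailing bytes beyond a multiple of four are ignored.
pub const fn to_words(bytes: &[u8]) -> [u32; WORDS] {
    let mut out = [0u32; WORDS];
    let mut i = 0;
    while i < WORDS {
        out[i] = u32::from_le_bytes([
            bytes[4 * i],
            bytes[4 * i + 1],
            bytes[4 * i + 2],
            bytes[4 * i + 3],
        ]);
        i += 1;
    }
    out
}

// The converted data, built entirely at compile time.
pub const AUDIO: [u32; WORDS] = to_words(BYTES);
```

This copies the data instead of transmuting it, so it does not depend on the alignment of the byte array or on the endianness of the target.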

Caio
  • Thanks. I could get your stable example running, I haven't tried the nightly one. The stable example is not very viable though, because it needs a long list of integers that can be divided by 4. It certainly gets problematic with larger files like 63kB. I'll have to stick with hardcoded values for now, it seems. – balping Jul 27 '19 at 23:39
2

In general, it is not possible to transform an array of u8 to an array of u32 without copying.

This transformation would only be valid under the following conditions:

  1. Alignment. The array needs to be aligned to hold u32, which usually means it needs to start at an address that is a multiple of four. For a byte array created using the include_bytes!() macro, there is no guarantee of suitable alignment. This can be illustrated using this code (playground):

    let bytes: [u8; 5] = [1, 2, 3, 4, 5];
    dbg!(bytes.as_ptr().align_offset(std::mem::align_of::<u32>()));
    

    When I tried to run this on the playground, the result was 1, meaning the byte array isn't aligned to hold u32, but there is no guarantee about the result.

  2. Endianness. The bytes in the array must represent 32-bit integers in an endianness that matches the target architecture endianness. If the data you load from the file is in little endian, any code that interprets the data as 32-bit integers without any transformation will only work on little-endian platforms.

  3. Length. The length of the byte array needs to be a multiple of the size of u32, i.e. a multiple of 4.
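To make the endianness condition concrete, here is a small sketch using the standard from_le_bytes, from_be_bytes and from_ne_bytes constructors. The helper names word_le, word_be and word_ne are made up for illustration.

```rust
/// Interprets four bytes as a little-endian u32, on any target.
pub fn word_le(b: [u8; 4]) -> u32 {
    u32::from_le_bytes(b)
}

/// Interprets four bytes as a big-endian u32, on any target.
pub fn word_be(b: [u8; 4]) -> u32 {
    u32::from_be_bytes(b)
}

/// Interprets four bytes in the target's native byte order. This is what a
/// plain reinterpretation of the memory (transmute, pointer cast) gives you.
pub fn word_ne(b: [u8; 4]) -> u32 {
    u32::from_ne_bytes(b)
}
```

The same four bytes [1, 0, 0, 0] read as 1 in little endian and as 0x0100_0000 in big endian, so a reinterpreting cast only yields the intended values when the file's byte order matches the target's.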

If these conditions are met, you could theoretically convert the byte array to a slice of u32 using this code:

let audio = unsafe {
    let ptr = bytes.as_ptr() as *const u32;
    let factor = std::mem::size_of::<u32>() / std::mem::size_of::<u8>();
    let len = bytes.len() / factor;
    if ptr.align_offset(std::mem::align_of::<u32>()) == 0
        && bytes.len() % factor == 0
        && cfg!(target_endian = "little")
    {
        std::slice::from_raw_parts(ptr, len)
    } else {
        // Perform proper conversion; a placeholder is used here so that both branches type-check.
        unimplemented!()
    }
};

However, this seems hardly worth the trouble – you need to implement the real conversion code anyway.
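As a rough sketch of what such conversion code could look like, assuming the file stores little-endian words and that copying into a Vec is acceptable (the function name bytes_to_words_le is hypothetical):

```rust
/// Copies a byte slice into u32 words, assuming little-endian input.
/// Returns None if the length is not a multiple of four.
pub fn bytes_to_words_le(bytes: &[u8]) -> Option<Vec<u32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            // Each chunk has exactly four bytes, so the indexing cannot panic.
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

Because every word is rebuilt from its bytes, this works regardless of the alignment of the input and of the target's endianness.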

My recommendation is to use an actual audio file format, and load it using a library – this will save you all sorts of trouble.

Sven Marnach
  • Why code this yourself when there is already https://doc.rust-lang.org/src/core/slice/mod.rs.html#2336-2357 ? – Stargateur Jul 29 '19 at 02:29
  • @Stargateur That function is not useful here. The data is supposed to be converted starting from the first byte. You can't just skip a few bytes if the alignment is wrong -- this would render the audio data completely meaningless. – Sven Marnach Jul 29 '19 at 21:51
  • And? Just throw an error if the conversion is not complete. Your code probably invokes UB, whereas this function is safer. – Stargateur Jul 29 '19 at 22:03
  • @Stargateur I don't think the code in this answer is undefined behaviour. The main reason I included it is as an illustration why doing this is a bad idea. While you are right that I *could* rewrite it using `align_to()`, I believe it would be less illustrative that way, and it wouldn't even become conceptually simpler, since you would still have to manually check all three conditions I mentioned. – Sven Marnach Jul 29 '19 at 22:25
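For reference, the align_to() variant discussed in these comments might look like the sketch below. The name view_as_u32 is hypothetical. The function only accepts the input when the whole slice can be viewed as u32 words, and the words come out in the target's native byte order, so the endianness condition still has to be checked separately.

```rust
/// Views a byte slice as u32 words in native byte order without copying.
/// Returns None unless the entire slice ends up in the aligned middle part.
pub fn view_as_u32(bytes: &[u8]) -> Option<&[u32]> {
    // SAFETY: every bit pattern is a valid u32, and `align_to` only puts
    // correctly aligned, in-bounds elements into the middle slice.
    let (prefix, words, suffix) = unsafe { bytes.align_to::<u32>() };
    if prefix.is_empty() && suffix.is_empty() {
        Some(words)
    } else {
        // Misaligned start, leftover bytes, or `align_to` chose not to split.
        None
    }
}
```

Note that the documentation of align_to() allows it to return a shorter middle slice than possible, so None here does not prove that the data is misaligned; a copying fallback is still needed.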