
I'm reading a series of bytes from a socket, and I need to put each segment of n bytes into a field of a struct.

use std::mem;

#[derive(Debug)]
struct Things {
    x: u8,
    y: u16,
}

fn main() {
    let array = [22 as u8, 76 as u8, 34 as u8];
    let foobar: Things;
    unsafe {
        foobar = mem::transmute::<[u8; 3], Things>(array);
    }

    println!("{:?}", foobar);

}

I'm getting errors that say that foobar is 32 bits when array is 24 bits. Shouldn't foobar be 24 bits (8 + 16 = 24)?

Shepmaster
Fluffy

4 Answers


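The issue here is that the y field is 16-bit aligned (its address must be a multiple of 2 bytes). So, with the fields in declaration order, your memory layout is actually:

byte 0: x
byte 1: padding
byte 2: y (first byte)
byte 3: y (second byte)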

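Note that swapping the order of x and y doesn't help: a struct's size is always a multiple of its alignment (2 bytes here, because of the u16), so the compiler adds a trailing padding byte instead, and the struct is still 32 bits. Besides, the default (repr(Rust)) layout of a struct is unspecified: the compiler is free to reorder fields and insert padding, so code that depends on a particular layout, as a transmute does, can end up with undefined behavior.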
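You can check these numbers on your own machine with std::mem::size_of and std::mem::align_of. This is a minimal sketch assuming a common target where u16 is 2-byte aligned (x86-64 and AArch64, for example); the repr(Rust) results are what current compilers produce, not something the language promises. Swapped is a hypothetical struct added only for comparison.

```rust
use std::mem::{align_of, size_of};

// Field order as in the question.
#[allow(dead_code)]
struct Things {
    x: u8,
    y: u16,
}

// Same fields declared in the opposite order (hypothetical, for comparison).
#[allow(dead_code)]
struct Swapped {
    y: u16,
    x: u8,
}

// Returns (size, alignment) in bytes for both structs.
fn layouts() -> ((usize, usize), (usize, usize)) {
    (
        (size_of::<Things>(), align_of::<Things>()),
        (size_of::<Swapped>(), align_of::<Swapped>()),
    )
}

fn main() {
    let ((ts, ta), (ss, sa)) = layouts();
    // 3 bytes of data, rounded up to a multiple of the 2-byte alignment.
    println!("Things:  size {} align {}", ts, ta);
    println!("Swapped: size {} align {}", ss, sa);
}
```

Both structs report 4 bytes because the size is rounded up to the alignment, whichever field comes first.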

The reasons for alignment are explained in Purpose of memory alignment.

You can remove the padding by adding the attribute repr(packed) to your struct, but unaligned field accesses can be slower on some platforms, and you lose the ability to take references to fields (you have to copy them out instead):

#[repr(packed)]
struct Things {
    x: u8,
    y: u16,
}
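A minimal sketch of what packing changes, with two assumptions: it combines packed with repr(C) so that the field order is guaranteed as well, and it treats the input bytes as native byte order, which is what a transmute implies. from_bytes is a hypothetical helper name. The fields are copied out by value, because taking &p.y (which println! would do implicitly) is a reference to an unaligned field.

```rust
use std::mem;

// repr(C) fixes the field order; packed removes the padding byte.
#[repr(C, packed)]
struct PackedThings {
    x: u8,
    y: u16,
}

// Reinterpret three bytes as a PackedThings and return copies of its fields.
fn from_bytes(bytes: [u8; 3]) -> (u8, u16) {
    // The sizes now match (3 == 3), so transmute compiles, and every
    // bit pattern is a valid u8/u16.
    let p: PackedThings = unsafe { mem::transmute::<[u8; 3], PackedThings>(bytes) };
    // Copy the fields out by value instead of borrowing them.
    let (x, y) = (p.x, p.y);
    (x, y)
}

fn main() {
    assert_eq!(mem::size_of::<PackedThings>(), 3);
    let (x, y) = from_bytes([22, 76, 34]);
    // y is decoded in native byte order: 8780 on little-endian, 19490 on big-endian.
    println!("x = {}, y = {}", x, y);
}
```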

The best way would be to not use transmute at all, but to extract the values manually and trust the optimizer to make it fast. This version reads y as big-endian (network byte order):

let foobar = Things {
    x: array[0],
    y: ((array[1] as u16) << 8) | (array[2] as u16),
};
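Here is the same manual approach as a self-contained sketch. parse_things is a hypothetical helper name; it uses u16::from_be_bytes, which computes the same big-endian value as the shift-and-or above, and it returns None instead of panicking when fewer than three bytes are available.

```rust
#[derive(Debug, PartialEq)]
struct Things {
    x: u8,
    y: u16,
}

// Decode the first three bytes: x is byte 0, y is bytes 1..3 in big-endian
// (network) order. Returns None if the input is too short.
fn parse_things(bytes: &[u8]) -> Option<Things> {
    match bytes {
        [x, y_hi, y_lo, ..] => Some(Things {
            x: *x,
            y: u16::from_be_bytes([*y_hi, *y_lo]),
        }),
        _ => None,
    }
}

fn main() {
    let array = [22u8, 76, 34];
    println!("{:?}", parse_things(&array));
}
```

No unsafe code is involved, and the byte order is stated explicitly instead of depending on the machine.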

A crate like byteorder may simplify the process of reading different sizes and endianness from the bytes.
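Since the data comes from a socket, it can be convenient to decode straight from anything that implements std::io::Read (TcpStream does). This sketch sticks to the standard library: read_exact fills a fixed-size buffer and from_be_bytes chooses the byte order. With byteorder, the two field reads would instead go through its ReadBytesExt trait (read_u8() and read_u16::<BigEndian>()). read_things is a hypothetical name, and big-endian is an assumption about the wire format.

```rust
use std::io::{self, Cursor, Read};

#[derive(Debug, PartialEq)]
struct Things {
    x: u8,
    y: u16,
}

// Read one 3-byte record from any reader (e.g. a TcpStream).
// Fails with ErrorKind::UnexpectedEof if the stream ends mid-record.
fn read_things<R: Read>(reader: &mut R) -> io::Result<Things> {
    let mut buf = [0u8; 3];
    reader.read_exact(&mut buf)?;
    Ok(Things {
        x: buf[0],
        y: u16::from_be_bytes([buf[1], buf[2]]), // assumed big-endian on the wire
    })
}

fn main() -> io::Result<()> {
    // A Cursor over a Vec stands in for the socket here.
    let mut input = Cursor::new(vec![22u8, 76, 34, 1, 0, 5]);
    let first = read_things(&mut input)?;
    let second = read_things(&mut input)?;
    println!("{:?} {:?}", first, second);
    Ok(())
}
```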

oli_obk
  • Is there always 8 bits of padding between each item in a struct? – Fluffy Mar 17 '16 at 13:25
  • No, padding is just unused space. It exists here only because the other field is aligned. – oli_obk Mar 17 '16 at 13:27
  • @Fluffy Not necessarily. If `x` was also `u16`, no padding would be required. If `x` was still `u8` and you had a `x2` field that's also `u8` between `x` and `y`, you still wouldn't need padding. Do you see why? (Note that you can look up struct alignment regardless of language - the reasons are always the same.) – Theodoros Chatzigiannakis Mar 17 '16 at 13:27
  • @TheodorosChatzigiannakis I think so. It's faster if all the fields are the same size? – Fluffy Mar 17 '16 at 13:29
  • @Fluffy It's not faster if all fields are the same size. The logic is that `u8` is 8-bit aligned while `u16` is 16-bit aligned. This creates a necessary 8-bit space between the two. You can choose to use the space yourself (by extending the first field to 16 bits or by introducing an intermediate 8-bit field) or you can leave it unused (and it's called padding) but either way the compiler will put the 8-bit space there, whether you intend to name it as a field or not. (Unless you explicitly ask for a packed struct, as mentioned in the answer.) – Theodoros Chatzigiannakis Mar 17 '16 at 13:32
  • Look at this answer for a relevant schematic: http://stackoverflow.com/a/381368/1892179 – Theodoros Chatzigiannakis Mar 17 '16 at 13:34
  • @TheodorosChatzigiannakis Sorry if I'm not getting it, but would the way to circumnavigate the issue be making each successive field the same size or larger than the previous one? – Fluffy Mar 17 '16 at 13:34
  • @Fluffy Maybe rearranging the fields could give you a different size in some cases, but probably not in this particular case where you only have an 8-bit field and a 16-bit field (because even if you turn them around, I would expect the compiler to introduce padding at the end to make sure that *arrays* of your struct are aligned). I don't know if there's any algorithm you can do in your mind to pack structs manually without sacrificing speed. It's usually an intuitive process, but there are resources that could help you get a general idea. See http://www.catb.org/esr/structure-packing/ – Theodoros Chatzigiannakis Mar 17 '16 at 13:38
  • @Fluffy: no, as I said, Rust's memory layout is unspecified, Rust may at any time change the order and introduce arbitrary padding. It's unlikely it'll do something odd, but it's allowed to. If you use `repr(C)` or `repr(packed)`, you get a somewhat deterministic layout (on a single platform, e.g. Windows and Linux may differ), but following complex rules – oli_obk Mar 17 '16 at 13:39
  • @Fluffy even if you somehow work around alignment problems (which is unlikely), you will still have the problem of byte order, which is important if you need your data to be passed across a network. Reinterpreting a byte array as a struct is inherently non-portable. That's why so many serialization formats exist. – Vladimir Matveev Mar 17 '16 at 13:40
  • Thanks, you've been really useful. – Fluffy Mar 17 '16 at 13:40

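bincode and serde can do this quite simply. The code below uses the bincode 1.x API (the free functions serialize and deserialize, which encode integers as fixed-width little-endian; bincode 2 reorganized these functions), and serde needs its derive feature enabled for #[derive(Deserialize)] and #[derive(Serialize)].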

use bincode::deserialize;
use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct Things {
    x: u8,
    y: u16,
}

fn main() {
    let array = [22u8, 76, 34];
    let foobar: Things = deserialize(&array).unwrap();
    println!("{:?}", foobar);
}

This also works well for serializing a struct into bytes.

use bincode::serialize;
use serde::Serialize;

#[derive(Serialize, Debug)]
struct Things {
    x: u8,
    y: u16,
}

fn main() {
    let things = Things {
        x: 22,
        y: 8780,
    };
    let baz = serialize(&things).unwrap();
    println!("{:?}", baz);
}
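To see which bytes to expect without pulling in the crates, here is a hand-written sketch. It assumes the encoding used by bincode 1.x's free serialize/deserialize functions for this struct: fields in declaration order, fixed-width little-endian integers, no padding. to_bytes and from_bytes are hypothetical helpers, not part of either crate. Under that assumption, Things { x: 22, y: 8780 } corresponds to the question's array [22, 76, 34].

```rust
#[derive(Debug, PartialEq)]
struct Things {
    x: u8,
    y: u16,
}

impl Things {
    // Fields in declaration order, integers little-endian, no padding.
    fn to_bytes(&self) -> [u8; 3] {
        let [y0, y1] = self.y.to_le_bytes();
        [self.x, y0, y1]
    }

    fn from_bytes(bytes: [u8; 3]) -> Things {
        Things {
            x: bytes[0],
            y: u16::from_le_bytes([bytes[1], bytes[2]]),
        }
    }
}

fn main() {
    let things = Things { x: 22, y: 8780 };
    let bytes = things.to_bytes();
    println!("{:?}", bytes);
    // Round trip back to the same value.
    assert_eq!(Things::from_bytes(bytes), things);
}
```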

I was having issues using the byteorder crate when dealing with structs that also had char arrays, and I couldn't get past the compiler errors. I ended up casting a pointer like this:

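#[repr(C, packed)]
struct Things {
    x: u8,
    y: u16,
}

fn main() {
    let data: [u8; 3] = [0x22, 0x76, 0x34];

    unsafe {
        // A packed struct has alignment 1, so this cast from a byte pointer is aligned.
        let things_p: *const Things = data.as_ptr() as *const Things;
        let things: &Things = &*things_p;

        // Copy the fields out first: println! takes references to its arguments,
        // and references to fields of a packed struct are rejected.
        let (x, y) = (things.x, things.y);
        println!("{:x} {:x}", x, y);
    }
}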

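Note that with packed, passing things.y straight to println! takes a reference to a field of a packed struct. Older compilers accepted this with the following warning:

   = warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!

In current Rust it is a hard error (E0793), so copy the field into a local variable first (for example let y = things.y;).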

If you can, change Things to behave like a C struct:

#[repr(C)]
struct Things2 {
    x: u8,
    y: u16,    
}

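Then initialize data like this. Note the extra byte, which fills the padding slot that repr(C) inserts between x and y. Because a [u8; 4] is only guaranteed 1-byte alignment while Things2 needs 2, read it with std::ptr::read_unaligned rather than casting the pointer and dereferencing it.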

let data: [u8; 4] = [0x22, 0, 0x76, 0x34];
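Here is a minimal sketch of that read, assuming y should be interpreted in native byte order. decode is a hypothetical helper name, and the size check is an extra safety net rather than something the approach requires.

```rust
use std::mem;
use std::ptr;

#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct Things2 {
    x: u8,
    y: u16,
}

// Copy four bytes (x, padding, y) into a properly aligned Things2.
fn decode(data: [u8; 4]) -> Things2 {
    // [u8; 4] and Things2 have the same size; the array may sit at an odd
    // address, so read_unaligned is used instead of dereferencing the pointer.
    assert_eq!(mem::size_of::<Things2>(), data.len());
    unsafe { ptr::read_unaligned(data.as_ptr() as *const Things2) }
}

fn main() {
    let data: [u8; 4] = [0x22, 0, 0x76, 0x34];
    let things = decode(data);
    // y is in native byte order: 0x3476 on little-endian targets.
    println!("{:x} {:x}", things.x, things.y);
}
```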
user2233706
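use std::ptr;

fn main() {
    let bytes = vec![0u8, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0xff];

    // A u64 is 8 bytes, so take an 8-byte slice, and cast the pointer
    // instead of transmuting it.
    let data_ptr = bytes[0..8].as_ptr() as *const u64;

    // A Vec<u8> buffer is only guaranteed 1-byte alignment, so read unaligned.
    // The value is interpreted in native byte order.
    let data: u64 = unsafe { ptr::read_unaligned(data_ptr) };

    println!("{:#x}", data);
}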
dpc.pw
  • Note that this has the same issues as many other answers on SO: It doesn't take into account alignment, potential padding in the target type (not a problem for `u64`), or endianness. – Shepmaster Aug 14 '16 at 13:03
  • There's no need to use transmute on a pointer, just cast the pointer. This will prevent accidental transmuting between pointers and pointers to pointers, as the casts are somewhat typechecked – oli_obk Aug 15 '16 at 08:30
  • As others have said, this is not recommended, but it's exactly what the OP asked, and it's technically correct. Endianness and the rest of the details are up to whoever wants to go this route. – dpc.pw Aug 15 '16 at 18:37