I am trying to create a generic serialization scheme for structs. Following an answer given on a similar thread, I have the following setup:

#[repr(packed)]
struct MyStruct {
    bytes: [u8; 4]
}

unsafe fn any_as_u8_slice<T: Sized>(p: &T) -> &[u8] {
    ::std::slice::from_raw_parts(
        (p as *const T) as *const u8,
        ::std::mem::size_of::<T>(),
    )
}

fn main() {
    let s = MyStruct { bytes: [0u8, 1u8, 2u8, 3u8].to_owned() };
    
    let bytes: &[u8] = unsafe { any_as_u8_slice(&s) };
    
    println!("{:?}", bytes);
}

Output:

[0, 1, 2, 3]

This works great. However, it does not account for struct members that are dynamically sized, such as Vec<u8>, whose size needs to be determined at runtime. Ideally, I would like to encode each element in the Vec<u8> as bytes and add a prefix indicating how many bytes to read.

At the moment I have this:

#[repr(packed)]
struct MyStruct {
    bytes: Vec<u8>
}

unsafe fn any_as_u8_slice<T: Sized>(p: &T) -> &[u8] {
    ::std::slice::from_raw_parts(
        (p as *const T) as *const u8,
        ::std::mem::size_of::<T>(),
    )
}

fn main() {
    let s = MyStruct { bytes: [0u8, 1u8, 2u8, 3u8].to_vec() };
    
    let bytes: &[u8] = unsafe { any_as_u8_slice(&s) };
    
    println!("{:?}", bytes);
}

Output:

[208, 25, 156, 239, 136, 85, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0]

I assume the output above is referencing a pointer of sorts, but I am not sure.

Currently, the bincode crate does this together with the serde crate, but it serializes the length of the vector as a usize (8 bytes in the output below). I would rather specify the width myself and encode the length as a u8, as explained in this thread. Unfortunately, the best solution offered there is to rewrite the bincode library, which made me look for an alternative.

EDIT

Implementation using serde and bincode:

use serde::{Serialize};

#[derive(Clone, Debug, Serialize)]
struct MyStruct {
    bytes: Vec<u8>
}

fn main() {
    let s = MyStruct { bytes: [0u8, 1u8, 2u8, 3u8].to_vec() };
    
    let bytes = bincode::serialize(&s).unwrap();
    
    println!("{:?}", bytes);
}

Output:

[4, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3]

Wanted output:

[4, 0, 1, 2, 3]
Kevin
  • @kmdreko That would be great, but I could not figure out how to change the options. I edited the post; can you show where I can change the configuration? – Kevin Sep 03 '21 at 17:39

2 Answers

The output you see for the Vec is exactly as expected. A Vec consists of three fields: a pointer, the capacity, and the length. The standard library documents Vec as such a triplet, although it leaves the order of the fields unspecified. In your output, the first 8 bytes are the pointer, and the capacity and length are both the number 4 in little endian.
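
To make those three parts visible without any unsafe code, here is a minimal sketch. The helper names vec_parts and vec_handle_size are made up for illustration, and the size check reflects what current standard library implementations do on common targets rather than a documented guarantee.

```rust
use std::mem::size_of;

/// Returns the three logical parts of a `Vec<u8>` handle:
/// the heap pointer (as an address), the length, and the capacity.
/// (Hypothetical helper, not part of the question's code.)
fn vec_parts(v: &Vec<u8>) -> (usize, usize, usize) {
    (v.as_ptr() as usize, v.len(), v.capacity())
}

/// Size of the `Vec<u8>` handle itself. It does not depend on how many
/// elements the vector holds, because the elements live on the heap.
fn vec_handle_size() -> usize {
    size_of::<Vec<u8>>()
}
```

On a 64-bit target the handle size comes out as 24 bytes, which matches the 24 bytes printed in the question: one pointer plus two integers, and none of the actual elements.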

It is impossible to turn a struct containing a Vec into a &[u8] the way you want. A &[u8] slice is one contiguous piece of memory, whereas a Vec is fundamentally an indirection: its elements are not stored contiguously with the rest of the struct. At a minimum, you need to collect the bytes into a Vec<u8> or similar, since the data has to be copied from multiple places.
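
As a rough sketch of what "collecting the bytes" could look like for the format wanted in the question (a one-byte length prefix followed by the payload), assuming a hand-written encoder is acceptable. The encode and decode functions are hypothetical names, not part of any crate.

```rust
/// Mirrors the question's `MyStruct` (without `repr(packed)`, which is
/// not needed when the bytes are copied out by hand).
struct MyStruct {
    bytes: Vec<u8>,
}

/// Encodes `s` as a one-byte length prefix followed by the payload.
/// Returns `None` if the payload is too long for a `u8` prefix.
fn encode(s: &MyStruct) -> Option<Vec<u8>> {
    if s.bytes.len() > u8::MAX as usize {
        return None;
    }
    let mut out = Vec::with_capacity(1 + s.bytes.len());
    out.push(s.bytes.len() as u8); // length prefix, checked above
    out.extend_from_slice(&s.bytes); // copy the heap data behind the Vec
    Some(out)
}

/// Decodes the format produced by `encode`.
/// Returns `None` if the input is empty or shorter than the prefix claims.
fn decode(input: &[u8]) -> Option<MyStruct> {
    let (&len, rest) = input.split_first()?;
    let payload = rest.get(..len as usize)?;
    Some(MyStruct { bytes: payload.to_vec() })
}
```

Encoding a MyStruct holding [0, 1, 2, 3] this way gives [4, 0, 1, 2, 3]. The trade-off is that the prefix caps the payload at 255 bytes, so the encoder has to reject longer vectors instead of silently truncating the length.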

jonasbb

If your only problem with bincode is the usize length prefix, you can configure it to use a variable-length prefix by using the with_varint_encoding option.

use bincode::{DefaultOptions, Options};
use serde::Serialize;

#[derive(Clone, Debug, Serialize)]
struct MyStruct {
    bytes: Vec<u8>,
}

fn main() {
    let s = MyStruct {
        bytes: [0u8, 1u8, 2u8, 3u8].to_vec(),
    };

    // serialize returns a Result, so unwrap it to print the raw bytes.
    let bytes = DefaultOptions::new()
        .with_varint_encoding()
        .serialize(&s)
        .unwrap();

    println!("{:?}", bytes);
}

Output:

[4, 0, 1, 2, 3]
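
To illustrate why the length shrinks to a single byte, here is a std-only sketch of the varint scheme as I understand it from the bincode 1.x documentation. It is not bincode's actual code. The function name encode_varint_u64 is made up, little-endian byte order is assumed, and the 128-bit case is left out.

```rust
/// Std-only sketch of a bincode-1.x-style varint for unsigned integers.
/// Assumption: little-endian byte order for the multi-byte cases.
fn encode_varint_u64(value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    if value < 251 {
        // Small values are stored directly in a single byte.
        out.push(value as u8);
    } else if value <= u16::MAX as u64 {
        // Marker byte 251, then the value as a u16.
        out.push(251);
        out.extend_from_slice(&(value as u16).to_le_bytes());
    } else if value <= u32::MAX as u64 {
        // Marker byte 252, then the value as a u32.
        out.push(252);
        out.extend_from_slice(&(value as u32).to_le_bytes());
    } else {
        // Marker byte 253, then the full u64.
        out.push(253);
        out.extend_from_slice(&value.to_le_bytes());
    }
    out
}
```

Under this scheme a length of 4 becomes the single byte 4, while a length of 251 or more costs a marker byte plus the wider integer. Because the option applies to the whole serializer, every multi-byte integer in the struct is encoded this way, not just the length prefix.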
kmdreko
  • Would it be possible to set the with_varint_encoding option only for certain members of the struct? Changing this configuration seems to change the serialization of all unsigned integers except u8. – Kevin Sep 05 '21 at 15:57
  • @Kevin I do not know how that could be done, sorry. – kmdreko Sep 05 '21 at 17:17
  • @Kevin, did you figure out a way to change the int encoding for certain members? – keplerian Aug 02 '22 at 03:03
  • @Keplerian No, I think I ended up creating two different structs with two different options for each case. – Kevin Aug 02 '22 at 11:28