How are [u8; N] and [u8] serialized?
To cut straight to the point, here is how serde 1.0.151 implements Serialize for each type. Neither implementation calls serialize_bytes, so a byte array or byte slice is treated like any other tuple or sequence.
// [T; N] is serialized as a tuple. However, this is only implemented for N from 0 to 32 inclusive.
let mut seq = try!(serializer.serialize_tuple(N));
for e in self {
    try!(seq.serialize_element(e));
}
seq.end()
// [T] is serialized as a sequence.
serializer.collect_seq(self)
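The methods serialize_tuple and collect_seq belong to the Serializer trait, so what they actually write depends on the specific serializer you are using. (collect_seq has a default implementation built on serialize_seq, which a serializer may override.)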
The easy way
One common problem is that serde only implements Serialize/Deserialize for arrays up to length 32. The easiest approach is to use a crate like serde_with, which adds extra serialize/deserialize implementations you can attach to your structs. Here is an example taken from their documentation:
use serde::{Deserialize, Serialize};
use serde_with::serde_as;
#[serde_as]
#[derive(Deserialize, Serialize)]
struct Arrays<const N: usize, const M: usize> {
    #[serde_as(as = "[_; N]")]
    constgeneric: [bool; N],
    #[serde_as(as = "Box<[[_; 64]; N]>")]
    nested: Box<[[u8; 64]; N]>,
    #[serde_as(as = "Option<[_; M]>")]
    optional: Option<[u8; M]>,
}
How can we implement it ourselves? Rust Playground
Serialize
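Performing serialization is actually quite easy. Serde does not have a concept of arrays, so we need to choose between serialize_tuple and serialize_seq. The main difference is that serialize_seq is meant for sequences whose length may not be known up front, while serialize_tuple is meant for a length that is fixed and known in advance. An array always has a known length, so we can choose serialize_tuple.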
use serde::ser::{Serialize, SerializeTuple, Serializer};
pub fn serialize<S, T, const N: usize>(this: &[T; N], serializer: S) -> Result<S::Ok, S::Error>
where
    S: Serializer,
    T: Serialize,
{
    // The length is known at compile time, so announce it up front.
    let mut seq = serializer.serialize_tuple(N)?;
    for element in this {
        seq.serialize_element(element)?;
    }
    seq.end()
}
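To make the tuple-versus-sequence distinction concrete, here is a minimal sketch of a toy binary format. It does not use serde at all: encode_as_tuple, encode_as_seq, and decode_seq are hypothetical names made up for this illustration, and the 8-byte little-endian length prefix is an assumption, not the layout of any particular serializer. The only point is that a format may skip the length when both sides already know it.

```rust
use std::convert::TryInto;

/// Toy stand-in for `serialize_tuple`: the reader is assumed to know `N`
/// already, so only the elements are written.
pub fn encode_as_tuple<const N: usize>(data: &[u8; N]) -> Vec<u8> {
    data.to_vec()
}

/// Toy stand-in for `serialize_seq`: the reader does not know the length,
/// so it is written first as a little-endian u64 (an assumed layout).
pub fn encode_as_seq(data: &[u8]) -> Vec<u8> {
    let mut out = (data.len() as u64).to_le_bytes().to_vec();
    out.extend_from_slice(data);
    out
}

/// Reads back what `encode_as_seq` wrote. Returns `None` if the input is
/// too short to hold a prefix or the prefix does not match the payload.
pub fn decode_seq(bytes: &[u8]) -> Option<Vec<u8>> {
    if bytes.len() < 8 {
        return None;
    }
    let (prefix, rest) = bytes.split_at(8);
    let prefix: [u8; 8] = prefix.try_into().ok()?;
    let len = u64::from_le_bytes(prefix);
    if rest.len() as u64 != len {
        return None;
    }
    Some(rest.to_vec())
}
```

In this toy format, encoding [1, 2, 3] as a tuple takes 3 bytes and encoding it as a sequence takes 11, and decode_seq rejects the tuple bytes because there is no length prefix to read. Self-describing text formats such as JSON typically write both cases as the same array, so the difference mostly matters for compact binary formats, and it is the reason the serialize and deserialize sides should agree on tuple or sequence.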
Deserialize
Deserialization, on the other hand, is a bit more complicated. We need to define a visitor that specifies how each element should be visited. I wrote out a single example of how it could be done in the general case of an array, but it is not optimal, since it first deserializes onto the stack. I also had to use unsafe code to initialize the array one element at a time; that unsafe code can easily be removed if T: Default or if a growable data structure like a Vec<T> is used instead. Generally, this is intended more as a guide for implementing deserialize on a sequence.
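use serde::de::{Deserialize, Deserializer, Error, SeqAccess, Visitor};
use std::{fmt, marker::PhantomData, mem::MaybeUninit, ptr};
pub fn deserialize<'de, D, T, const N: usize>(deserializer: D) -> Result<[T; N], D::Error>
where
    D: Deserializer<'de>,
    T: 'de + Deserialize<'de>,
{
    // Pair serialize_tuple with deserialize_tuple: formats that are not
    // self-describing may omit the length for tuples but not for sequences.
    deserializer.deserialize_tuple(N, ArrayVisitor { _phantom: PhantomData })
}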
struct ArrayVisitor<'de, T, const N: usize> {
    _phantom: PhantomData<&'de [T; N]>,
}
impl<'de, T, const N: usize> Visitor<'de> for ArrayVisitor<'de, T, N>
where
    T: Deserialize<'de>,
{
    type Value = [T; N];
    fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "array of length {}", N)
    }
    fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
    where
        A: SeqAccess<'de>,
    {
        let mut array: MaybeUninit<[T; N]> = MaybeUninit::uninit();
        for index in 0..N {
            // Get next item as Result<Option<T>, A::Error>. Since we know
            // exactly how many elements we should receive, we can flatten
            // this to a Result<T, A::Error>. On a short input, report how
            // many elements were actually found (index), not N.
            let next = seq.next_element::<T>()
                .and_then(|x| x.ok_or_else(|| Error::invalid_length(index, &self)));
            match next {
                Ok(x) => unsafe {
                    // Safety: We write into the array without reading any
                    // uninitialized memory and writes only occur within the
                    // array bounds at multiples of the array stride.
                    let array_base_ptr = array.as_mut_ptr() as *mut T;
                    ptr::write(array_base_ptr.add(index), x);
                },
                Err(err) => {
                    // Safety: We need to manually drop the parts we
                    // initialized before we can return.
                    unsafe {
                        let array_base_ptr = array.as_mut_ptr() as *mut T;
                        for offset in 0..index {
                            ptr::drop_in_place(array_base_ptr.add(offset));
                        }
                    }
                    return Err(err);
                },
            }
        }
        // Safety: We have completely initialized every element
        unsafe { Ok(array.assume_init()) }
    }
}
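As a rough sketch of the Vec<T>-based alternative mentioned earlier, the element-by-element loop can be written without unsafe by collecting into a vector and converting it into an array at the end. To keep it runnable without serde, the closure parameter next stands in for seq.next_element(), and fill_array and FillError are hypothetical names invented for this example. The trade-off is one heap allocation per array.

```rust
use std::convert::TryInto;

/// Hypothetical error type for this sketch: either the element source
/// failed, or it ran out of elements before the array was full.
#[derive(Debug, PartialEq)]
pub enum FillError<E> {
    Source(E),
    TooShort { expected: usize, found: usize },
}

/// Builds a `[T; N]` by pulling exactly `N` items from `next`, which plays
/// the role of `seq.next_element()` in a visitor. No unsafe code is needed
/// because partially filled state lives in a `Vec`, which drops its
/// contents automatically on an early return.
pub fn fill_array<T, E, F, const N: usize>(mut next: F) -> Result<[T; N], FillError<E>>
where
    F: FnMut() -> Result<Option<T>, E>,
{
    let mut items: Vec<T> = Vec::with_capacity(N);
    for found in 0..N {
        match next().map_err(FillError::Source)? {
            Some(item) => items.push(item),
            None => return Err(FillError::TooShort { expected: N, found }),
        }
    }
    // The vector holds exactly N items here, so the conversion cannot fail.
    let converted: Result<[T; N], Vec<T>> = items.try_into();
    match converted {
        Ok(array) => Ok(array),
        Err(_) => unreachable!("vector length was checked by the loop above"),
    }
}
```

Inside visit_seq the same shape would apply, with the two error cases mapped to A::Error (for example through Error::invalid_length); that mapping is left out here to keep the sketch free of serde.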
If anyone is curious how derive(Deserialize) works on structs, I would recommend looking at this Rust Playground, where I expanded the macros and then cleaned up the output to be more human-readable. Seeing how serialize/deserialize works can really help to demystify the process.