The most idiomatic way to efficiently serialize/deserialize a Copy struct into/out of [u8]

Question

Copy means that the struct could be copied just by copying bytes as is. As a result, it should be easily possible to re-interpret such a struct as [u8]. What's the most idiomatic way to do so, preferably without involving unsafe?

I want an optimized struct which can easily be sent between processes, over the wire, or to disk. I understand that there are a lot of details which need to be taken care of, like alignment, and I am looking for a solution for such a high-performance use case, i.e. serialization that is as close to zero-copy as possible.

Does this answer your question? [How to convert 'struct' to '&\[u8\]'?](https://stackoverflow.com/questions/28127165/how-to-convert-struct-to-u8) — Michael Zajac, Jul 05 '22 at 22:38
@MichaelZajac It's pretty similar, but not exactly what I want. I will expand the description to describe the use case in more detail. — Konstantin Solomatov, Jul 05 '22 at 22:39

score 3 · Accepted Answer · answered Jul 06 '22 at 01:01


> Copy means that the struct could be copied just by copying bytes as is.

This is true.

> As a result, it should be easily possible to re-interpret such a struct as [u8].

This is not true, because Copy structs can still contain padding, which is not permitted to be read except incidentally while copying.
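To make the padding concrete, here is a minimal sketch using only the standard library. The `Padded` type and the `padding_bytes` helper are hypothetical names invented for illustration. The numbers in the comments assume a typical target where `u32` has 4-byte alignment.

```rust
use std::mem::{align_of, size_of};

// Hypothetical example type: it is `Copy`, yet it has padding
// between `tag` and `value`.
#[allow(dead_code)]
#[derive(Clone, Copy)]
#[repr(C)]
struct Padded {
    tag: u8,    // offset 0, followed by padding up to the next u32 boundary
    value: u32, // placed at an offset that satisfies u32's alignment
}

/// Number of bytes in `Padded` that belong to no field.
const fn padding_bytes() -> usize {
    size_of::<Padded>() - (size_of::<u8>() + size_of::<u32>())
}

fn main() {
    println!("size = {}, align = {}", size_of::<Padded>(), align_of::<Padded>());
    println!("padding bytes = {}", padding_bytes());
}
```

Those extra bytes are the ones that must not be read as `u8`, which is why a plain reinterpretation of `Padded` as `[u8]` is not sound.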

> What's the most idiomatic way to do so, preferably without involving unsafe?

You should start with bytemuck. It is a library which provides trivial conversion to and from [u8] when it is safe to do so. In particular, it checks that there is no padding in the struct, and that the representation is well-defined (not subject to the whims of the compiler).

You will still need to consider alignment, and for that purpose may need to introduce explicit “padding” fields (whose value is explicitly set rather than being left undefined) so that the alignment of other fields is satisfied.
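As a rough sketch of what an explicit padding field can look like, using only the standard library: the `Record` type, its field names, and the `new` constructor are assumptions made up for this example. The compile-time assertion checks that the fields account for every byte of the struct, which is the kind of "no padding" property a conversion library needs to rely on.

```rust
use std::mem::size_of;

// Hypothetical wire struct: the three padding bytes are a real,
// initialized field rather than compiler-inserted padding.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C)]
struct Record {
    tag: u8,
    _pad: [u8; 3], // explicit padding so `value` sits at a 4-byte offset
    value: u32,
}

// Compile-time check: the fields cover every byte, so no hidden padding remains.
const _: () = assert!(size_of::<Record>() == 1 + 3 + 4);

impl Record {
    /// Always sets the padding bytes to a defined value (zero).
    fn new(tag: u8, value: u32) -> Self {
        Record { tag, _pad: [0; 3], value }
    }
}

fn main() {
    let r = Record::new(7, 42);
    println!("tag = {}, value = {}, size = {}", r.tag, r.value, size_of::<Record>());
}
```

Routing construction through a single function like `new` is one way to make sure the padding field always holds a defined value.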

Your program's data will also not be compatible with machines of different endianness unless you take care. (However, it is possible to do so in ways which have zero run-time overhead when no conversion is needed, and most machines are little-endian today, so that cost will almost never actually apply.)
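One possible way to handle endianness with only the standard library is to fix the on-wire byte order explicitly. The sketch below copies fields rather than reinterpreting memory, so it illustrates the byte-order point, not a zero-copy cast. The `Sample` type and its 8-byte layout are assumptions for illustration. On a little-endian host, `to_le_bytes` and `from_le_bytes` need no byte swapping.

```rust
// Hypothetical record; the field names and the 8-byte layout are
// assumptions made up for this example.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Sample {
    id: u32,
    reading: u32,
}

impl Sample {
    /// Encode as 8 bytes, little-endian, regardless of host byte order.
    fn to_bytes(self) -> [u8; 8] {
        let mut out = [0u8; 8];
        out[..4].copy_from_slice(&self.id.to_le_bytes());
        out[4..].copy_from_slice(&self.reading.to_le_bytes());
        out
    }

    /// Decode from the same 8-byte little-endian layout.
    fn from_bytes(bytes: [u8; 8]) -> Self {
        let id = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
        let reading = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
        Sample { id, reading }
    }
}

fn main() {
    let s = Sample { id: 1, reading: 0x0102_0304 };
    let bytes = s.to_bytes();
    // Round trip gives back the original value on any host.
    assert_eq!(Sample::from_bytes(bytes), s);
    println!("{:?}", bytes);
}
```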

answered Jul 06 '22 at 01:01

Kevin Reid

And why is reading padding bad? Is it UB for some fundamental reason, or just for the convenience of compiler writers? – Konstantin Solomatov Jul 06 '22 at 01:13
@KonstantinSolomatov It is UB because treating padding as uninitialized bytes has performance advantages, and allowing reads of uninit bytes is... problematic (the read itself is not, since it can just produce an uninit byte, but any use of the result is; Rust prefers to make the read itself UB for several reasons, for example because that is easier to check). – Chayim Friedman Jul 06 '22 at 09:18
@ChayimFriedman BTW, do you have any link that mentions that reading them is UB? I found only this page: https://doc.rust-lang.org/beta/reference/behavior-considered-undefined.html It says the following, which contradicts the above: > In other words, the only cases in which reading uninitialized memory is permitted are inside unions and in "padding" (the gaps between the fields/elements of a type). – Konstantin Solomatov Jul 06 '22 at 13:38
@KonstantinSolomatov You're misreading this. The meaning is that it is valid to read uninit bytes _into_ padding bytes, not _from_ them. The words where it disallows that are "Uninitialized memory is also implicitly invalid for any type that has a restricted set of valid values", or more explicitly above "Producing an invalid value... An integer ... obtained from uninitialized memory.". – Chayim Friedman Jul 07 '22 at 08:00
