I am trying to use serde together with bincode to deserialize an arbitrary Bitcoin network message. Given that the payload is handled ubiquitously as a byte array, how do I deserialize it when its length is unknown at compile time? By default, bincode handles Vec<u8> by assuming that its length is encoded as a u64 right before the elements of the vector. However, this assumption does not hold here because the checksum comes after the length of the payload.
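To make the mismatch concrete, here is a minimal std-only sketch (no serde or bincode involved) that hand-parses both layouts. The helper names read_len_prefixed and read_bitcoin_tail are made up for illustration, and the checksum value is arbitrary, not a real double-SHA256.

```rust
use std::convert::{TryFrom, TryInto};

/// Layout a u64-length-prefixed byte vector would have: [len: u64 LE][bytes...]
fn read_len_prefixed(input: &[u8]) -> Option<&[u8]> {
    let len = u64::from_le_bytes(input.get(..8)?.try_into().ok()?);
    let end = 8usize.checked_add(usize::try_from(len).ok()?)?;
    input.get(8..end)
}

/// Tail of a Bitcoin message: [len: u32 LE][checksum: u32 LE][payload...]
fn read_bitcoin_tail(input: &[u8]) -> Option<(u32, &[u8])> {
    let len = u32::from_le_bytes(input.get(..4)?.try_into().ok()?);
    let checksum = u32::from_le_bytes(input.get(4..8)?.try_into().ok()?);
    let end = 8usize.checked_add(usize::try_from(len).ok()?)?;
    Some((checksum, input.get(8..end)?))
}

fn main() {
    // length = 3, checksum = 0xAABBCCDD (arbitrary), payload = [1, 2, 3]
    let wire = [3u8, 0, 0, 0, 0xDD, 0xCC, 0xBB, 0xAA, 1, 2, 3];

    // Read as a u64 prefix, the length swallows the checksum bytes and
    // becomes far larger than the input, so parsing fails.
    assert_eq!(read_len_prefixed(&wire), None);

    // Read with the checksum between length and payload, it parses cleanly.
    assert_eq!(
        read_bitcoin_tail(&wire),
        Some((0xAABB_CCDD, &[1u8, 2, 3][..]))
    );
}
```

The same eleven bytes only make sense under the second layout, which is why the payload cannot simply be declared as a Vec<u8> field here.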
I have the following working solution:
Cargo.toml
[package]
name = "serde-test"
version = "0.1.0"
edition = "2018"
[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_bytes = "0.11"
bincode = "1.3.3"
main.rs
use bincode::Options;
use serde::{
    de::{SeqAccess, Visitor},
    Deserialize, Deserializer,
};

#[derive(Debug)]
struct Message {
    // https://en.bitcoin.it/wiki/Protocol_documentation#Message_structure
    magic: u32,
    command: [u8; 12],
    length: u32,
    checksum: u32,
    payload: Vec<u8>,
}

struct MessageVisitor;

impl<'de> Visitor<'de> for MessageVisitor {
    type Value = Message;

    fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
        formatter.write_str("Message")
    }

    fn visit_seq<V>(self, mut seq: V) -> Result<Self::Value, V::Error>
    where
        V: SeqAccess<'de>,
    {
        let magic = seq.next_element()?.unwrap();
        let command = seq.next_element()?.unwrap();
        let length: u32 = seq.next_element()?.unwrap();
        let checksum = seq.next_element()?.unwrap();
        // read the payload one byte (one tuple element) at a time
        let payload = (0..length)
            .map(|_| seq.next_element::<u8>().unwrap().unwrap())
            .collect();

        // verify payload checksum (omitted for brevity)

        Ok(Message { magic, command, length, checksum, payload })
    }
}

impl<'de> Deserialize<'de> for Message {
    fn deserialize<D>(deserializer: D) -> Result<Message, D::Error>
    where
        D: Deserializer<'de>,
    {
        deserializer.deserialize_tuple(5000, MessageVisitor) // <-- overallocation
    }
}

fn main() {
    let bytes = b"\xf9\xbe\xb4\xd9version\x00\x00\x00\x00\x00e\x00\x00\x00_\x1ai\xd2r\x11\x01\x00\x01\x00\x00\x00\x00\x00\x00\x00\xbc\x8f^T\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xc6\x1bd\t \x8d\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xcb\x00q\xc0 \x8d\x12\x805\xcb\xc9yS\xf8\x0f/Satoshi:0.9.3/\xcf\x05\x05\x00\x01";
    let msg: Message = bincode::DefaultOptions::new()
        .with_fixint_encoding()
        .deserialize(bytes)
        .unwrap();
    println!("{:?}", msg);
}
Output:
Message { magic: 3652501241, command: [118, 101, 114, 115, 105, 111, 110, 0, 0, 0, 0, 0], length: 101, checksum: 3530103391, payload: [114, 17, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 188, 143, 94, 84, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 198, 27, 100, 9, 32, 141, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 203, 0, 113, 192, 32, 141, 18, 128, 53, 203, 201, 121, 83, 248, 15, 47, 83, 97, 116, 111, 115, 104, 105, 58, 48, 46, 57, 46, 51, 47, 207, 5, 5, 0, 1] }
I dislike this solution because of how payload is handled. It requires me to pass some "large enough" element count to deserialize_tuple to account for the dynamic size of the payload; in the code snippet above, 5000 is sufficient. I would much rather deserialize payload as a single element and use deserializer.deserialize_tuple(5, MessageVisitor) instead.
Is there a way to handle this kind of deserialization in a succinct manner?
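For comparison, this is roughly the behavior I am after, written as a std-only sketch that bypasses serde and bincode entirely: the header is read field by field, and the payload is then read in one step once length is known at runtime. RawMessage, read_u32_le, read_message and the MAX_PAYLOAD cap are names and values I made up for illustration, and the sample bytes use an arbitrary checksum.

```rust
use std::io::{self, Read};

#[derive(Debug)]
struct RawMessage {
    magic: u32,
    command: [u8; 12],
    checksum: u32,
    payload: Vec<u8>,
}

// Assumed upper bound so a bogus length cannot trigger a huge allocation.
const MAX_PAYLOAD: u32 = 32 * 1024 * 1024;

fn read_u32_le<R: Read>(reader: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

fn read_message<R: Read>(reader: &mut R) -> io::Result<RawMessage> {
    let magic = read_u32_le(reader)?;
    let mut command = [0u8; 12];
    reader.read_exact(&mut command)?;
    let length = read_u32_le(reader)?;
    let checksum = read_u32_le(reader)?;
    if length > MAX_PAYLOAD {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "payload too large"));
    }
    // The payload is sized from the header and read as a single unit.
    let mut payload = vec![0u8; length as usize];
    reader.read_exact(&mut payload)?;
    // Checksum verification is omitted here as well.
    Ok(RawMessage { magic, command, checksum, payload })
}

fn main() -> io::Result<()> {
    // magic, "ping" padded to 12 bytes, length = 2, arbitrary checksum, 2 payload bytes
    let mut wire: &[u8] =
        b"\xf9\xbe\xb4\xd9ping\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x01\x02\x03\x04\xab\xcd";
    let msg = read_message(&mut wire)?;
    assert_eq!(msg.payload, vec![0xAB, 0xCD]);
    println!("{:?}", msg);
    Ok(())
}
```

What I am looking for is the serde equivalent of the vec![0u8; length as usize] step, where the value of an earlier field decides how many bytes the next field consumes.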
Similar question I could find: Can I deserialize vectors with variable length prefix with Bincode?