3

I am having a problem with the Rust bincode library. When it serializes a vector, it always writes the length prefix as 8 bytes. That is fine when bincode is on both ends, because bincode can read its own serialized data.

I cannot influence the serializer: I did not write it, and it has to stay the same for legacy reasons. It encodes its vectors as length-prefixed arrays where the prefix is always 2 bytes. (In some cases it is 4 bytes, but I know these cases well; once I know how to handle 2 bytes, 4 bytes should not be a problem.)
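To make the mismatch concrete, here is a minimal sketch in plain Rust (no bincode involved) that builds both layouts by hand. The helper names are hypothetical. The 8-byte little-endian prefix reflects bincode 1.x defaults as far as I know, and big-endian for the legacy prefix is an assumption, since the byte order is not stated above.

```rust
use std::convert::TryFrom;

/// Layout bincode 1.x is assumed to produce by default for a byte vector:
/// an 8-byte little-endian length, then the payload.
fn encode_u64_prefixed(payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + payload.len());
    out.extend_from_slice(&(payload.len() as u64).to_le_bytes());
    out.extend_from_slice(payload);
    out
}

/// Legacy layout: a 2-byte length, then the payload.
/// Big-endian is an assumption. Returns `None` if the payload
/// is too long for a 2-byte prefix.
fn encode_u16_prefixed(payload: &[u8]) -> Option<Vec<u8>> {
    let len = u16::try_from(payload.len()).ok()?;
    let mut out = Vec::with_capacity(2 + payload.len());
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(payload);
    Some(out)
}
```

For the payload `b"abc"`, the first function yields 11 bytes and the second yields 5, so a reader expecting one layout misinterprets the other from the very first byte.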

How can I use bincode (and serde for that matter) to deserialize these fields? Can I work around the default 8 bytes of length hardcoded in bincode?

Shepmaster
jonathan
  • What is this serializer you are trying to decode data from? Bincode is not supposed to be compatible with any existing serializers. – Jan Hudec May 20 '19 at 11:21
  • This data is serialized by a Python program and sent over a UDP socket. – jonathan May 20 '19 at 11:25
  • But is the serialization in the Python program some publicly named format? Or is it just an ad-hoc set of `struct.pack`s created for the project? – Jan Hudec May 20 '19 at 11:38
  • The latter, and I didn't create this program. It's part of Tribler's ipv8 project: https://github.com/Tribler/py-ipv8/blob/master/ipv8/messaging/serialization.py. I just need to interpret the data they send. Bincode seemed like the best way, and I have implemented 90% of their standard. The remaining problem is mostly the variable-length fields, as their length prefix is not compatible with bincode. – jonathan May 20 '19 at 11:39

2 Answers

5

Bincode is not supposed to be compatible with any existing serializer or standard. Nor is, according to the comment, the format you are trying to read.

I suggest you get the bincode sources—they are MIT-licensed, so you are free to do basically whatever you please with them—and modify them to suit your format (and give it your name and include it in your project).

serde::Deserializer is quite well documented, as is the underlying data model, and the implementation in bincode is trivial to find (in de/mod.rs), so take it as your starting point and adjust as needed.
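As a rough illustration of the reading logic such an adapted deserializer would need for sequences, here is a minimal hand-rolled sketch in plain Rust. It is not a `serde::Deserializer` and is not taken from bincode's sources; `Reader` and its methods are hypothetical names, and big-endian prefixes are an assumption about the legacy format.

```rust
use std::convert::TryFrom;

/// Cursor over a received buffer. All names here are hypothetical.
struct Reader<'a> {
    buf: &'a [u8],
}

impl<'a> Reader<'a> {
    fn new(buf: &'a [u8]) -> Self {
        Reader { buf }
    }

    /// Split off the next `n` bytes, or `None` if the input is too short.
    fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        let buf: &'a [u8] = self.buf;
        if buf.len() < n {
            return None;
        }
        let (head, tail) = buf.split_at(n);
        self.buf = tail;
        Some(head)
    }

    /// Field with a 2-byte length prefix (big-endian assumed).
    fn var_len16(&mut self) -> Option<&'a [u8]> {
        let p = self.take(2)?;
        let len = u16::from_be_bytes([p[0], p[1]]);
        self.take(usize::from(len))
    }

    /// Field with a 4-byte length prefix (big-endian assumed).
    fn var_len32(&mut self) -> Option<&'a [u8]> {
        let p = self.take(4)?;
        let len = u32::from_be_bytes([p[0], p[1], p[2], p[3]]);
        self.take(usize::try_from(len).ok()?)
    }
}
```

In a real serde-based deserializer, the same prefix handling would live in the sequence-deserializing method, with the width of the prefix chosen per field type.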

Shepmaster
Jan Hudec
1

I have figured out a (possibly very ugly) way to do it without implementing my own deserializer — Bincode could do it after all. It looks something like this:

use std::fmt;

use serde::de::{self, Deserialize, Deserializer, SeqAccess, Visitor};

// Assumes the newtype `struct VarLen16(Vec<u8>);` is defined elsewhere.
impl<'de> Deserialize<'de> for VarLen16 {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        struct VarLen16Visitor;

        impl<'de> Visitor<'de> for VarLen16Visitor {
            type Value = VarLen16;

            fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                formatter.write_str("VarLen16")
            }

            fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
            where
                A: SeqAccess<'de>,
            {
                // The first "tuple element" is the 2-byte length prefix.
                let length: u16 = seq
                    .next_element()?
                    .ok_or_else(|| de::Error::invalid_length(0, &self))?;

                let mut res: Vec<u8> = Vec::with_capacity(usize::from(length));

                // The remaining elements are the payload bytes.
                for i in 0..usize::from(length) {
                    res.push(
                        seq.next_element()?
                            .ok_or_else(|| de::Error::invalid_length(i + 1, &self))?,
                    );
                }

                Ok(VarLen16(res))
            }
        }

        // One slot for the length plus up to u16::MAX payload bytes.
        deserializer.deserialize_tuple(1 << 16, VarLen16Visitor)
    }
}

In short, I make the deserializer think it is reading a tuple whose length is the maximum I need. I have tested this, and it does not actually allocate that much memory. I then treat the length prefix as the first element of this tuple: I read it first and then keep reading as many elements as it specifies. It's not pretty, but it certainly works.
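The idea can be modelled without serde or bincode. The following is a toy sketch, not bincode's actual code: `BoundedAccess` is a hypothetical stand-in for a tuple accessor that hands out elements on demand and only counts remaining slots, which is why a large bound need not cost any memory. Big-endian is assumed for the prefix; with bincode, the byte order would be whatever it is configured to use.

```rust
/// Toy stand-in for a tuple accessor: it yields elements lazily and only
/// tracks how many slots remain, so the bound itself allocates nothing.
struct BoundedAccess<'a> {
    input: &'a [u8],
    remaining: usize,
}

impl<'a> BoundedAccess<'a> {
    fn new(input: &'a [u8], slots: usize) -> Self {
        BoundedAccess { input, remaining: slots }
    }

    /// Take `n` bytes as one element, or `None` if the slot bound
    /// or the input is exhausted.
    fn next_bytes(&mut self, n: usize) -> Option<&'a [u8]> {
        let input: &'a [u8] = self.input;
        if self.remaining == 0 || input.len() < n {
            return None;
        }
        let (head, tail) = input.split_at(n);
        self.input = tail;
        self.remaining -= 1;
        Some(head)
    }
}

/// Read a 2-byte length as the first "tuple element", then that many bytes.
fn read_var_len16(input: &[u8]) -> Option<Vec<u8>> {
    // One slot for the length plus up to u16::MAX payload bytes.
    let mut seq = BoundedAccess::new(input, 1 << 16);

    let prefix = seq.next_bytes(2)?;
    // Big-endian is an assumption about the legacy format.
    let length = u16::from_be_bytes([prefix[0], prefix[1]]);

    let mut res = Vec::with_capacity(usize::from(length));
    for _ in 0..length {
        res.push(seq.next_bytes(1)?[0]);
    }
    Some(res)
}
```

Only `length` bytes are ever collected, and any input after the payload is left unread, which matches the behaviour described above.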

Shepmaster
jonathan