I can successfully serialize instances of the following class, but when I try to deserialize right afterwards I get this error message: "Invalid field in source data: 0".

I have no clue what it refers to, because I find the class below straightforward. I just updated protobuf-net to version 2.00.614 (runtime version: 2.0.50727).

Any idea whether I am possibly overlooking something trivial?

[ProtoContract]
public class TimeSeriesProperties 
{
    [ProtoMember(1)]
    public string TimeSeriesName { get; private set; }
    [ProtoMember(2)]
    public string FileName { get; private set; }
    [ProtoMember(3)]
    public string TemplateName { get; private set; }
    [ProtoMember(4)]
    public int PacketLength { get; private set; }
    [ProtoMember(5)]
    public long FileSizeBytes { get; set; }
    [ProtoMember(6)]
    public long NumberRecords { get; set; }
    [ProtoMember(7)]
    public DateTime DateTimeStart { get; set; }
    [ProtoMember(8)]
    public DateTime DateTimeEnd { get; set; }

    public TimeSeriesProperties()
    {

    }

    public TimeSeriesProperties(string timeSeriesName, string fileName, string templateName, int packetLength)
    {
        this.TimeSeriesName = timeSeriesName;
        this.FileName = fileName;
        this.TemplateName = templateName;
        this.PacketLength = packetLength;
    }

}

public static byte[] Serialize_ProtoBuf<T>(T serializeThis)
{
    using (var stream = new MemoryStream())
    {
        ProtoBuf.Serializer.Serialize<T>(stream, serializeThis);
        return stream.GetBuffer();
    }
}

public static T Deserialize_ProtoBuf<T>(byte[] byteArray)
{
    using (var stream = new MemoryStream(byteArray))
    {
        return ProtoBuf.Serializer.Deserialize<T>(stream);
    }
}
Blachshma
Matt
    That almost always means you are over-reading the written data, typically by incorrect handling of `MemoryStream` - nothing to do with protobuf-net; please can you show the code that does the serialize/deserialize test? (and ideally, add a @marc comment, so I know to come back and look at it) – Marc Gravell Jan 07 '13 at 13:29
  • @MarcGravell, added the two methods. – Matt Jan 07 '13 at 13:37
  • @MarcGravell, by the way, I noticed that MemoryStream by default allocates 256 bytes, while the serialized object only required 146 bytes. When I deserialize, I call the method with a byte array of 10,000 bytes, of which obviously only the first 146 bytes are non-zero. I have to do that because I do not know the exact size of the serialized object, and I think this actually worked before (so I am fairly sure that fact alone is not the cause of the problem), which is why I am very confused about what else may be wrong. – Matt Jan 07 '13 at 13:40
  • 1
    that approach would never have worked with **any** protobuf implementation; it is a feature of the specification that the outermost message *does not know its own length* - this is so that fragments are mergeable via concatenation. Thus, by default it reads to the end of the stream (although most implementations include a mechanism for reading `n` bytes, for some `n`). At the end of each field, it expects **either** another field-header, or an EOF. 0 is **never** a valid field-header, so trailing zeros will break the deserializer. Every time. – Marc Gravell Jan 07 '13 at 13:44
  • 1
    incidentally, the amount of data written is simply: `stream.Length` – Marc Gravell Jan 07 '13 at 13:48
  • Marc Gravell, yes you are correct, it works when I deserialize the exact same byte array produced when I serialized the object. However, my problem is that I do not know the size of the byte array when I read it later on. The byte array represents the header of a binary file, and the only solution I see would be to store an int right at the beginning of the binary to indicate how large the header is going to be. Any better ideas? – Matt Jan 07 '13 at 13:51
  • 1
    that depends entirely on the context; in the scenario you show (with most `Stream`s etc), the length is trivially available, and the entire thing is fixed just via `ToArray()`, or a constrained read. With some implementations (such as a `NetworkStream` sending multiple messages), `[Serialize|Deserialize]WithLengthPrefix` is your friend. I'm not sure why you don't know the length after serializing. Can you expand on that? If this is for a header at the start of a file, the `*WithLengthPrefix` should work fine. – Marc Gravell Jan 07 '13 at 13:55
  • @MarcGravell, maybe I miscommunicated. I do not have the length available at the time of deserialization. All I have is a binary file which is supposed to start with a serialized header whose length I do not know at that time unless I store it somewhere. I was under the impression ProtoBuf-Net can detect (through some sort of "magic byte" ) where the object ends even when there are trailing zeros but apparently I was wrong. So I still stand with the problem of having to figure out how many bytes to deserialize to get back my header object. – Matt Jan 07 '13 at 13:58
  • just use the `*WithLengthPrefix` methods. It will do everything you want, without over-reading (so you can just continue reading from the stream afterwards) – Marc Gravell Jan 07 '13 at 13:59
  • @MarcGravell, perfect, works like a charm. I assume the size "prefixStyle.Fixed32" refers to a 32bit integer? – Matt Jan 07 '13 at 14:52
  • no, it refers to how it is encoded; Fixed32 uses exactly 4 bytes; the other option is a "varint", which is variable length depending on the size of the data. Shaves a few bytes, but trickier to interpret if you aren't used to dealing with them and aren't using the inbuilt functions. – Marc Gravell Jan 07 '13 at 14:55
  • @MarcGravell, so for anything smaller than 10,000 bytes Fixed32 should do fine? I guess I am still confused, so the PrefixStyle describes the size of the serialized object or the size of the length prefix which itself indicates the size of the serialized object? – Matt Jan 07 '13 at 14:59
  • it **only** affects the length prefix: whether it is exactly 4 bytes vs 1-6; the point is not really about space usage, but about what the user wants. For example, by using varints, you can write a stream of individual objects that is indistinguishable from a sequence of objects written as a single message. – Marc Gravell Jan 07 '13 at 15:29
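The `*WithLengthPrefix` resolution from the comments above can be sketched roughly as follows. This is a minimal sketch, not code from the question: it assumes the protobuf-net package is referenced and reuses the `TimeSeriesProperties` class shown earlier; the `HeaderIo` helper name and the choice of `PrefixStyle.Fixed32` are illustrative.

```csharp
using System.IO;
using ProtoBuf;

public static class HeaderIo
{
    public static void WriteHeader(Stream file, TimeSeriesProperties header)
    {
        // Writes a 4-byte length before the payload, so a later reader
        // knows exactly where the serialized header ends.
        Serializer.SerializeWithLengthPrefix(file, header, PrefixStyle.Fixed32);
    }

    public static TimeSeriesProperties ReadHeader(Stream file)
    {
        // Reads only the prefixed number of bytes; the stream position is
        // left just past the header, ready for the rest of the file.
        return Serializer.DeserializeWithLengthPrefix<TimeSeriesProperties>(
            file, PrefixStyle.Fixed32);
    }
}
```

As the comments note, `Fixed32` always spends exactly 4 bytes on the prefix, while a varint-style prefix (`PrefixStyle.Base128`) would use 1-6 bytes depending on the message size.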

1 Answer

The most common cause I've seen of this is simply incorrect use of GetBuffer() on a MemoryStream. That was already my hunch when I added my comment, but you've confirmed it:

using (var stream = new MemoryStream())
{
    ProtoBuf.Serializer.Serialize<T>(stream, serializeThis);
    return stream.GetBuffer();
}

GetBuffer returns the oversized backing-buffer. It has garbage at the end of it. It is perfectly fine to use GetBuffer, as long as you also record the .Length, which is the amount of valid data in there. This can avoid an extra allocation of a potentially large array. But in your case, a simpler approach is probably to use ToArray() to get a right-sized buffer:

using (var stream = new MemoryStream())
{
    ProtoBuf.Serializer.Serialize<T>(stream, serializeThis);
    return stream.ToArray();
}
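If you do want to avoid the extra copy that `ToArray()` makes, the `GetBuffer()`-plus-`.Length` approach described above can be sketched like this (the `ToSegment` helper name is hypothetical, not part of any API):

```csharp
using System;
using System.IO;

public static class ProtoHelpers
{
    // Hypothetical helper: expose the backing buffer together with the
    // number of valid bytes, avoiding the extra allocation ToArray() makes.
    public static ArraySegment<byte> ToSegment(MemoryStream stream)
    {
        // GetBuffer() returns the oversized backing array; Length says how
        // much of it is real data. Consumers must honour the segment Count
        // and never read past it.
        return new ArraySegment<byte>(stream.GetBuffer(), 0, (int)stream.Length);
    }
}
```

Anything that consumes the result then has to respect the segment's `Offset`/`Count` rather than the raw array length, which is exactly the detail the original `GetBuffer()` code dropped.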
Marc Gravell
  • Marked as answered. The key for me was the "WithLengthPrefix" issue. I could have sworn this worked without it before; my code broke after I updated to the current protobuf-net version. I can assure you that I serialized to a byte array of size 256 even though the actual serialized object was only of length 146, and I deserialized by passing in a 10,000-byte array where only the first 146 bytes were taken up by the serialized object. It worked, I am absolutely sure of that. I have no clue what changed, but I know my code was not safe because the serialized object was not prefixed with its size. – Matt Jan 07 '13 at 14:55
  • This solution works if you are working with a fresh archive. Is there a way to fix this issue if we already have archives that were written out using GetBuffer() and we no longer know the size of the archive? – Etienne Feb 17 '14 at 21:05
  • @Etienne You could potentially use the reader API to check for when the next field-header is zero, which is never legal - only works if the `MemoryStream` was not previously used for something else random (i.e. is the unused space all zeros?). Heck, you could probably just remove all the trailing zeros - there's an edge case that there were *legitimate* trailing zeros, but it is easier than hooking the reader API. What sort of quantity are you talking about here? 10? 1000? 1000000? – Marc Gravell Feb 17 '14 at 21:14
  • @Marc, I ended up removing the trailing zeros and it worked for me. It's probably not the most ideal solution, but in my case this will be executed so infrequently that it doesn't really matter. I was hoping that the library had some cool way of dealing with it, but I was wrong. – Etienne Feb 25 '14 at 19:32
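For completeness, the trailing-zero trim Etienne describes could be sketched like this. As Marc warns, it is only safe when the padding is known to be zeros and the payload has no legitimate trailing zero bytes; the helper name is hypothetical.

```csharp
using System;

public static class LegacyArchiveFix
{
    // Strip the trailing zero padding that GetBuffer() left behind.
    // Caveat: this corrupts a payload whose last valid bytes happen to
    // be legitimately zero, so use it only on known-padded legacy data.
    public static byte[] TrimTrailingZeros(byte[] data)
    {
        int length = data.Length;
        while (length > 0 && data[length - 1] == 0)
            length--;
        byte[] trimmed = new byte[length];
        Array.Copy(data, trimmed, length);
        return trimmed;
    }
}
```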