
I have a file in which n instances of an object are serialized and stored. In some cases I need to skip stored records and read them in the sequence k, 2k, 3k, ..., n instead of the normal 1, 2, 3, ..., n. Since these instances don't all have the same length, I wrote the code below to skip records, but it throws an invalid wire-type exception (the exception contains a link to this question, but it didn't help).

Is this just a simple mistake, or am I going about it the wrong way?

long position = stream.Position;
int length = 0;
for (int i = 0; i < skipRate; ++i)
{
    Serializer.TryReadLengthPrefix(stream, PrefixStyle.Fixed32, out length);
    position += length;
    stream.Position = position;
    Console.WriteLine("Skipped " + length + " bytes");
}

MYCLASS retval = Serializer.DeserializeWithLengthPrefix<MYCLASS>(stream, PrefixStyle.Fixed32, 1);

EDIT:

long position = stream.Position;
int length;
for (int i = 0; i < skipRate; ++i)
{
    Serializer.TryReadLengthPrefix(stream, PrefixStyle.Fixed32, out length);
    length += (int)(stream.Position - position); // add number of bytes that TryReadLengthPrefix moves stream

    stream.Position = position; // Rewind
    Serializer.DeserializeWithLengthPrefix<SimulationOutputData>(stream, PrefixStyle.Fixed32, 1);
    Console.WriteLine("Try read returns " + length + ", but deserialize goes on " + (stream.Position-position));
}

Output:

Try read returns 1209, but deserialize goes on 1209
Try read returns 1186, but deserialize goes on 1186
Try read returns 1186, but deserialize goes on 1186
Try read returns 1186, but deserialize goes on 1186
Try read returns 1186, but deserialize goes on 1186
Try read returns 1209, but deserialize goes on 1209
Try read returns 1167, but deserialize goes on 1167
.
.
.

And this code works (WHY?! What is the difference?):

Serializer.TryReadLengthPrefix(stream, PrefixStyle.Fixed32, out length);
length += (int)(stream.Position - position);
stream.Position = position;
position += length;
stream.Position = position;
  • I guess you have to adjust the stream position after each TryReadLengthPrefix. – athabaska Jan 30 '14 at 06:54
  • Also, which line gives the exception? – athabaska Jan 30 '14 at 06:54
  • @athabaska I tried that too; the `DeserializeWithLengthPrefix` function throws the exception. – sharafi Jan 30 '14 at 07:16
  • Well, this code is logically wrong - you try to get the length of the current object in the stream, you don't check whether you got anything in the out parameter, and worst of all you don't change the stream position before reading the next length. I guess TryReadLengthPrefix moves the stream position, so your next attempt fails. – athabaska Jan 30 '14 at 07:29
  • My actual code did change the stream position inside the loop; I don't know why I changed it before posting it on SO. I have edited my post. Are you saying that I should check whether there is a valid value in `length`? There is also a bug near the end of the stream, when `n` is not divisible by `k`, but I want to get the desired result first and then add error checking and special-case handling. – sharafi Jan 30 '14 at 07:41
  • If the edited code doesn't work, I can only guess that TryReadLengthPrefix also moves the stream position, so you have to reduce the length by the size of the length prefix. Check in the debugger whether the stream position changes after TryReadLengthPrefix. – athabaska Jan 30 '14 at 07:53

1 Answer


athabaska (comments) has the gist of it. Your position increments aren't accounting for the headers. A neat implementation might be:

for (int i = 0; i < skipRate; ++i)
{
    int length;
    if(!Serializer.TryReadLengthPrefix(stream, PrefixStyle.Fixed32, out length))
    {
        throw new EndOfStreamException(); // not enough records
    }
    stream.Seek(length, SeekOrigin.Current);
}

MYCLASS retval = Serializer.DeserializeWithLengthPrefix<MYCLASS>(
    stream, PrefixStyle.Fixed32, 0);
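
For reference, here is a minimal, self-contained sketch of the whole write/skip/read round trip using the same protobuf-net calls as above. The `Record` class, the `SkipDemo` wrapper, and the sample sizes are hypothetical stand-ins for MYCLASS and the real data, not part of the original post.

using System;
using System.IO;
using ProtoBuf;

[ProtoContract]
class Record                     // hypothetical stand-in for MYCLASS
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Payload { get; set; }
}

static class SkipDemo
{
    static void Main()
    {
        const int n = 10, skipRate = 3;
        using (var stream = new MemoryStream())
        {
            // Write n length-prefixed records back to back.
            for (int i = 1; i <= n; ++i)
            {
                Serializer.SerializeWithLengthPrefix(
                    stream,
                    new Record { Id = i, Payload = new string('x', i * 10) },
                    PrefixStyle.Fixed32, 0);
            }

            stream.Position = 0;

            // Skip the first skipRate - 1 records. TryReadLengthPrefix consumes
            // the 4-byte Fixed32 header, so seeking by 'length' lands exactly on
            // the next record's header.
            for (int i = 0; i < skipRate - 1; ++i)
            {
                int length;
                if (!Serializer.TryReadLengthPrefix(stream, PrefixStyle.Fixed32, out length))
                {
                    throw new EndOfStreamException(); // not enough records
                }
                stream.Seek(length, SeekOrigin.Current);
            }

            // The next read returns record number skipRate (Id == 3 here).
            Record record = Serializer.DeserializeWithLengthPrefix<Record>(
                stream, PrefixStyle.Fixed32, 0);
            Console.WriteLine(record.Id);
        }
    }
}

The point is the same as in the loop above: `TryReadLengthPrefix` already consumes the 4-byte Fixed32 header, so a relative `Seek(length, SeekOrigin.Current)` lands on the next record's prefix. The original loop added only the payload length to a position captured before the header was read, so it ended up 4 bytes short of each record boundary and eventually produced the invalid wire-type error.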
Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • I just tried your version, Marc, and it works, BUT it is SLOWER than actually deserializing and ignoring each record. TryReadLengthPrefix must be quite expensive. Is there a way to write FIXED-length records - so that we know the size and can just skip a whole tranche? I need to run through, say, 200 mio. records and would like to speed it up. – ManInMoon Apr 05 '14 at 08:41
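
On the fixed-length idea in the last comment: protobuf-net does not pad records itself, but if the writer padded every record out to a known constant slot size, skipping would collapse to a single absolute seek with no prefix walking. A rough sketch, assuming a hypothetical slotSize enforced by the writer and reusing the hypothetical Record class from the sketch above:

// Sketch only: assumes each record occupies a fixed slot of slotSize bytes
// (4-byte Fixed32 prefix + payload + padding), which the writer must enforce
// itself; protobuf-net will not do this for you.
static Record ReadNthRecord(Stream stream, long index, int slotSize)
{
    stream.Position = index * (long)slotSize; // jump straight to record #index
    return Serializer.DeserializeWithLengthPrefix<Record>(
        stream, PrefixStyle.Fixed32, 0);
}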